ACopyRegim – the mode of copying.
The function determines the time spent on communications while loading buffers with remote array elements.
The function returns the required time.
2.2.8.Renewing shadow edges of distributed array
BoundGroup::BoundGroup( );
The function creates an empty shadow edge group (that is, a group that does not contain any shadow edges).
void BoundGroup::AddBound( DArray *ADArray, long *ALeftBSizeArray, long *ARightBSizeArray, long ACornerSign );
ADArray – pointer to the distributed array.
ALeftBSizeArray – ALeftBSizeArray[i] is the width of the low shadow edge of the (i+1)th dimension of the array.
ARightBSizeArray – ARightBSizeArray[i] is the width of the high shadow edge of the (i+1)th dimension of the array.
ACornerSign – flag indicating whether the «corner» elements are included in the shadow edge.
The function adds the shadow edge of the distributed array to the group.
double BoundGroup::StartB( );
The function determines the time spent on renewing the shadow edges of the distributed arrays included in the group.
The function returns the required time.
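To illustrate what the width arrays and ACornerSign describe, the following sketch counts the shadow elements surrounding a local block of a distributed array. It is only a geometric illustration, not the predictor's cost model: the function name shadowElementCount, the blockSize argument and the use of std::vector are assumptions introduced here, and the real BoundGroup::StartB( ) also depends on the parameters of the processor system.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the predictor): counts the elements of the
// shadow edges around a local block.
// blockSize[i]  - local block extent in dimension i (assumed input)
// leftWidth[i]  - width of the low shadow edge, as in ALeftBSizeArray[i]
// rightWidth[i] - width of the high shadow edge, as in ARightBSizeArray[i]
// withCorners   - plays the role of ACornerSign
long shadowElementCount(const std::vector<long> &blockSize,
                        const std::vector<long> &leftWidth,
                        const std::vector<long> &rightWidth,
                        bool withCorners) {
    assert(blockSize.size() == leftWidth.size());
    assert(blockSize.size() == rightWidth.size());
    const std::size_t rank = blockSize.size();

    long interior = 1;
    for (std::size_t i = 0; i < rank; ++i) interior *= blockSize[i];

    if (withCorners) {
        // Block extended by the edges in every dimension, minus the block itself.
        long extended = 1;
        for (std::size_t i = 0; i < rank; ++i)
            extended *= blockSize[i] + leftWidth[i] + rightWidth[i];
        return extended - interior;
    }

    // Without corners only the faces adjacent to the block are counted.
    long faces = 0;
    for (std::size_t i = 0; i < rank; ++i) {
        long others = 1;
        for (std::size_t j = 0; j < rank; ++j)
            if (j != i) others *= blockSize[j];
        faces += (leftWidth[i] + rightWidth[i]) * others;
    }
    return faces;
}
```

For a 4 x 4 block with edges of width 1 on every side, the sketch gives 20 elements when corners are included and 16 when they are not; in one dimension the flag makes no difference.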
2.2.9.Reduction
RedVar::RedVar( long ARedElmSize, long ARedArrLength, long ALocElmSize );
ARedElmSize – the size (in bytes) of an element of the reduction array-variable.
ARedArrLength – the number of elements in the reduction array-variable.
ALocElmSize – the size (in bytes) of an element of the array with additional information.
The function creates a reduction.
RedGroup::RedGroup ( VM *AVMPtr );
AVMPtr – pointer to the processor system.
The function creates an empty reduction group (that is, a group that does not contain any reductions).
void RedGroup::AddRV( RedVar *ARedVar );
ARedVar – pointer to the reduction.
The function adds the reduction to the reduction group.
double RedGroup::StartR( ParLoop *AParLoop );
AParLoop – pointer to the parallel loop, in which the values of the reduction variables of the group are calculated.
The function determines the time spent on communications while executing the reduction operations over all reduction variables of the group.
The function returns the required time.
3.Processing trace information
3.1.Representation of the program as hierarchy of intervals
The execution of a program can be represented as a sequence of intervals. By default, the whole program is treated as a single interval. The user can also define intervals by means of the C-DVM and Fortran DVM languages (constraint: when the predictor is used, the integer expression associated with an interval must not appear inside a parallel loop).
There is also a compilation mode in which all parallel loops, or all sequential loops containing parallel loops, or all sequential loops are declared as intervals.
The user can also split any interval into smaller intervals or merge intervals that are adjacent in execution order into a new one, that is, present the program as a multi-level hierarchy of intervals (the whole program is the interval of the highest level).
The mechanism of splitting the program into intervals allows a more detailed analysis of the program's behavior during execution. When viewing results with the predictor, the user can set the depth of detail so that intervals of the prescribed levels are left out of consideration.
To simplify the further description, we introduce the following notions. An interval is called simple if it does not contain other (nested) intervals. Intervals that include nested intervals are called composite. While trace information is being processed, some intervals are active (entered but not yet exited). The active interval of the lowest level is called the current interval.
The trace information saved to a file during DVM-program execution is then processed on a workstation as follows.
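The notions of simple, composite, active and current intervals can be sketched as a small tree that is updated while the trace is read. This is a minimal illustration rather than the predictor's actual data structure: the names Interval, IntervalTracker, enter and leave are hypothetical, and numbering the whole-program interval as level 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node of the interval hierarchy (not the predictor's own type).
struct Interval {
    int level;                                      // assumed: 0 = whole program
    Interval *parent;                               // nullptr for the whole program
    std::vector<std::unique_ptr<Interval>> nested;  // intervals of level + 1

    Interval(int lvl, Interval *par) : level(lvl), parent(par) {}

    bool isSimple() const { return nested.empty(); }
    bool isComposite() const { return !nested.empty(); }
};

// Tracks the active intervals (entered but not yet exited) during processing.
class IntervalTracker {
public:
    IntervalTracker() : root_(new Interval(0, nullptr)), current_(root_.get()) {}

    // Entering a nested interval makes it the current interval.
    Interval *enter() {
        std::unique_ptr<Interval> child(new Interval(current_->level + 1, current_));
        current_->nested.push_back(std::move(child));
        current_ = current_->nested.back().get();
        return current_;
    }

    // Leaving the current interval makes its enclosing interval current again.
    void leave() {
        assert(current_->parent != nullptr);  // the whole program is never left
        current_ = current_->parent;
    }

    const Interval *current() const { return current_; }
    const Interval *root() const { return root_.get(); }

    // Active intervals are the current one and every interval enclosing it.
    std::size_t activeCount() const {
        std::size_t n = 0;
        for (const Interval *p = current_; p != nullptr; p = p->parent) ++n;
        return n;
    }

private:
    std::unique_ptr<Interval> root_;
    Interval *current_;
};
```

In this sketch the whole program starts as a simple interval and becomes composite as soon as the first nested interval is entered.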
3.2.Processing trace information overview
3.2.1.Accumulation of the information about intervals
During trace processing, the following information is gathered for each interval: the type of the interval, the number of the interval, its level, the number of entries into the interval, and the source file name and line number corresponding to the beginning of the interval. In addition, for the current interval the following times describing its execution on a serial computer are calculated (the processor performance is assumed to be equal to the performance of the processors of the target system):
- Productive processor time of sequential calculations (Productive_CPU_time_Seq). It is determined on the sequential regions of the interval.
- Productive processor time of parallel calculations (Productive_CPU_time_Par). It is determined in parallel loops.
- Productive processor time (Productive_CPU_time). It is calculated as the sum of the productive processor times of sequential and parallel calculations.
- Time of input/output (I/O_time).
When these times are collected, the overhead incurred by writing the trace information to the file during program execution on the workstation is also taken into account.
3.2.2.Simulation of parallel execution of DVM-program
The basic LIB-DVM functions are simulated using the trace information accumulated in the file and the parameters of the multiprocessor system given by the user. This makes it possible to determine the times required for interprocessor communications, as well as the distribution of calculations among processors. The following times describing the parallel execution of intervals are then calculated (these characteristics are summarized over all processors):
- Processor time of parallel calculations (CPU_time_Par).
- Synchronization (Synchronization) for all types of collective operations (Start_reduction, Wait_reduction, Start_shadow, Wait_shadow, Remote_access, Redistribution and I/O).
- Time variation (Time_variation) of collective operation completion.
- Communication times for all types of collective operations (Start_reduction, Wait_reduction, Start_shadow, Wait_shadow, Remote_access, Redistribution and I/O).
- Total communication time (Communication). It is the sum of the communication times of all collective operations.
- Time of losses because of insufficient parallelism during sequential calculations (Insufficient_parallelism_Seq).
- Time of losses because of insufficient parallelism during parallel calculations (Insufficient_parallelism_Par).
- Total time of losses because of insufficient parallelism (Insufficient_parallelism). For any interval it is equal to the sum of the times of losses because of insufficient parallelism during sequential and parallel calculations.
- Time of reduction overlapping (Reduction_overlap).
- Time of shadow edge renewal overlapping (Shadow_overlap).
- Time of overlapping of communications by calculations (Overlap). It is calculated as the sum of two base characteristics: the time of reduction overlapping and the time of shadow edge renewal overlapping.
- Idle time (Idle).
On exit from a composite interval, all characteristics described in sections 3.2.1 and 3.2.2 (except the Idle characteristic) are corrected. In addition, each characteristic of an interval of the i-th level is calculated by adding to it the same characteristics of all nested intervals of the (i+1)-th level.
After the correction, and also on exit from simple intervals, the load imbalance characteristic (Load_Imbalance) is calculated.
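The accumulation performed on exit from a composite interval can be sketched as follows. The structure Characteristics holds only a subset of the characteristics listed above, and its field names, as well as the function accumulateNested, are hypothetical. Leaving Idle untouched is one reading of the exception stated above and should be treated as an assumption; the correction itself and the calculation of Load_Imbalance are not shown.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of the per-interval characteristics named above.
struct Characteristics {
    double cpuTimePar = 0.0;             // CPU_time_Par
    double communication = 0.0;          // Communication
    double insufficientParSeq = 0.0;     // Insufficient_parallelism_Seq
    double insufficientParPar = 0.0;     // Insufficient_parallelism_Par
    double reductionOverlap = 0.0;       // Reduction_overlap
    double shadowOverlap = 0.0;          // Shadow_overlap
    double idle = 0.0;                   // Idle

    // Insufficient_parallelism is the sum of its two components.
    double insufficientParallelism() const {
        return insufficientParSeq + insufficientParPar;
    }
    // Overlap is the sum of reduction and shadow overlapping.
    double overlap() const { return reductionOverlap + shadowOverlap; }
};

// Adds the characteristics of the nested intervals of level i+1 to their
// enclosing interval of level i. Idle is deliberately left unchanged
// (an assumption about how the exception in the text applies).
void accumulateNested(Characteristics &enclosing,
                      const std::vector<Characteristics> &nested) {
    for (const Characteristics &c : nested) {
        enclosing.cpuTimePar += c.cpuTimePar;
        enclosing.communication += c.communication;
        enclosing.insufficientParSeq += c.insufficientParSeq;
        enclosing.insufficientParPar += c.insufficientParPar;
        enclosing.reductionOverlap += c.reductionOverlap;
        enclosing.shadowOverlap += c.shadowOverlap;
    }
}
```

The sketch expects the nested intervals to have been exited, and therefore accumulated, before their enclosing interval is processed, which matches the order in which intervals are exited in the trace.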
3.2.3.Calculation of the main performance characteristics
These characteristics apply to the whole parallel program and to its intervals. Depending on the level of detail specified by the user (the level of intervals), the following information on the predicted performance of the program can be reported:
- Time of input/output (I/O_time).
- Imbalance (Load_Imbalance).
- Synchronization (Synchronization) and all its components.
- Time variation (Time_variation).
- Losses because of insufficient parallelism (Insufficient_parallelism) with all its components.
- Communication (Communication) and its components for all types of collective operations.
- Time of overlapping (Overlap) with its components.
- Productive processor time (Productive_CPU_time).
- Productive time (Productive_time) is the sum of two components: productive processor time (Productive_CPU_time) and time of input/output (I/O_time).
- Lost time (Lost_time) is calculated as the sum of its components: insufficient parallelism (Insufficient_parallelism), communication (Communication) and idle time (Idle).
- Total processor time (Total_time) is the sum of productive time (Productive_time) and lost time (Lost_time).
- Time of execution (Execution_time) is the ratio of total processor time (Total_time) to the number of processors.
- Efficiency coefficient (Efficiency) is the ratio of productive time (Productive_time) to total processor time (Total_time).
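The derived characteristics at the end of the list follow directly from the base ones. The sketch below restates those formulas in code; the type names IntervalTimes and Summary and the function summarize are hypothetical and are not taken from the predictor.

```cpp
#include <cassert>

// Base characteristics of an interval (hypothetical container).
struct IntervalTimes {
    double productiveCpuTime;        // Productive_CPU_time
    double ioTime;                   // I/O_time
    double insufficientParallelism;  // Insufficient_parallelism
    double communication;            // Communication
    double idle;                     // Idle
};

// Derived characteristics, named after the list above.
struct Summary {
    double productiveTime;  // Productive_time
    double lostTime;        // Lost_time
    double totalTime;       // Total_time
    double executionTime;   // Execution_time
    double efficiency;      // Efficiency
};

Summary summarize(const IntervalTimes &t, int processors) {
    assert(processors > 0);
    Summary s;
    s.productiveTime = t.productiveCpuTime + t.ioTime;
    s.lostTime = t.insufficientParallelism + t.communication + t.idle;
    s.totalTime = s.productiveTime + s.lostTime;
    s.executionTime = s.totalTime / processors;
    // Guard against an empty interval, for which the ratio is undefined.
    s.efficiency = s.totalTime > 0.0 ? s.productiveTime / s.totalTime : 0.0;
    return s;
}
```

For example, with Productive_CPU_time = 5, I/O_time = 1, Insufficient_parallelism = 1, Communication = 0.5 and Idle = 0.5 on 4 processors, the sketch yields Productive_time = 6, Lost_time = 2, Total_time = 8, Execution_time = 2 and Efficiency = 0.75. Returning 0 for an interval with zero total time is a choice made for this sketch only.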















