Predictor of DVM-program performance
Detailed design
March 31, 1999
Keldysh Institute of Applied Mathematics
Russian Academy of Sciences
Contents
1. Overview
1.1. Functions of Predictor
1.2. The Content of Predictor
2. The LIB-DVM simulator
2.1. The LIB-DVM simulator overview
2.2. The base functions of the LIB-DVM simulator module
2.2.1. Processor system defining
2.2.2. Template creating
2.2.3. Mapping template
2.2.4. Distributed array creating
2.2.5. Mapping distributed array
2.2.6. Parallel loop defining
2.2.7. Loading buffers by the remote array elements
2.2.8. Renewing shadow edges of distributed array
2.2.9. Reduction
3. Processing trace information
3.1. Representation of the program as hierarchy of intervals
3.2. Processing trace information overview
3.2.1. Accumulation of the information about intervals
3.2.2. Simulation of parallel execution of DVM-program
3.2.3. Calculation of the main performance characteristics
1. Overview
1.1. Functions of Predictor
The predictor is intended for analyzing and debugging the performance of DVM-programs without using a real parallel computer (access to which is usually limited or difficult). With the predictor, the user can obtain the predicted time characteristics of his program's execution on an MPP or a workstation cluster at various levels of detail.
The performance of parallel programs on multiprocessor computers with distributed memory is determined by the following major factors:
- program parallelism: the fraction of parallel computations in the total volume of computations;
- balance of processor load during parallel computations;
- time needed to execute interprocessor communications;
- degree of overlap of interprocessor communications with computations.
The ability to distinguish the sequential and parallel parts of the program during its execution on a multiprocessor computer allows the predictor to give the user the following basic characteristics of parallel program execution:
- execution time;
- efficiency coefficient;
- lost time.
Execution time is the maximum of the program's execution times over all processors.
To calculate the main characteristic of parallel execution, the efficiency coefficient, two amounts of time are needed. The first is the productive time, i.e. the time required to execute the program on a serial computer. The second is the total processor time, calculated as the execution time multiplied by the number of processors. The efficiency coefficient is the ratio of the productive time to the total processor time.
The lost time is the total processor time minus the productive time. If the programmer is not satisfied with the efficiency coefficient, he should analyze the components of the lost time and their causes.
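The definitions above can be illustrated with a minimal C++ sketch. The structure ExecutionCharacteristics and the function computeCharacteristics are hypothetical names introduced only for illustration, not part of the predictor. The per-processor execution times and the productive time are assumed to be known already.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the basic characteristics defined above,
// computed from per-processor execution times.
struct ExecutionCharacteristics {
    double executionTime;      // maximum of the per-processor execution times
    double totalProcessorTime; // executionTime * number of processors
    double efficiency;         // productiveTime / totalProcessorTime
    double lostTime;           // totalProcessorTime - productiveTime
};

// procTimes[p]   - execution time of the program on processor p
// productiveTime - time needed to execute the program on a serial computer
ExecutionCharacteristics computeCharacteristics(const std::vector<double>& procTimes,
                                                double productiveTime) {
    assert(!procTimes.empty());
    ExecutionCharacteristics c{};
    c.executionTime = *std::max_element(procTimes.begin(), procTimes.end());
    c.totalProcessorTime = c.executionTime * static_cast<double>(procTimes.size());
    c.efficiency = productiveTime / c.totalProcessorTime;
    c.lostTime = c.totalProcessorTime - productiveTime;
    return c;
}
```

For example, per-processor times of 10, 12, 11 and 12 time units with a productive time of 36 give an execution time of 12, a total processor time of 48, an efficiency coefficient of 0.75 and a lost time of 12.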
The lost time has the following components:
- insufficient parallelism: losses caused by insufficient parallelism, which leads to replicated execution of the same computations on several processors;
- communication: losses caused by the execution of interprocessor communications;
- idle: losses caused by idle time of the processors that complete their part of the program earlier than the others.
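The idle component can be sketched as follows. This document does not give the formula, so the sketch assumes that the idle time is the sum, over all processors, of the difference between the program execution time and the processor's own completion time. The function idleTime is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: idle component of the lost time, ASSUMING it is
// accumulated as the time each processor waits after finishing its part of
// the program until the slowest processor finishes.
double idleTime(const std::vector<double>& procTimes) {
    assert(!procTimes.empty());
    const double executionTime = *std::max_element(procTimes.begin(), procTimes.end());
    double idle = 0.0;
    for (double t : procTimes) {
        idle += executionTime - t; // zero for the processor(s) that finish last
    }
    return idle;
}
```

Under this assumption, per-processor times of 10, 12, 11 and 12 give an idle component of 2 + 0 + 1 + 0 = 3 time units.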
To estimate the total potential losses that can arise because collective operations start at different times on different processors, a special characteristic, synchronization, is calculated and given to the user. The main cause of these losses is imbalance of processor load during the execution of parallel loops. To estimate the potential losses caused by imbalance, the user is given the generalized characteristic imbalance. Desynchronization can arise not only from imbalance but also from differences in the completion times of collective operations on different processors. To evaluate this potential desynchronization, the programmer is provided with a special characteristic, time variation of collective operation completion.
An important characteristic that shows the degree of overlap of interprocessor communications with computations is the overlap time.
For a more detailed analysis of program efficiency, the user can apply special language features to split the program into intervals and obtain performance characteristics for each interval.
1.2. The Content of Predictor
The predictor is a system for processing the trace information gathered by the LIB-DVM system during the execution of a DVM-program on a workstation (if several workstations are used, a special utility merges their traces into one). Using the trace information and the parameters given by the user, the system calculates and presents the predicted time characteristics of the program's execution on an MPP or a workstation cluster. To do so it uses a library that simulates the parallel execution of DVM-programs. We will refer to this library as the LIB-DVM simulator.
2. The LIB-DVM simulator
2.1. The LIB-DVM simulator overview
The LIB-DVM simulator is a class library of the main objects used in LIB-DVM. The member functions of these classes simulate the corresponding LIB-DVM functions in order to determine the time spent on interprocessor communications and to find the mapping of parallel loops onto the processor system.
2.2. The base functions of the LIB-DVM simulator module
2.2.1. Processor system defining
VM::VM( long ARank, long* ASizeArray, int AMType, double ATStart, double ATByte );
ARank – a rank of the given processor system;
ASizeArray – vector of the sizes of the given processor system dimensions. ASizeArray[i] is the size of the (i+1)th dimension (0 ≤ i ≤ ARank – 1);
AMType – the type of the distributed processor system (0 – network with bus architecture, 1 – matrix of processors);
ATStart – communication operation start time;
ATByte – transfer time of one byte;
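The parameters ATStart and ATByte suggest a linear communication cost model: the start-up time plus the per-byte time multiplied by the message length. The sketch below assumes this model. SimpleVM is a hypothetical, simplified stand-in for the simulator's VM class, not its real implementation. The processor system type (AMType) is stored but not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the simulator's VM class. It stores
// the VM::VM constructor parameters and applies an ASSUMED linear cost model.
class SimpleVM {
public:
    SimpleVM(long rank, const long* sizeArray, int mType, double tStart, double tByte)
        : sizes_(sizeArray, sizeArray + rank), mType_(mType), tStart_(tStart), tByte_(tByte) {}

    // Number of processors: product of the dimension sizes.
    long procCount() const {
        long n = 1;
        for (long s : sizes_) n *= s;
        return n;
    }

    // Assumed cost of one message of `bytes` bytes: start-up time plus
    // per-byte transfer time. The topology (mType_) is not modelled here.
    double messageTime(long bytes) const {
        assert(bytes >= 0);
        return tStart_ + static_cast<double>(bytes) * tByte_;
    }

    int type() const { return mType_; }

private:
    std::vector<long> sizes_; // sizes_[i] is the size of dimension i+1
    int mType_;               // 0 = bus network, 1 = processor matrix
    double tStart_;           // communication start-up time (ATStart)
    double tByte_;            // transfer time of one byte (ATByte)
};
```

For a 2×4 processor matrix with ATStart = 100 and ATByte = 0.5 (arbitrary time units), this model gives 8 processors and a cost of 104 for an 8-byte message.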
2.2.2. Template creating
AMView::AMView( long ARank, long *ASizeArray );
ARank – rank of the created template;
ASizeArray – vector of the sizes of the template dimensions. ASizeArray[i] is the size of the (i+1)th dimension (0 ≤ i ≤ ARank – 1);
2.2.3. Mapping template
void AMView::DisAM( VM *AVM_Dis, long AParamCount, long *AAxisArray, long *ADistrParamArray );
AVM_Dis – pointer to the processor system onto which the template is mapped.
AParamCount – the number of parameters defined in arrays AAxisArray and ADistrParamArray.
AAxisArray – AAxisArray[j] is the number of the template dimension used in the mapping rule for the (j+1)th dimension of the processor system.
ADistrParamArray – ADistrParamArray[j] is the mapping rule parameter for the (j+1)th dimension of the processor system (ADistrParamArray[j] ≥ 0).
The function creates a regular mapping of the template onto the processor system.
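To show what a regular mapping produces, here is a minimal sketch of the local part of a template on one processor. It assumes a plain BLOCK rule in which the first n mod p processors receive one extra element, and it ignores ADistrParamArray. The actual LIB-DVM rule may differ. blockRange and localTemplatePart are hypothetical names. Indices are 0-based, while dimension numbers in the axis array are 1-based, as in AAxisArray.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, second) of 0-based template indices.
using Range = std::pair<long, long>;

// ASSUMED BLOCK rule: n positions over p processors; the first n % p
// processors receive one extra position.
Range blockRange(long n, long p, long coord) {
    assert(p > 0 && coord >= 0 && coord < p);
    const long base = n / p, extra = n % p;
    const long lo = coord * base + (coord < extra ? coord : extra);
    const long len = base + (coord < extra ? 1 : 0);
    return {lo, lo + len};
}

// Local part of a template on the processor with coordinates procCoord.
// axisArray[j] is the 1-based template dimension mapped onto processor
// dimension j+1; 0 is treated as "nothing mapped" (an assumption).
// Template dimensions not listed in axisArray are kept whole on every processor.
std::vector<Range> localTemplatePart(const std::vector<long>& templSizes,
                                     const std::vector<long>& procSizes,
                                     const std::vector<long>& axisArray,
                                     const std::vector<long>& procCoord) {
    std::vector<Range> part;
    for (long n : templSizes) part.push_back({0, n});
    for (std::size_t j = 0; j < axisArray.size(); ++j) {
        if (axisArray[j] == 0) continue;
        const std::size_t d = static_cast<std::size_t>(axisArray[j] - 1);
        assert(d < templSizes.size());
        part[d] = blockRange(templSizes[d], procSizes[j], procCoord[j]);
    }
    return part;
}
```

For a 10×6 template on a 3×2 processor system with axis numbers {1, 2}, the processor with coordinates (2, 1) holds template indices [7, 10) × [3, 6).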
double AMView::RDisAM( long AParamCount, long *AAxisArray, long *ADistrParamArray, long ANewSign );
AParamCount – the number of parameters defined in arrays AAxisArray and ADistrParamArray.
AAxisArray – AAxisArray[j] is the number of the template dimension used in the mapping rule for the (j+1)th dimension of the processor system.
ADistrParamArray – ADistrParamArray[j] is the mapping rule parameter for the (j+1)th dimension of the processor system (ADistrParamArray[j] ≥ 0).
ANewSign – the flag that defines whether the contents of the realigned arrays are to be saved.
The function determines the time spent on communications while remapping the template and returns this time.
2.2.4. Distributed array creating
DArray::DArray( long ARank, long *ASizeArray, long ATypeSize );
ARank – rank of the created array.
ASizeArray – ASizeArray[i] is the size of the (i+1)th dimension of the created array (0 ≤ i ≤ ARank – 1).
ATypeSize – size in bytes of one array element.
2.2.5. Mapping distributed array
void DArray::AlnDA( AMView *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray );
void DArray::AlnDA( DArray *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray );
APattern – pointer to the alignment pattern.
AAxisArray – AAxisArray[j] is a dimension number of the distributed array used in the linear alignment rule for the pattern (j+1)th dimension.
ACoeffArray – ACoeffArray[j] is a coefficient for distributed array index variable used in the linear alignment rule for the pattern (j+1)th dimension.
AConstArray – AConstArray[j] is a constant used in the linear alignment rule for the pattern (j+1)th dimension.
The function aligns the distributed array.
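The linear alignment rule used by AlnDA can be written as pattern[j] = ACoeffArray[j] · index[AAxisArray[j] − 1] + AConstArray[j] for each pattern dimension j. The sketch below applies this rule to one array element. alignElement is a hypothetical name. The sketch also assumes that a zero coefficient aligns the element with the fixed position AConstArray[j] and that the axis number is then ignored. It does not handle replication.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pattern coordinates of one distributed-array element under the linear
// alignment rule:  pattern[j] = coeff[j] * arrayIndex[axis[j] - 1] + cnst[j]
// axis[j] is a 1-based array dimension number. ASSUMPTION: when coeff[j] == 0
// the element is aligned to the fixed position cnst[j] and axis[j] is unused.
std::vector<long> alignElement(const std::vector<long>& arrayIndex,
                               const std::vector<long>& axis,
                               const std::vector<long>& coeff,
                               const std::vector<long>& cnst) {
    assert(axis.size() == coeff.size() && coeff.size() == cnst.size());
    std::vector<long> pattern(axis.size());
    for (std::size_t j = 0; j < axis.size(); ++j) {
        if (coeff[j] == 0) {
            pattern[j] = cnst[j];
            continue;
        }
        assert(axis[j] >= 1 && static_cast<std::size_t>(axis[j]) <= arrayIndex.size());
        pattern[j] = coeff[j] * arrayIndex[axis[j] - 1] + cnst[j];
    }
    return pattern;
}
```

For example, with axis numbers {2, 1}, coefficients {1, 2} and constants {0, 1}, the array element (3, 5) is aligned with pattern position (5, 7). The array dimensions are transposed, and the second pattern dimension is stretched and shifted.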
double DArray::RAlnDA( AMView *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray, long ANewSign );
double DArray::RAlnDA( DArray *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray, long ANewSign );
APattern – pointer to the alignment pattern.
AAxisArray – AAxisArray[j] is a dimension number of the distributed array used in the linear alignment rule for the pattern (j+1)th dimension.
ACoeffArray – ACoeffArray[j] is a coefficient for distributed array index variable used in the linear alignment rule for the pattern (j+1)th dimension.
AConstArray – AConstArray[j] is a constant used in the linear alignment rule for the pattern (j+1)th dimension.
ANewSign – the flag that defines whether the contents of the distributed array are to be updated.
The function determines the time spent on communications while realigning the distributed array and returns this time.
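The kind of estimate RAlnDA returns can be sketched for a simple case. This is a hypothetical illustration, not the simulator's algorithm. A one-dimensional array is realigned by changing only the alignment constant, the template is distributed by an assumed BLOCK rule, and every element whose owner changes is counted as transferred. All moved bytes are charged as one message using the assumed linear cost model. The flag keepContents plays the role of ANewSign. This document does not say which ANewSign value means "keep the contents". blockOwner and realignTime are hypothetical names.

```cpp
#include <cassert>

// Owner of template position t (0-based) under an ASSUMED BLOCK rule:
// n positions over p processors, the first n % p processors get one extra.
long blockOwner(long n, long p, long t) {
    assert(p > 0 && t >= 0 && t < n);
    const long base = n / p, extra = n % p;
    const long bigPart = extra * (base + 1); // positions held by the larger blocks
    if (t < bigPart) return t / (base + 1);
    return extra + (t - bigPart) / base;
}

// Hypothetical estimate of the communication time of realigning a 1-D array
// of arrSize elements from template position coeff*i + oldConst to
// coeff*i + newConst. Every element whose owner changes is counted, and all
// moved bytes are charged as a single message (a deliberate simplification).
double realignTime(long arrSize, long templSize, long procCount, long coeff,
                   long oldConst, long newConst, long typeSize,
                   double tStart, double tByte, bool keepContents) {
    if (!keepContents) return 0.0; // old values are not needed: nothing to send
    long moved = 0;
    for (long i = 0; i < arrSize; ++i) {
        const long from = blockOwner(templSize, procCount, coeff * i + oldConst);
        const long to = blockOwner(templSize, procCount, coeff * i + newConst);
        if (from != to) ++moved;
    }
    if (moved == 0) return 0.0;
    return tStart + static_cast<double>(moved * typeSize) * tByte;
}
```

Shifting an 8-element array of 8-byte elements by 2 on a 10-position template split over 2 processors moves 2 elements (16 bytes). With a start-up time of 100 and a per-byte time of 0.5, the estimate is 108.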
2.2.6. Parallel loop defining
ParLoop::ParLoop( long ARank );
ARank – rank of the parallel loop.
The function creates a parallel loop.
void ParLoop::MapPL( AMView *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray, long *AInitIndexArray, long *ALastIndexArray, long *AStepArray );
void ParLoop::MapPL( DArray *APattern, long *AAxisArray, long *ACoeffArray, long *AConstArray, long *AInitIndexArray, long *ALastIndexArray, long *AStepArray );
APattern – pointer to the pattern of the parallel loop mapping.
AAxisArray – AAxisArray[j] is a dimension number of the parallel loop (that is, the number of the index variable) used in the linear alignment rule for the pattern (j+1)th dimension.
ACoeffArray – ACoeffArray[j] is a coefficient for the parallel loop index variable used in the linear alignment rule for the pattern (j+1)th dimension.
AConstArray – AConstArray[j] is a constant used in the linear alignment rule for the pattern (j+1)th dimension.
AInitIndexArray – AInitIndexArray[i] is the initial value of the index variable of the parallel loop (i+1)th dimension.
ALastIndexArray – ALastIndexArray[i] is the last value of the index variable of the parallel loop (i+1)th dimension.
AStepArray – AStepArray[i] is the step of the index variable of the parallel loop (i+1)th dimension.
The function creates a regular mapping of the parallel loop onto the template.
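The effect of MapPL on one dimension can be sketched as follows. Each iteration index i is mapped to the template position coeff·i + const, and a processor executes the iterations whose template positions it owns. The sketch assumes a positive step and the same BLOCK rule as in the template example. ownsPosition and localIterations are hypothetical names. Per-processor iteration sets of this kind determine how the work of a loop is divided among processors.

```cpp
#include <cassert>
#include <vector>

// ASSUMED BLOCK rule: true if template position t (0-based) belongs to
// processor `coord` out of p, for a template of n positions in which the
// first n % p processors hold one extra position.
bool ownsPosition(long n, long p, long coord, long t) {
    const long base = n / p, extra = n % p;
    const long lo = coord * base + (coord < extra ? coord : extra);
    const long hi = lo + base + (coord < extra ? 1 : 0);
    return t >= lo && t < hi;
}

// Iterations of a one-dimensional parallel loop
//   for (i = init; i <= last; i += step)
// executed by processor `coord`, when iteration i is mapped to template
// position coeff * i + cnst and the template of templSize positions is
// block-distributed over procCount processors. Only step > 0 is handled.
std::vector<long> localIterations(long init, long last, long step,
                                  long coeff, long cnst,
                                  long templSize, long procCount, long coord) {
    assert(step > 0 && procCount > 0);
    std::vector<long> mine;
    for (long i = init; i <= last; i += step) {
        if (ownsPosition(templSize, procCount, coord, coeff * i + cnst)) {
            mine.push_back(i);
        }
    }
    return mine;
}
```

For a loop i = 1..8 with step 2 mapped by i − 1 onto an 8-position template over 2 processors, processor 1 executes iterations 5 and 7.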
void ParLoop::ExFirst( ParLoop *AParLoop, BoundGroup *ABoundGroup );
AParLoop – pointer to the parallel loop.
ABoundGroup – pointer to the shadow edge group, which will be renewed after computing the exported elements from the local parts of the distributed arrays.
The function sets the flag that changes the execution order of the parallel loop iterations.
void ParLoop::ImLast( ParLoop *AParLoop, BoundGroup *ABoundGroup );
AParLoop – pointer to the parallel loop.
ABoundGroup – pointer to the shadow edge group, which will be renewed after computing the exported elements from the local parts of the distributed arrays.
The function sets the flag that changes the execution order of the parallel loop iterations.
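Changing the iteration order matters because it lets shadow edge renewal overlap with computation. The sketch below is a hypothetical timing model of the ExFirst order and is not taken from the simulator. The exported elements are computed first, the renewal is then started, and the remaining elements are computed while the renewal is in progress. The overlap time is the part of the communication hidden behind computation. LoopTiming, plainOrder and exportedFirst are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timing model for a parallel loop whose shadow edges must be
// renewed.
//   compTime   - time to compute all local iterations
//   exportTime - part of compTime spent on the exported (boundary) elements
//   commTime   - time to renew the shadow edge group
struct LoopTiming {
    double total;   // elapsed time of loop + shadow renewal
    double overlap; // communication time hidden behind computation
};

// Without reordering: compute everything, then renew the shadow edges.
LoopTiming plainOrder(double compTime, double commTime) {
    return {compTime + commTime, 0.0};
}

// ExFirst-style order: compute the exported elements, start the renewal,
// and compute the remaining elements while the renewal is in progress.
LoopTiming exportedFirst(double compTime, double exportTime, double commTime) {
    assert(exportTime >= 0.0 && exportTime <= compTime);
    const double rest = compTime - exportTime;
    return {exportTime + std::max(rest, commTime), std::min(rest, commTime)};
}
```

With 10 time units of computation (2 of them on exported elements) and 5 units of renewal, the model gives 15 units in plain order and 10 with ExFirst. All 5 units of communication are overlapped.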
2.2.7. Loading buffers by the remote array elements
friend double ArrayCopy( DArray *AFromArray, long *AFromInitIndexArray, long *AFromLastIndexArray, long *AFromStepArray, DArray *AToArray, long *AToInitIndexArray, long *AToLastIndexArray, long *AToStepArray, long ACopyRegim );
AFromArray – pointer to the source distributed array.
AFromInitIndexArray – AFromInitIndexArray[i] is the initial index value of the (i+1)th dimension of the source array.
AFromLastIndexArray – AFromLastIndexArray[i] is the last index value of the (i+1)th dimension of the source array.
AFromStepArray – AFromStepArray[i] is the step of the index of the (i+1)th dimension of the source array.
AToArray – pointer to the target distributed array.
AToInitIndexArray – AToInitIndexArray[i] is the initial index value of the (i+1)th dimension of the target array.
AToLastIndexArray – AToLastIndexArray[i] is the last index value of the (i+1)th dimension of the target array.
AToStepArray – AToStepArray[i] is the step of the index of the (i+1)th dimension of the target array.
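The index-range parameters of ArrayCopy describe a rectangular section, so the amount of data involved can be counted dimension by dimension. The sketch below counts the elements of such a section. It then gives an upper-bound transfer time under the assumed linear cost model, charging every element as moved in a single message and ignoring elements that are already local. The sketch assumes positive steps and last ≥ initial. Other index conventions LIB-DVM may use are not handled. sectionSize and copyTime are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements in a rectangular section given per-dimension initial
// index, last index and step (ASSUMES step > 0 and last >= init).
long sectionSize(const std::vector<long>& init, const std::vector<long>& last,
                 const std::vector<long>& step) {
    assert(init.size() == last.size() && last.size() == step.size());
    long count = 1;
    for (std::size_t i = 0; i < init.size(); ++i) {
        assert(step[i] > 0 && last[i] >= init[i]);
        count *= (last[i] - init[i]) / step[i] + 1;
    }
    return count;
}

// Upper-bound estimate of the transfer time of an ArrayCopy-like operation,
// assuming all copied elements travel in one message costing
// tStart + bytes * tByte (elements that are already local are not excluded).
double copyTime(long elements, long typeSize, double tStart, double tByte) {
    if (elements == 0) return 0.0;
    return tStart + static_cast<double>(elements * typeSize) * tByte;
}
```

For example, the section with initial indices (0, 0), last indices (9, 4) and steps (2, 1) contains 5 × 5 = 25 elements. With 8-byte elements, a start-up time of 100 and a per-byte time of 0.5, the estimate is 200.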