Note that Aj and Bj have to meet the following requirements:
0 ≤ Aj*MAXk + Bj ≤ MAXj and 0 ≤ Aj*MINk + Bj ≤ MAXj, where:
MAXj - maximum of the index variable of the pattern j-th dimension;
MAXk - maximum of the index variable of the parallel loop k-th dimension;
MINk - minimum of the index variable of the parallel loop k-th dimension.
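For instance (with illustrative numbers), the rule Fj = {2*Ik + 1} is admissible for MINk = 0, MAXk = 10 and MAXj = 99, since 0 ≤ 2*0 + 1 = 1 ≤ 99 and 0 ≤ 2*10 + 1 = 21 ≤ 99; the rule Fj = {2*Ik - 1} is not admissible, since 2*0 - 1 = -1 < 0.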
2. Fj(I1, ... ,In) = { q ∈ Mj: 0 ≤ q ≤ MAXj }, where:
Mj - range of values of the index variable of the pattern j-th dimension;
MAXj - maximum of the index variable of the pattern j-th dimension.
This mapping rule means that each element (i1, ..., in) of the index space of the mapped loop corresponds to a set consisting of the whole range of values of the index variable of the pattern j-th dimension. In such a case, the symbol "*" ("any of the admissible") is usually used.
As a result of the mapping of the given parallel loop by the mappl_ function onto the specified (directly or indirectly) abstract machine representation, each loop iteration is matched with a set of abstract machines, namely the image of the iteration under the mapping rule F considered above. On entry into each iteration all the corresponding abstract machines become current ones (each on its own processor). So, in general (because the function F is multivalued), the execution of the parallel branch represented by the loop iteration is performed by several parallel subtasks (see section 10). As the processor systems of these subtasks can be different, different ways of further branching (in the parallel loop iteration) are possible. To avoid such replication of parallel loop iterations over different subtasks, it is recommended not to replicate loop iterations over the dimensions of the parental abstract machine representation without necessity, i.e. not to apply the second of the coordinate mapping rules considered above. For this purpose, for example, a subsidiary representation of the parental abstract machine can be created: this representation is of lesser rank, and the parallel loop is mapped onto it without iteration replication.
Note that although, conceptually, on entry into a parallel branch the current abstract machine must be replaced by the corresponding descendant abstract machine, actually (to decrease overheads) the replacement of the current abstract machine is performed not when the loop iteration is entered, but only when it is necessary, i.e. when those Run-Time System functions are called that require the existence of the current abstract machine as a program object (for example, when a reference to the current abstract machine is requested, when input/output is performed in a parallel loop iteration, and so on).
Examples.
- Mapping rule F( (I1,I2) ) = {I1} {I2} means that the iteration (i1, i2) of a two-dimensional loop has to be executed on the processor that the element (i1, i2) of the two-dimensional pattern is mapped on.
- Mapping rule F( (I1,I2) ) = {*} {I1+5} {*} means that an iteration of a two-dimensional loop has to be executed on the processor that an element of the three-dimensional pattern is mapped on, if the index of the pattern second dimension is equal to the index of the mapped loop first dimension plus 5. The index of the mapped loop second dimension and the indexes of the pattern first and third dimensions are not considered.
- Mapping rule F( (I1,I2,I3) ) = {*} {*} means that each iteration of a three-dimensional loop has to be executed on every processor that any element of the two-dimensional pattern is mapped on. The indexes of the mapped loop and of the pattern are not considered.
- Mapping rule F( (I1,I2) ) = {0} {1} {2} means that each iteration of a two-dimensional loop has to be executed on the processor that the element (0,1,2) of the three-dimensional pattern is mapped on. The indexes of the mapped loop are not considered.
Defining the loop mapping onto the pattern space (that is, defining the functions F1, ..., Fm) has to meet the requirement that all iterations of the loop belong to the pattern space. Observance of both the correct mapping of the loop and the correct mapping of the pattern guarantees the correct final distribution of the loop iterations over the processors. Note that if the pattern is not an abstract machine representation, then the mapping of the loop onto the abstract machine representation is a superposition of the mapping of the loop onto the pattern space and the alignment of the pattern with the abstract machine (see section 7.2).
When the function mappl_ is called, the parameters of the mapping rule Fj(I1, ... ,In) = {Aj*Ik + Bj} for the j-th pattern dimension have to be defined as follows:
AxisArray[j-1] contains the value k;
CoeffArray[j-1] contains the value Aj;
ConstArray[j-1] contains the value Bj.
To specify the rule Fj(I1, ... ,In) whose image is the set of all values of the index variable of the pattern j-th dimension for any I1, ... ,In, the value of AxisArray[j-1] (the value k) has to be equal to -1. The values of CoeffArray[j-1] and ConstArray[j-1] are irrelevant in this case.
The number of the mapping rules has to be equal to the rank of the pattern when the function mappl_ is called.
The function returns a non-zero value if the mapped parallel loop has a local part on the current processor, and zero otherwise.
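As an illustration, the following minimal sketch (all identifiers are illustrative, and the remaining mappl_ parameters are omitted) fills the rule arrays for mapping a two-dimensional loop onto a two-dimensional pattern by the rules F1( (I1,I2) ) = {2*I2 + 1} and F2( (I1,I2) ) = {*}:

   long AxisArray[2], CoeffArray[2], ConstArray[2];

   /* Rule for pattern dimension 1: F1( (I1,I2) ) = {2*I2 + 1},
      i.e. k = 2, A1 = 2, B1 = 1. */
   AxisArray[0]  = 2;
   CoeffArray[0] = 2;
   ConstArray[0] = 1;

   /* Rule for pattern dimension 2: F2( (I1,I2) ) = {*}: the image is
      the whole range of the pattern second dimension; the coefficient
      and the constant are irrelevant here, zeroed only for clarity. */
   AxisArray[1]  = -1;
   CoeffArray[1] = 0;
   ConstArray[1] = 0;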
9.3 Reordering parallel loop execution
long exfrst_ (LoopRef *LoopRefPtr, ShadowGroupRef *ShadowGroupRefPtr);

*LoopRefPtr - reference to the parallel loop;
*ShadowGroupRefPtr - reference to the shadow edge group, which will be renewed after computing the exported elements from the local parts of the distributed arrays.
The function exfrst_ sets the following order of the parallel loop iterations. First, the exported elements of the local parts of the distributed arrays (elements-originals, whose images are the elements of shadow edges) are computed. Then the renewing of the shadow edge group (the update of the shadow elements) is started. At last, the internal elements of the local parts of the distributed arrays are computed (the internal points are the local part of a distributed array without the exported elements). After the computation of the internal points, the automatic waiting for the completion of the shadow edge renewing is not performed.
The iteration execution order described above is implemented by partitioning the iterations into 2*n+1 parts (portions), where n is the loop rank, on each processor. The information about each iteration portion is requested by the dopl_ function (see section 9.4) and during its execution is put into the OutInitIndexArray, OutLastIndexArray and OutStepArray arrays that are parameters of the mappl_ function. Invoking the exfrst_ function causes the following order of iteration portion requesting by the dopl_ function: the first 2*n requests put into the OutInitIndexArray, OutLastIndexArray and OutStepArray arrays the information about the loop iterations corresponding to the exported elements of the distributed arrays; the last positive request corresponds to the internal elements of the distributed arrays.
The exfrst_ function call must precede the mapping of the parallel loop by the mappl_ function.
The shadow edge group, specified by *ShadowGroupRefPtr reference, must be created in the current subtask.
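A minimal usage sketch (all identifiers are illustrative; the parallel loop and the shadow edge group are assumed to be created earlier by the corresponding creation functions):

   LoopRef        Loop;       /* parallel loop created earlier                */
   ShadowGroupRef Shadows;    /* shadow edge group created in current subtask */

   exfrst_(&Loop, &Shadows);  /* request the "exported elements first" order  */
   /* the mappl_ call for Loop must follow this call */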
When the parallel loop execution is partitioned into parts using the exfrst_ function, the resulting shadow edge widths of the distributed arrays, necessary for the iteration portion creation, are calculated in the following way (the calculation is performed by the parallel loop mapping function mappl_). Let ShadowArrayGroup be the set of distributed arrays whose shadow edges are included in the group specified by the *ShadowGroupRefPtr reference. Let also AMView be the abstract machine representation the parallel loop is (directly or indirectly) mapped on, and PLAxis be the loop dimension mapped on the AMVAxis dimension of the representation. Then the resulting low (high) shadow edge width for the PLAxis loop dimension is equal to the maximal value of the low (high) shadow edge widths of those dimensions of the arrays from the ShadowArrayGroup set that are aligned with the AMVAxis dimension of the representations equivalent to the AMView representation.
If the Run-Time System does not find any distributed array from the ShadowArrayGroup set with a dimension aligned with the AMVAxis dimension of a representation equivalent to the AMView representation, the resulting low and high shadow edge widths of the PLAxis dimension are set to zero. The low and high edge widths of the parallel loop dimensions not mapped on any dimension of the AMView representation are also set to zero.
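For instance (with illustrative numbers), if the ShadowArrayGroup set contains two arrays whose dimensions aligned with the AMVAxis dimension have low shadow edge widths 1 and 3 and high shadow edge widths 2 and 2, the resulting widths for the PLAxis loop dimension are 3 (low) and 2 (high).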
For a more detailed description of the shadow edges of the distributed arrays see section 12.
The function returns zero.
Note. The representations of the same abstract machine are equivalent if they:
- have the same rank and the same size of every dimension;
- are identically mapped on equivalent processor subsystems.
Processor subsystems of the same processor system are equivalent if they:
- have the same rank and the same size of every dimension;
- consist of the same processors.
long imlast_ (LoopRef *LoopRefPtr, ShadowGroupRef *ShadowGroupRefPtr);

*LoopRefPtr - reference to the parallel loop;
*ShadowGroupRefPtr - reference to the shadow edge group, whose renewing completion the Run-Time System awaits after the computation of the internal points of the local parts of the distributed arrays.
The function imlast_ sets the following order of the parallel loop iterations. First, the internal points of the local parts of the distributed arrays are computed. Then the Run-Time System awaits the completion of the shadow edge renewing of the specified group. At last, the exported elements (elements-originals) of the local parts of the distributed arrays are computed. After the computation of the internal points, the automatic starting of the shadow edge renewing is not performed.
As for the exfrst_ function, the required order of iteration execution is implemented by partitioning the iterations into 2*n+1 parts (portions), where n is the loop rank, on each processor. The information about each iteration portion is requested by the dopl_ function (see section 9.4) and during its execution is put into the OutInitIndexArray, OutLastIndexArray and OutStepArray arrays that are parameters of the mappl_ function. Invoking the imlast_ function causes the following order of iteration portion requesting by the dopl_ function: the first request puts into the OutInitIndexArray, OutLastIndexArray and OutStepArray arrays the information about the loop iterations corresponding to the internal elements of the distributed arrays; the next 2*n requests correspond to the exported elements of the distributed arrays.
The imlast_ function call must precede the mapping of the parallel loop by the mappl_ function.
The shadow edge group, specified by *ShadowGroupRefPtr reference, must be created in the current subtask.
The calculation of the resulting shadow edge widths of the distributed arrays, required for the iteration portion creation, is the same as the calculation described above for the exfrst_ function.
The function returns zero.
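For symmetry, a minimal usage sketch for imlast_ (identifiers are illustrative), differing from the exfrst_ sketch above only in the portion order later delivered by dopl_:

   LoopRef        Loop;       /* parallel loop created earlier                */
   ShadowGroupRef Shadows;    /* shadow edge group created in current subtask */

   imlast_(&Loop, &Shadows);  /* internal points first, then the wait for the */
                              /* group renewing, then the exported elements   */
   /* the mappl_ call for Loop must follow this call */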
9.4 Inquiry of continuation of parallel loop execution
long dopl_ (LoopRef *LoopRefPtr);
*LoopRefPtr - reference to the parallel loop.
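As a hedged illustration of how the functions of this section combine (all identifiers are illustrative; the mappl_ call is assumed to have been performed already, with OutInit, OutLast and OutStep passed to it as the OutInitIndexArray, OutLastIndexArray and OutStepArray parameters), the sketch below assumes, as the portion ordering described in section 9.3 implies, that dopl_ returns a non-zero value while a next iteration portion is available:

   /* Portion-driven execution of a mapped one-dimensional parallel loop. */
   LoopRef Loop;                            /* created and mapped earlier  */
   long OutInit[1], OutLast[1], OutStep[1]; /* filled by mappl_ and dopl_  */
   long i;

   do
   {
      /* execute the loop body for the current iteration portion */
      for (i = OutInit[0]; i <= OutLast[0]; i += OutStep[0])
      {
         /* ... loop body for iteration i ... */
      }
   } while (dopl_(&Loop) != 0);             /* non-zero: one more portion  */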