Example 5.1. Distribution of loop iterations with regular computations.
DVM(DISTRIBUTE [BLOCK][BLOCK]) float B[N][M+1];
DVM(ALIGN [i][j] WITH B[i][j+1]) float A[N][M], C[N][M], D[N][M];
. . .
DVM(PARALLEL [i][j] ON B[i][j+1])
DO(i, 0, N-1, 1)
DO(j, 0, M-2, 1)
{
    A[i][j] = D[i][j] + C[i][j];
    B[i][j+1] = D[i][j] - C[i][j];
}
The loop satisfies all requirements of a distributed loop. In particular, the left sides of the assignment statements of one loop iteration, A[i][j] and B[i][j+1], are allocated on one processor by virtue of the alignment of the arrays A and B.
If the left sides of the assignment statements were located on different processors (a distributed iteration of the loop), the loop would have to be split into several loops, as sketched below.
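For illustration, the following is a minimal sketch of such a splitting (it is not one of the manual's examples; the arrays P and Q are introduced only for this sketch). P and Q are aligned identically, so the left sides P[i] and Q[i+1] of one original iteration generally belong to different processors; after the splitting each loop is mapped on the array it writes, and all accesses in each loop are local.
DVM(DISTRIBUTE [BLOCK]) float P[N];
DVM(ALIGN [i] WITH P[i]) float Q[N];
. . .
/* invalid as a single distributed loop:
   { P[i] = P[i] + 1.; Q[i+1] = 2. * Q[i+1]; }
   because P[i] and Q[i+1] are owned by different processors */
DVM(PARALLEL [i] ON P[i])
DO(i, 0, N-2, 1)
    P[i] = P[i] + 1.;
DVM(PARALLEL [i] ON Q[i+1])
DO(i, 0, N-2, 1)
    Q[i+1] = 2. * Q[i+1];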
5.2.3. Private variables.
Consider the following example declaring a private variable, i.e. a variable that is not defined before any iteration and not used after any iteration. Such a declaration prevents two possible parallelization problems: a dependence between iterations and an inconsistent value of a replicated variable.
Example 5.3. Specification of a private variable.
DVM(PARALLEL [i][j] ON A[i][j])
FOR(i, N)
FOR(j, N)
{
    float x;    /* variable declared as private for every iteration */
    x = B[i][j] + C[i][j];
    A[i][j] = x;
}
5.2.4. Reduction operations and variables. REDUCTION specification
Programs often contain loops with so-called reduction operations: array elements are accumulated in some variable, or their minimum or maximum value is determined. Such loops are not independent, but nevertheless they may be distributed using the REDUCTION specification. The specification means that the iterations local to each processor compute partial (as a rule, different) values of the variable, and after loop completion the run-time library (RTL) reduces these local values to a single consistent global value.
reduction-clause    ::= REDUCTION reduction-op-string
reduction-op        ::= reduction-op-name ( reduction-variable )
                      | reduction-loc-name ( reduction-variable , int-variable-list )
reduction-variable  ::= array-name
                      | scalar-variable-name
reduction-op-name   ::= SUM
                      | PRODUCT
                      | MAX
                      | MIN
                      | AND
                      | OR
reduction-loc-name  ::= MAXLOC
                      | MINLOC
Distributed arrays cannot be used as reduction variables. Inside the loop, reduction variables are calculated and used only in the corresponding "reduction" statements.
Example 5.4. Specification of reduction.
S = 0;
X = A[0];
Y = A[0];
MINI = 0;
DVM(PARALLEL [i] ON A[i];
    REDUCTION SUM(S) MAX(X) MINLOC(Y,MINI))
FOR(i, N)
{
    S = S + A[i];
    X = max(X, A[i]);
    if (A[i] < Y) {
        Y = A[i];
        MINI = i;
    }
}
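To picture what the REDUCTION specification arranges, the following plain-C sketch models the two-phase evaluation of SUM(S). It is illustrative only: the names NPROC, lower and upper are hypothetical (they stand for the number of processors and the local iteration bounds of each one) and are not part of the C-DVM API.
/* Illustrative plain-C model of the two-phase evaluation of SUM(S). */
float partial[NPROC];
int p, i;
/* Phase 1: each processor accumulates a partial sum over its local
   block of iterations of the distributed loop. */
for (p = 0; p < NPROC; p++) {
    partial[p] = 0.;
    for (i = lower[p]; i <= upper[p]; i++)
        partial[p] = partial[p] + A[i];
}
/* Phase 2: after the loop the RTL combines the partial values into one
   consistent global value of S, visible on every processor. */
S = 0.;
for (p = 0; p < NPROC; p++)
    S = S + partial[p];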
5.2.5. Distribution of the loop with regular data dependence. ACROSS specification
Consider the following loop
DO(i, 1, N-2, 1)
DO(j, 1, N-2, 1)
    A[i][j] = (A[i][j-1] + A[i][j+1] + A[i-1][j] + A[i+1][j]) / 4.;
Data dependence exists between loop iterations i1 and i2 (i1 < i2) if both iterations refer to the same array element in a write-read or read-write scheme.
If iteration i1 writes a value and iteration i2 reads this value, then flow dependence, or simply dependence, exists between the iterations.
If iteration i1 reads the "old" value and iteration i2 writes the "new" value, then anti-dependence (reverse dependence) exists between iterations i1 and i2.
In both cases iteration i2 can be executed only after iteration i1.
The value i2 - i1 is called the range, or length, of the dependence. If for every iteration i a dependent iteration i + d exists (d is a constant), the dependence is called regular, or a dependence of constant length.
A loop with regular computations and regular dependencies on distributed arrays can be distributed with the PARALLEL directive, using the ACROSS specification.
across-clause    ::= ACROSS dependent-array-string
dependent-array  ::= dist-array-name dependence-string
dependence       ::= [ flow-dep-length : anti-dep-length ]
flow-dep-length  ::= int-constant
anti-dep-length  ::= int-constant
All the distributed arrays with regular data dependence are specified in the ACROSS specification. The length of the flow dependence (flow-dep-length) and the length of the anti-dependence (anti-dep-length) are specified for each dimension of the array. A length equal to zero means that there is no data dependence on that dimension.
Example 5.5. Distribution of the loop with regular data dependence.
DVM(PARALLEL [i][j] ON A[i][j]; ACROSS A[1:1][1:1])
DO(i, 1, N-2, 1)
DO(j, 1, N-2, 1)
    A[i][j] = (A[i][j-1] + A[i][j+1] + A[i-1][j] + A[i+1][j]) / 4.;
Flow and anti-dependencies of length 1 exist for both dimensions of the array A; hence the specification A[1:1][1:1].
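For contrast, the following minimal sketches (not from the manual's examples; B is assumed to be aligned with A) show loops with only one kind of dependence. In the first loop, iteration i reads A[i-1], written on iteration i-1, so only a flow dependence of length 1 exists; in the second, iteration i reads A[i+1] before iteration i+1 overwrites it, so only an anti-dependence of length 1 exists.
DVM(DISTRIBUTE [BLOCK]) float A[N];
DVM(ALIGN [i] WITH A[i]) float B[N];
. . .
DVM(PARALLEL [i] ON A[i]; ACROSS A[1:0])    /* flow dependence only */
DO(i, 1, N-1, 1)
    A[i] = A[i-1] + B[i];

DVM(PARALLEL [i] ON A[i]; ACROSS A[0:1])    /* anti-dependence only */
DO(i, 0, N-2, 1)
    A[i] = A[i+1] + B[i];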
5.3. Distribution of the loop iterations with irregular computations.
5.3.1. Model of irregular computations
Irregular computations dominate in such applications as sparse matrix processing, calculations on irregular (unstructured) grids, simulation of particle interaction (the N-body problem), and so on.
Represent the model of irregular computations as a graph G(V,E), where V is the set of graph vertices and E is the set of edges. Graph vertices correspond to data, and graph edges correspond to computations. An edge Ei connecting vertices Vi1 and Vi2 means that computing the data in one vertex uses the data from the other vertex.
Introduce a numbering of the graph vertices and represent them as a one-dimensional array V(NV), where NV is the number of graph vertices. Two models of implementing irregular computations are possible: covering the graph edges and covering the graph vertices. In the data parallelism model, covering the graph vertices is the more suitable algorithm [3].
To implement this model, represent the edges as an index matrix ME[NV][MAXE+1], where MAXE is the maximal number of edges originating from one graph vertex. The element ME[i][0] contains the number of edges originating from the i-th vertex. The elements ME[i][j] for 1 <= j <= ME[i][0] contain the numbers of the vertices at the other ends of the edges originating from the i-th vertex.
The following program fragment represents a loop covering the graph vertices.
Example 5.6. Graph vertex covering.
float V1[NV], V2[NV];
int ME[NV][MAXE+1];
for (I = 0; I < NV; I++)
    for (J = 1; J <= ME[I][0]; J++)
        V1[I] = V1[I] + V2[ME[I][J]];
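How ME is filled is not shown in the manual; the following is a minimal sketch under the assumption that the graph is given as an explicit list of NE undirected edges in two hypothetical arrays EV1 and EV2 (edge E connects vertices EV1[E] and EV2[E], and every vertex has at most MAXE neighbors):
int EV1[NE], EV2[NE];   /* hypothetical edge list, not part of the manual */
int I, E;

for (I = 0; I < NV; I++)
    ME[I][0] = 0;                        /* no edges registered yet */
for (E = 0; E < NE; E++) {
    ME[EV1[E]][0]++;                     /* register the edge at both of its ends */
    ME[EV1[E]][ME[EV1[E]][0]] = EV2[E];
    ME[EV2[E]][0]++;
    ME[EV2[E]][ME[EV2[E]][0]] = EV1[E];
}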
5.3.2. Distributed loop with irregular computations
A distributed loop with irregular computations must satisfy the following requirements:
- the PARALLEL directive is applied to one loop header only (a one-dimensional loop);
- there is no data dependence except reduction dependence;
- the distributed arrays used in the loop have only one distributed dimension;
- a loop iteration with index I indexes the distributed dimensions only by the following expressions:
  a*I + b, where a and b are not changed in the loop (invariants);
  ME[I][J], where ME is an index matrix and J is a sequential loop index;
  n, where n = ME[I][J];
- there are no I/O statements and no REALIGN or REDISTRIBUTE directives in the loop body.
To distribute a loop with irregular computations, the PARALLEL directive with the additional INDIRECT_ACCESS specification (see section 6.2) is used.
Example 5.7. Distribution of the loop with irregular computations.
DVM(DISTRIBUTE [BLOCK]) float V1[NV];
DVM(ALIGN [i] WITH V1[i]) float V2[NV];
DVM(ALIGN [i][] WITH V1[i]) int ME[NV][MAXE+1];
. . .
DVM(PARALLEL [I] ON V1[I]; INDIRECT_ACCESS V2[ME[I][J]])
FOR(I, NV)
DO(J, 1, ME[I][0], 1)
{
    K = ME[I][J];
    V1[I] = V1[I] + V2[ME[I][J]];
    V2[I] = V2[I] + V1[K];
}
5.4. Non-distributed (replicated) computations
Define a statement of own computations as follows:
A[k1]...[kn] = e
where
A[k1]...[kn] is a distributed array element; kj is an index expression that contains no variables of distributed loops;
e is an expression over data localized on the processor where the element A[k1]...[kn] is allocated.
The statement of own computations must be specified by the OWN directive.
own-directive ::= OWN
The statement will be executed only on the processor where the element A[k1]...[kn] is allocated.
Computations that do not satisfy the distributed loop requirements, other than own computation statements, are performed on all processors of the current subsystem. It is necessary to distinguish the following kinds of replicated computations.
- A sequential loop nested into a distributed loop can index replicated data and local dimensions of distributed arrays only.
- A sequential loop nesting a distributed loop can index:
  inside the distributed loop, replicated data and local dimensions of distributed arrays only;
  outside the distributed loop, replicated data and, in own computation statements, elements of distributed arrays.
- A sequential loop outside a distributed loop can index replicated data only. If a distributed array is used in such a loop, it must be replicated before the loop execution by the REDISTRIBUTE []...[] or REALIGN []...[] directives; on leaving the loop, the user can restore the required distribution of the arrays by the REDISTRIBUTE and REALIGN directives (see the sketch after this list).
- Only replicated data and elements of distributed arrays can be used in own computation statements of sequential computations between upper-level loops.
Thus, statements of own computations can be used in sequential computations outside the distributed loop.
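As a minimal sketch of the replication mentioned above (the executable form of the REDISTRIBUTE directive is assumed here; its exact syntax is defined in the section on REDISTRIBUTE):
DVM(DISTRIBUTE [BLOCK]) float A[N];
. . .
DVM(REDISTRIBUTE A[]);          /* assumed form: replicate A on all processors */
for (i = 0; i < N; i++)         /* sequential loop over the replicated array */
    s = s + A[i];
DVM(REDISTRIBUTE A[BLOCK]);     /* assumed form: restore the block distribution */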
Example 5.8. Specification of own computations.
#define N 100
DVM(DISTRIBUTE [BLOCK][]) float A[N][N+1];
DVM(ALIGN [I] WITH A[I][N]) float X[N];
. . .
/* reverse substitution of Gauss algorithm */
/* own computations outside the loops */
DVM(OWN) X[N-1] = A[N-1][N] / A[N-1][N-1];
DO(J, N-2, 0, -1)
{
    DVM(PARALLEL [I] ON A[I][]; REMOTE_ACCESS X[J+1])
    DO(I, 0, J, 1)
        A[I][N] = A[I][N] - A[I][J+1] * X[J+1];
    /* own computations in sequential loop, */
    /* nesting the distributed loop */
    DVM(OWN) X[J] = A[J][N] / A[J][J];
}