The PARALLEL directive is placed before the loop header and distributes loop iterations in accordance with the distribution of an array or template. Its semantics are similar to those of the ALIGN directive, with the index space of the distributed array replaced by the loop index space. The order of the loop indexes in the list loop-variable... corresponds to the order of the corresponding DO statements in the tightly nested loop.
The syntax and semantics of the parts of the directive are described in the following sections:
reduction-clause        section 5.1.4
shadow-renew-clause     section 6.2.2
remote-access-clause    section 6.3.1
across-clause           section 6.2.3
Example 5.1. Distribution of loop iterations with regular computations.
DVM(DISTRIBUTE B [BLOCK ][BLOCK]) float B[N][M+1];
DVM(ALIGN [i][j] WITH B[i][j+1]) float A[N][M], C[N][M], D[N][M];
. . .
DVM(PARALLEL [i][j] ON B[i][j+1])
DO(i, 0, N-1, 1)
DO(j, 0, M-2, 1)
{
A[i][j] = D[i][j] + C[i][j];
B[i][j+1] = D[i][j] - C[i][j];
}
The loop satisfies all the requirements of a parallel loop. In particular, the left-hand sides of the assignment statements of one loop iteration, A[i][j] and B[i][j+1], are allocated on one processor because arrays A and B are aligned.
If the left-hand sides of the assignment statements are located on different processors (a distributed loop iteration), the loop must be split into several loops.
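To make the mapping of iterations to processors concrete, the following sketch simulates a BLOCK distribution in plain C. It assumes that each processor receives a block of ceil(N/P) consecutive elements, which is a common convention but not necessarily the exact rule used by the DVM run-time system. The helpers block_size, block_owner and iteration_owner are hypothetical names introduced for illustration only.

```c
#include <assert.h>

/* Assumed block size for n elements over p processors: ceil(n / p). */
static int block_size(int n, int p)
{
    assert(n > 0 && p > 0);
    return (n + p - 1) / p;
}

/* Processor that owns element `index` of a BLOCK-distributed dimension. */
static int block_owner(int index, int n, int p)
{
    assert(index >= 0 && index < n);
    return index / block_size(n, p);
}

/* Iteration (i, j) of PARALLEL [i][j] ON B[i][j+1] runs where B[i][j+1]
   is allocated. B has n rows and m1 columns (m1 = M+1 in Example 5.1);
   the processors are assumed to form a p_rows x p_cols grid, numbered
   row * p_cols + col. */
static int iteration_owner(int i, int j, int n, int m1, int p_rows, int p_cols)
{
    int row = block_owner(i, n, p_rows);
    int col = block_owner(j + 1, m1, p_cols);
    return row * p_cols + col;
}
```

Because A[i][j] is aligned with B[i][j+1], the same owner number applies to both left-hand sides of an iteration, which is why the loop of Example 5.1 needs no splitting.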
5.1.3 Private variables
In the following example a variable is declared inside the loop. It is a so-called private variable: its value is inessential at the beginning of an iteration and is not used after the iteration. Variables declared outside the loop must not be used in this manner, because parallel execution of the loop iterations can cause two problems: a data dependency between iterations and an inconsistent state of the variable after leaving the loop.
Example 5.3. Declaration of private variable.
DVM(PARALLEL [i][j] ON A[i][j])
FOR(i, N)
FOR(j, N)
{float x; /* variable declared as private for every iteration */
x = B[i][j] + C[i][j];
A[i][j] = x;
}
5.1.4 Reduction operations and variables. REDUCTION specification
Programs often contain loops with so-called reduction operations: array elements are accumulated in some variable, or their minimum or maximum value is calculated. The iterations of such loops may be distributed using the REDUCTION specification.
reduction-clause   ::= REDUCTION reduction-op...
reduction-op       ::= reduction-op-name ( reduction-variable )
                     | reduction-loc-name ( reduction-variable , loc-variable )
reduction-variable ::= array-name
                     | scalar-variable-name
reduction-op-name  ::= SUM
                     | PRODUCT
                     | MAX
                     | MIN
                     | AND
                     | OR
reduction-loc-name ::= MAXLOC
                     | MINLOC
Distributed arrays cannot be used as reduction variables. Reduction variables are calculated and used only in statements of a certain kind, the reduction statements.
The second argument of the MAXLOC and MINLOC operations is a variable describing the location of the element with the maximal (respectively, minimal) value found. Usually it is the index of a one-dimensional array element or a structure containing the index values of a multidimensional array element.
Example 5.4. Specification of reduction.
S = 0;
X = A[0];
Y = A[0];
MINI = 0;
DVM(PARALLEL [i] ON A[i];
REDUCTION SUM(S) MAX(X) MINLOC(Y,MINI))
FOR(i, N)
{
S = S + A[i];
X = max(X, A[i]);
if(A[i] < Y) {
Y = A[i];
MINI = i;
}
}
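Conceptually, a loop with a REDUCTION specification can be evaluated by letting every processor reduce its own block of iterations and then combining the partial results. The plain C sketch below simulates this for the SUM, MAX and MINLOC operations of Example 5.4. It is an illustration under assumptions, not the DVM implementation: the block size ceil(n/p) is assumed, and struct partial, reduce_block, combine and reduce_distributed are hypothetical names.

```c
#include <assert.h>

/* Partial results held by one simulated processor. */
struct partial {
    float sum;    /* SUM(S) */
    float max;    /* MAX(X) */
    float min;    /* MINLOC(Y, MINI): minimal value ... */
    int   minloc; /* ... and the index where it was found */
};

/* Reduce a[lo..hi-1] locally; the caller guarantees lo < hi. */
static struct partial reduce_block(const float *a, int lo, int hi)
{
    struct partial r = { 0.0f, a[lo], a[lo], lo };
    assert(lo < hi);
    for (int i = lo; i < hi; i++) {
        r.sum += a[i];
        if (a[i] > r.max) r.max = a[i];
        if (a[i] < r.min) { r.min = a[i]; r.minloc = i; }
    }
    return r;
}

/* Combine two partial results; equal minima keep the lower index. */
static struct partial combine(struct partial x, struct partial y)
{
    struct partial r = x;
    r.sum += y.sum;
    if (y.max > r.max) r.max = y.max;
    if (y.min < r.min || (y.min == r.min && y.minloc < r.minloc)) {
        r.min = y.min;
        r.minloc = y.minloc;
    }
    return r;
}

/* Simulate the reduction loop over n elements split into p blocks. */
static struct partial reduce_distributed(const float *a, int n, int p)
{
    assert(n > 0 && p > 0);
    int bs = (n + p - 1) / p;              /* assumed block size */
    struct partial total = reduce_block(a, 0, bs < n ? bs : n);
    for (int k = 1; k < p; k++) {
        int lo = k * bs;
        int hi = lo + bs < n ? lo + bs : n;
        if (lo < hi) total = combine(total, reduce_block(a, lo, hi));
    }
    return total;
}
```

The combined result does not depend on how the iterations were split, which is the property that makes such loops safe to distribute (for floating-point sums this holds only up to rounding).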
5.2 Calculations outside parallel loop
Calculations outside a parallel loop are performed according to the own computation rule. An assignment statement
lh = rh;
can be executed on a processor only if lh is located on that processor. If lh is a distributed array element (and is not located on all the processors), the statement (an own computation statement) is executed only on the processor or processors where the element is allocated. All data used in the rh expression must be located on that processor. If some data from the lh and rh expressions are not located on the processor, they must be specified in a remote access directive (see section 6.1.2) prior to the statement.
If lh is a reference to a distributed array A and a data dependence between rh and lh exists, the distributed array must be replicated by the REDISTRIBUTE A[]...[] or REALIGN A[]...[] directive.
Example 5.5. Own computations.
#define N 100
DVM(DISTRIBUTE [BLOCK][]) float A[N][N+1];
DVM(ALIGN [I] WITH A[I][N]) float X[N];
. . .
/* reverse substitution of Gauss algorithm */
/* own computations outside the loops */
X[N-1] = A[N-1][N] / A[N-1][N-1];
DO(J, N-2, 0, -1)
{
DVM(PARALLEL [I] ON A[I][]; REMOTE_ACCESS X[J+1])
DO(I, 0, J, 1)
A[I][N] = A[I][N] - A[I][J+1] * X[J+1];
/* own computations in sequential loop, */
/* surrounding the parallel loop */
X[J] = A[J][N] / A[J][J];
}
Note that A[J][N] and A[J][J] are located on the processor where X[J] is allocated.
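The back substitution of Example 5.5 can be checked sequentially with an ordinary C compiler. The sketch below is a hedged illustration: the DVM and DO macro definitions are stand-ins written for this example (the directive expands to nothing, and DO is made to handle a negative step), N is reduced to 3, and back_substitute is a hypothetical wrapper. The first solution component divides by the diagonal element A[N-1][N-1].

```c
#include <assert.h>

/* Stand-in macros (assumptions for this sketch): directives vanish and
   DO becomes an inclusive for loop that also handles negative steps. */
#define DVM(directive)
#define DO(v, lo, hi, step) \
    for (v = (lo); (step) > 0 ? v <= (hi) : v >= (hi); v += (step))

#define N 3

/* Back substitution on an upper-triangular system stored as [A | b];
   column N of A holds the right-hand side. */
static void back_substitute(float A[N][N + 1], float X[N])
{
    int I, J;
    assert(A[N - 1][N - 1] != 0.0f);
    /* own computation outside the loops */
    X[N - 1] = A[N - 1][N] / A[N - 1][N - 1];
    DO(J, N - 2, 0, -1)
    {
        DVM(PARALLEL [I] ON A[I][]; REMOTE_ACCESS X[J+1])
        DO(I, 0, J, 1)
            A[I][N] = A[I][N] - A[I][J + 1] * X[J + 1];
        /* own computation in the sequential loop */
        X[J] = A[J][N] / A[J][J];
    }
}
```

For the system 2x + y + z = 7, 3y + z = 5, 4z = 8 the routine yields x = 2, y = 1, z = 2.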
6 Remote data specification
6.1 Remote reference definition
Data allocated on one processor and used on another are called remote data. References to such data are called remote references. Consider the generalized statement
if (…A[inda]…) B[indb] = …C[indc]…
where
A, B, C - distributed arrays,
inda, indb, indc - index expressions.
In the DVM model this statement is performed on the processor where element B[indb] is allocated. References A[inda] and C[indc] are not remote if the corresponding elements of arrays A and C are allocated on that same processor. This is guaranteed only if A[inda], B[indb] and C[indc] are aligned to the same point of the alignment template. If such alignment is impossible, the references A[inda] and/or C[indc] should be specified as remote references. For multidimensional arrays this rule is applied to every distributed dimension.
According to their processing efficiency, remote references are subdivided into two types: SHADOW and REMOTE.
If arrays B and C are aligned and
indb = indc ± d (d is a positive integer constant),
the remote reference C[indc] is a SHADOW type reference. A remote reference to a multidimensional array is a SHADOW type reference if its distributed dimensions satisfy the SHADOW type definition.
Remote references that are not SHADOW type references are REMOTE type references.
A special set of remote references is the set of references to reduction variables (see section 5.1.4), which are REDUCTION type references. These references can be used in a parallel loop only.
For every type of remote reference there are two kinds of specifications: synchronous and asynchronous.
A synchronous specification defines group processing of all remote references for a given statement or loop. While this processing, which requires interprocessor exchanges, is in progress, execution of the statement or loop is suspended.
An asynchronous specification allows computations to overlap with interprocessor exchanges. It unites the remote references of several statements and loops. Special directives are used to start the reference processing operation and to wait for its completion. Other computations that contain no references to the specified variables can be executed between these directives.
6.2 Remote references of SHADOW type
6.2.1 Specification of array with shadow edges
A remote reference of SHADOW type means that remote data processing is organized using shadow edges. A shadow edge is a buffer that is a contiguous extension of the array local section in the processor memory (see fig. 6.1). Consider the following statement:
A[i] = B[i + d2] + B[i - d1]
where d1 and d2 are positive integer constants. If both references to array B are remote references of SHADOW type, the SHADOW [d1 : d2] clause should be used for array B, where d1 is the low edge width and d2 is the high edge width. For multidimensional arrays, edges should be specified for all dimensions. The shadow edge specification defines the maximal width over all remote references of SHADOW type.
Syntax of the SHADOW directive:
shadow-directive ::= SHADOW shadow-array...
shadow-array     ::= array-name shadow-edge...
shadow-edge      ::= [ width ]
                   | [ low-width : high-width ]
width            ::= int-expr
low-width        ::= int-expr
high-width       ::= int-expr
Constraint. The low shadow edge width (low-width) and the high shadow edge width (high-width) must be non-negative integer constant expressions.
Specifying the shadow edge width as width is equivalent to specifying width : width.
By default, the width of both shadow edges of a distributed array is 1 for each distributed dimension.
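The edge widths follow directly from the constant offsets used in the references to the array. The plain C sketch below derives the smallest low and high widths that cover a given set of offsets and checks whether a single offset fits the declared edges. The names struct shadow_width, required_shadow and fits_shadow are hypothetical helpers for illustration and are not part of C-DVM.

```c
#include <assert.h>

/* Low and high shadow edge widths of one distributed dimension. */
struct shadow_width { int low; int high; };

/* Smallest SHADOW [low : high] covering all constant offsets, e.g. the
   offsets -1, +1, +2 of the references B[I-1], B[I+1], B[I+2]. */
static struct shadow_width required_shadow(const int *offsets, int count)
{
    struct shadow_width w = { 0, 0 };
    assert(count >= 0);
    for (int k = 0; k < count; k++) {
        if (-offsets[k] > w.low)  w.low  = -offsets[k];
        if ( offsets[k] > w.high) w.high =  offsets[k];
    }
    return w;
}

/* A reference with constant offset d stays inside the declared edges,
   and so can be served from them, when -low <= d <= high. */
static int fits_shadow(int d, struct shadow_width declared)
{
    return d >= -declared.low && d <= declared.high;
}
```

For the offsets -1, +1 and +2 the function returns low = 1 and high = 2, which corresponds to a SHADOW [1:2] specification.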
6.2.2 Specification of independent references of SHADOW type for one loop
The specification of synchronous shadow edge renewal is a clause of the PARALLEL directive:
shadow-renew-clause ::= SHADOW_RENEW renewee...
renewee             ::= dist-array-name [ shadow-edge ]… [ CORNER ]
Constraints:
- The width of the renewed shadow edges must not exceed the maximal width specified in the SHADOW directive.
- If shadow edge widths are not specified, the maximal widths are used.
Executing the synchronous specification renews the shadow edges with the values of the remote variables before the loop is entered.
Example 6.1. Specification of SHADOW-references without corner elements.
DVM(DISTRIBUTE [BLOCK]) float A[100];
DVM(ALIGN[I] WITH A[I]; SHADOW [1:2]) float B[100];
. . .
DVM(PARALLEL[I] ON A[I]; SHADOW_RENEW B)
DO(I,1, 97,1)
A[I] = (B[I-1] + B[I+1] + B[I+2]) / 3.;
When the shadow edges are renewed, the maximal widths 1:2 specified in the SHADOW directive are used.
The distribution and the scheme of renewing the shadow edges are shown in fig. 6.1.
[Figure: local sections of the array on processors P-1, P and P+1]
Fig.6.1. Distribution of array with shadow edges.
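The renewal of shadow edges can be simulated in plain C. In the sketch below each simulated processor holds its own block of B plus a low edge of width 1 and a high edge of width 2, as in Example 6.1; the edges are filled from the neighbouring blocks and the loop body then uses local data only. The processor count of 4, the equal block size and the names struct section, scatter, shadow_renew and stencil_local are assumptions made for this illustration; the real DVM run-time exchanges the edges by interprocessor messages.

```c
#include <assert.h>

#define NELEM 100            /* array length, as in Example 6.1 */
#define NPROC 4              /* assumed number of processors */
#define BLK   (NELEM / NPROC)
#define LOW   1              /* SHADOW [1:2] */
#define HIGH  2

/* Local section of B: low edge, own block, high edge. */
struct section { float data[LOW + BLK + HIGH]; };

/* Copy each processor's own block from the global array. */
static void scatter(const float *b, struct section *s)
{
    for (int p = 0; p < NPROC; p++)
        for (int k = 0; k < BLK; k++)
            s[p].data[LOW + k] = b[p * BLK + k];
}

/* Renew shadow edges from the boundary elements of the neighbours. */
static void shadow_renew(struct section *s)
{
    assert(LOW <= BLK && HIGH <= BLK);
    for (int p = 0; p < NPROC; p++) {
        for (int k = 0; k < LOW; k++)    /* low edge from P-1 */
            s[p].data[k] = p > 0 ? s[p - 1].data[BLK + k] : 0.0f;
        for (int k = 0; k < HIGH; k++)   /* high edge from P+1 */
            s[p].data[LOW + BLK + k] =
                p < NPROC - 1 ? s[p + 1].data[LOW + k] : 0.0f;
    }
}

/* Loop of Example 6.1, executed block by block on local data only. */
static void stencil_local(const struct section *s, float *a)
{
    for (int p = 0; p < NPROC; p++)
        for (int k = 0; k < BLK; k++) {
            int i = p * BLK + k;            /* global index */
            if (i < 1 || i > 97) continue;  /* bounds of DO(I,1,97,1) */
            const float *b = &s[p].data[LOW + k];
            a[i] = (b[-1] + b[1] + b[2]) / 3.0f;
        }
}
```

Calling scatter, shadow_renew and stencil_local in this order gives the same values of A as the sequential loop, because every element read at a block boundary is already present in the renewed edges.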