Two buffers, which are contiguous extensions of the array local section, are allocated on each processor. The low shadow edge width is equal to 1 element (for B[I-1]), and the high shadow edge width is equal to 2 elements (for B[I+1] and B[I+2]). If, before entering the loop, the processors exchange data according to the scheme in fig. 6.1, the loop can be executed on each processor without replacing references to the arrays by references to the buffers.
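As a recap, a minimal sketch of the declarations and loop this description corresponds to (a hedged reconstruction: the SHADOW [1:2] width specification is assumed from the preceding subsection, and the array names and bounds are illustrative):
DVM(DISTRIBUTE [BLOCK]; SHADOW [1:2]) float B[100];   /* assumed: low edge width 1, high edge width 2 */
DVM(ALIGN [I] WITH B[I]) float A[100];
. . .
DVM(PARALLEL [I] ON A[I]; SHADOW_RENEW B)
DO(I, 1, 97, 1)
A[I] = (B[I-1] + B[I+1] + B[I+2]) / 3.;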
Shadow edges of multidimensional distributed arrays can be specified for each dimension. A special case arises when a "corner" of the shadow edges must be renewed as well. In that case the additional parameter CORNER is needed.
Example 6.2. Specification of SHADOW-references with corner elements.
DVM(DISTRIBUTE [BLOCK][BLOCK]) float A[100][100];
DVM(ALIGN [i][j] WITH A[i][j]) float B[100][100];
. . .
DVM(PARALLEL[I][J] ON A[I][J]; SHADOW_RENEW B CORNER)
DO( I, 1, 98, 1)
DO( J, 1, 98, 1)
A[I][J] = (B[I][J+1] + B[I+1][J] + B[I+1][J+1]) / 3.;
The shadow edge widths of the array B are equal to 1 element in every dimension by default. Since the "corner" reference B[I+1][J+1] is present, the CORNER parameter is specified.
Fig. 6.2. Scheme of the array local section with shadow edges (legend: shadow edges, sent values, internal area, corner elements).
6.2.3 ACROSS specification of dependent references of SHADOW type for single loop
Consider the following loop
DO(i, 1, N-2,1)
DO(j, 1, N-2,1)
A[i][j] =(A[i][j-1]+A[i][j+1]+A[i-1][j]+A[i+1][j])/4.;
A data dependence on the array A (informational linkage) exists between loop iterations with indices i1 and i2 (i1 < i2) if both iterations reference the same array element according to a write-read or read-write scheme.
If iteration i1 writes a value and iteration i2 reads this value, then a flow dependence, or simply dependence, i1 → i2 exists between the iterations.
If iteration i1 reads the "old" value and iteration i2 writes the "new" value, then a reverse (anti) dependence i1 → i2 exists between the iterations.
In both cases the iteration i2 can be executed only after the iteration i1.
The value i2 - i1 is called the range, or length, of the dependence. If for any iteration i a dependent iteration i + d exists (d is a constant), the dependence is called regular, or a dependence of constant length.
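For illustration, two hedged single-loop sketches of such dependences (the names A, B and the bound N are illustrative):
/* flow dependence of length 1: iteration i reads the value A[i-1] written by iteration i-1 */
DO(i, 1, N-1, 1)
A[i] = A[i-1] + B[i];
/* anti-dependence of length 1: iteration i reads the "old" value A[i+1] that iteration i+1 overwrites later */
DO(i, 0, N-2, 1)
A[i] = A[i+1] + B[i];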
A loop with regular computations and regular dependences on distributed arrays can be distributed with the PARALLEL directive using the ACROSS specification.
across-clause | ::= ACROSS dependent-array...
dependent-array | ::= dist-array-name dependence...
dependence | ::= [ flow-dep-length : anti-dep-length ]
flow-dep-length | ::= int-constant
anti-dep-length | ::= int-constant
All distributed arrays with regular data dependences are listed in the ACROSS specification. The flow-dependence length (flow-dep-length) and the reverse (anti) dependence length (anti-dep-length) are specified for each dimension of the array. A length equal to zero means there is no data dependence in that dimension.
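For instance, a loop carrying only a flow dependence of length 1 in its single distributed dimension could be specified with a zero anti-dependence length; a hedged sketch (declarations and bounds are illustrative):
DVM(DISTRIBUTE [BLOCK]) float A[100];
DVM(ALIGN [i] WITH A[i]) float B[100];
. . .
DVM(PARALLEL [i] ON A[i]; ACROSS A[1:0])   /* flow dependence of length 1, no anti-dependence */
DO(i, 1, 99, 1)
A[i] = A[i-1] + B[i];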
Example 6.3. Specification of the loop with regular data dependence.
DVM(PARALLEL [i][j] ON A[i][j] ; ACROSS A[1:1][1:1])
DO(i , 1, N-2, 1)
DO(j , 1, N-2, 1)
A[i][j]=(A[i][j-1]+A[i][j+1]+A[i-1][j]+A[i+1][j])/4.;
Flow and anti-dependences of length 1 exist for each dimension of the array A.
The ACROSS specification is implemented via shadow edges. The anti-dependence length defines the width of the high edge renewing, and the flow-dependence length defines the width of the low edge renewing. High edges are renewed prior to the loop execution (as for the SHADOW_RENEW directive). Low edges are renewed during the loop execution as the computation of the remote data proceeds; this allows organizing so-called wave computations over multidimensional arrays. In fact, ACROSS-references are a subset of SHADOW-references, namely those that carry data dependences.
6.2.4 Asynchronous specification of independent references of SHADOW type
Updating the values of shadow edges, described in section 6.2.2, is an indivisible (synchronous) exchange operation for an unnamed group of distributed arrays. This operation can be split into two operations:
- starting the exchange,
- waiting for the values.
While waiting for the shadow edge values, other computations can be performed; in particular, the computations on the internal area of the local array section can be done.
The following directives describe asynchronous renewing of shadow edges for a named group of distributed arrays.
Declaration of a group.
shadow-group-directive | ::= CREATE_SHADOW_GROUP shadow-group-name : renewee… |
Start of shadow edges renewing.
shadow-start-directive | ::= SHADOW_START shadow-group-name |
Waiting for shadow edges values.
shadow-wait-directive | ::= SHADOW_WAIT shadow-group-name |
The SHADOW_START directive must be executed after the CREATE_SHADOW_GROUP directive. Once the CREATE_SHADOW_GROUP directive has been executed, the SHADOW_START and SHADOW_WAIT directives can be executed many times. The updated values of the shadow edges may be used only after the SHADOW_WAIT directive.
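For example, in an iterative computation the group might be created once, while the start/wait pair is executed on every iteration; a hedged sketch (the array, group name, and iteration count are illustrative):
DVM(DISTRIBUTE [BLOCK]) float A[100];
DVM(SHADOW_GROUP) void *GA;
int it;
. . .
DVM(CREATE_SHADOW_GROUP GA: A);
for (it = 0; it < 20; it++)
{
    DVM(SHADOW_START GA);
    /* computations not using the shadow edges of A may be placed here */
    DVM(SHADOW_WAIT GA);
    /* computations using the renewed shadow edges of A */
}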
A special case is the use of the SHADOW_START and SHADOW_WAIT directives in the shadow-renew-clause specification of a parallel loop.
shadow-renew-clause | ::= . . .
 | shadow-start-directive
 | shadow-wait-directive
If the SHADOW_START directive is specified in a parallel loop, the values to be sent to the shadow edges are computed ahead of the rest. Then the renewing of the shadow edges is started, and the computation on the internal area of the array local section is performed (see fig. 6.2).
If the SHADOW_WAIT directive is specified in a parallel loop, the elements that do not use shadow edge values are computed ahead of the rest. The remaining elements are computed only after the wait for the new shadow edge values has completed.
Example 6.4. Overlapping computations and shadow edges updating.
DVM(DISTRIBUTE [BLOCK][BLOCK]) float C[100][100];
DVM(ALIGN[I][J] WITH C[I][J]) float A[100][100], B[100][100],
D[100][100];
DVM(SHADOW_GROUP) void *AB;
. . .
DVM(CREATE_SHADOW_GROUP AB: A B);
. . .
DVM(SHADOW_START AB);
. . .
DVM(PARALLEL[I][J] ON C[I][J]; SHADOW_WAIT AB)
DO( I , 1, 98, 1)
DO( J , 1, 98, 1)
{ C[I][J] = (A[I-1][J]+A[I+1][J]+A[I][J-1]+A[I][J+1])/4.;
D[I][J] = (B[I-1][J]+B[I+1][J]+B[I][J-1]+B[I][J+1])/4.;
}
The shadow edge width of the distributed arrays A and B is equal to 1 element in each dimension by default. Waiting for the completion of shadow edge renewing is postponed as long as possible, that is, until the moment when the computations cannot continue without the renewed values.
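A complementary hedged sketch, with SHADOW_START specified in the loop that produces the exported values (the declarations are those of Example 6.4, the statements are illustrative; the consuming loop would then carry SHADOW_WAIT AB, as in Example 6.4):
DVM(CREATE_SHADOW_GROUP AB: A B);
. . .
DVM(PARALLEL[I][J] ON C[I][J]; SHADOW_START AB)
DO( I , 1, 98, 1)
DO( J , 1, 98, 1)
{ A[I][J] = C[I][J] + D[I][J];   /* values to be sent to the shadow edges are computed first, */
  B[I][J] = C[I][J] - D[I][J];   /* then the edge exchange overlaps the remaining computation */
}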
6.3 Remote references of REMOTE type
6.3.1 REMOTE_ACCESS directive
Remote references of REMOTE type are specified by the REMOTE_ACCESS directive.
remote-access-directive | ::= REMOTE_ACCESS [ remote-group-name : ] regular-reference...
regular-reference | ::= dist-array-name [ regular-subscript ]…
regular-subscript | ::= [ int-expr ]
 | [ do-variable-use ]
 | []
remote-access-clause | ::= remote-access-directive
The REMOTE_ACCESS directive can be used either as a separate directive placed before an own-computation statement (its scope is the next statement) or as an additional specification in the PARALLEL directive (its scope is the parallel loop body).
If a remote reference is specified as an array name without an index list, then all references to that array in the parallel loop (in the statement) are treated as remote references of REMOTE type.
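For instance, when a distributed array is referenced with transposed indices, it may be convenient to name the whole array; a hedged sketch (distributions and names are illustrative):
DVM(DISTRIBUTE [BLOCK][]) float A[100][100], B[100][100];
. . .
DVM(PARALLEL [I][J] ON A[I][J]; REMOTE_ACCESS B)
FOR(I, 100)
FOR(J, 100)
A[I][J] = B[J][I];   /* every reference to B in the loop is treated as remote */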
6.3.2 Synchronous specification of remote references of REMOTE type
If a group name (remote-group-name) is not specified in the REMOTE_ACCESS directive, the directive is executed in synchronous mode. Within the statement or parallel loop that follows, the compiler replaces all remote references by references to a buffer. The values of the remote variables are transferred before the statement or loop is executed.
Example 6.5. Synchronous specification of remote references of REMOTE type.
DVM(DISTRIBUTE [][BLOCK]) float A[100][100], B[100][100];
. . .
DVM(REMOTE_ACCESS A[50][50]) X = A[50][50];
. . .
DVM(REMOTE_ACCESS B[99][99]) {A[1][1] = B[99][99];}
. . .
DVM(PARALLEL[I][J] ON A[I][J]; REMOTE_ACCESS B[][N])
FOR(I, 100)
FOR(J, 100)
A[I][J] = B[I][J] + B[I][N];
The first two REMOTE_ACCESS directives specify remote references for own-computation statements. The REMOTE_ACCESS directive in the parallel loop specifies remote data (a matrix column) for all processors the array A is mapped onto.
6.3.3 Asynchronous specification of remote references of REMOTE type
If a group name (remote-group-name) is specified in the REMOTE_ACCESS directive, the directive is executed in asynchronous mode. The following additional directives are required to specify this mode.
Group name definition.
remote-group-directive | ::= REMOTE_GROUP |
The identifier defined in the directive can be used only in the REMOTE_ACCESS, PREFETCH, and RESET directives.
prefetch-directive | ::= PREFETCH group-name |
reset-directive | ::= RESET group-name |
Consider the following typical asynchronous specification sequence of remote references of REMOTE type:
DVM(REMOTE_GROUP) void * RS;
. . .
DVM(PREFETCH RS);
. . .
DVM(PARALLEL . . . ; REMOTE_ACCESS RS : r1)
. . .
DVM(PARALLEL . . . ; REMOTE_ACCESS RS : rn)
. . .
When the given sequence of statements is executed for the first time, the PREFETCH directive is not executed. The REMOTE_ACCESS directives are executed in the usual synchronous mode, and the references are accumulated in the RS variable. After the whole sequence of REMOTE_ACCESS directives has been executed, the value of the RS variable is the union of the subgroups of remote references r1, …, rn.
When the sequence is executed for the second and subsequent times, the PREFETCH directive performs advance sending of remote data for all the references accumulated in the RS variable. Between the PREFETCH directive and the first executed REMOTE_ACCESS directive, other computations overlapping the wait for the remote data can be performed. In this case the REMOTE_ACCESS directives themselves do not transfer any data.
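A hedged sketch of this scheme embedded in an outer iterative loop, patterned after Example 6.5 (N is assumed to be a fixed column index as in that example, and the iteration count is illustrative; it is also assumed that the set of remote references does not change between iterations, otherwise the RESET directive could be used to empty the group so that it is accumulated anew):
DVM(DISTRIBUTE [][BLOCK]) float A[100][100], B[100][100];
DVM(REMOTE_GROUP) void *RS;
int it;
. . .
for (it = 0; it < 20; it++)
{
    DVM(PREFETCH RS);   /* no effect on the first pass; on later passes it sends in advance
                           the data for the references accumulated in RS */
    /* other computations overlapping the wait for the remote data may be placed here */
    DVM(PARALLEL [I][J] ON A[I][J]; REMOTE_ACCESS RS: B[][N])
    FOR(I, 100)
    FOR(J, 100)
    A[I][J] = B[I][J] + B[I][N];
}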