By processing efficiency, remote references are subdivided into two types: SHADOW and REMOTE.
If arrays B and C are aligned and
inda = indc ± d (d is a positive integer constant),
then the remote reference C(indc) belongs to the SHADOW type. A remote reference to a multidimensional array belongs to the SHADOW type if its distributed dimensions satisfy the SHADOW type definition.
Remote references that do not belong to the SHADOW type are REMOTE type references.
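The classification above can be illustrated with a small Python sketch. It is not DVM code: the function name `classify_reference`, the use of `None` to mark a non-constant index difference, and the extra `LOCAL` label for a zero offset are assumptions made only for this illustration.

```python
def classify_reference(offsets):
    """Classify a reference to an aligned array.

    offsets holds, for each distributed dimension, the constant integer
    difference between the two index expressions, or None when that
    difference is not a constant (hypothetical encoding).
    """
    if any(off is None for off in offsets):
        return "REMOTE"   # no constant distance: general remote reference
    if all(off == 0 for off in offsets):
        return "LOCAL"    # same element as the owner: nothing remote is read
    return "SHADOW"       # constant distance in the distributed dimensions


print(classify_reference([1]))       # e.g. B(I+1) used while computing A(I)
print(classify_reference([0, -2]))   # e.g. B(I,J-2) used while computing A(I,J)
print(classify_reference([None]))    # e.g. an indirect index such as B(K(I))
```

The sketch only mirrors the wording of the definition; a real compiler would also have to check that the arrays are aligned.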
A special set of remote references is the set of references to reduction variables (see 5.2.4), which belong to the REDUCTION type. These references can be used only in a parallel loop.
For all types of remote references there are two kinds of specification: synchronous and asynchronous.
A synchronous specification defines group processing of all remote references for a given statement or loop. While this processing, which requires communication, is in progress, execution of the statement or loop is suspended. An asynchronous specification allows computation and communication to overlap. It combines the remote references of several statements and loops. Special directives start the reference processing operation and wait for its completion. Between these directives, other computations that do not reference the specified variables can be performed.
6.2 SHADOW type references
6.2.1 Specification of array with shadow edges
A remote reference of SHADOW type means that remote data are processed using "shadow" edges. A shadow edge is a buffer that continuously extends the local section of the array in processor memory (see fig. 6.1). Consider the following statement:
A( i ) = B( i + d2) + B( i - d1)
where d1 and d2 are positive integer constants. If both references to array B are remote references of SHADOW type, array B should be specified in the SHADOW directive as B( d1 : d2 ), where d1 is the low edge width and d2 is the high edge width. For multidimensional arrays, the edges must be specified for each dimension. The shadow edge specification sets the maximal width over all remote references of SHADOW type.
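As a rough illustration of how the widths d1 : d2 follow from the references in a statement, the Python sketch below takes the constant index offsets of all SHADOW references in one dimension and returns the maximal low and high widths. The function name `shadow_widths` and the list-of-offsets encoding are hypothetical.

```python
def shadow_widths(offsets):
    """Return (low_width, high_width) covering all constant offsets.

    A negative offset such as -d1 (from B(i - d1)) needs a low edge of
    width d1; a positive offset such as +d2 needs a high edge of width d2.
    """
    low = max((-off for off in offsets if off < 0), default=0)
    high = max((off for off in offsets if off > 0), default=0)
    return low, high


d1, d2 = 3, 2
print(shadow_widths([+d2, -d1]))   # offsets of B(i + d2) and B(i - d1)
```

With d1 = 3 and d2 = 2 the result corresponds to the specification B( 3 : 2 ).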
SHADOW directive syntax.
| shadow-directive | is SHADOW dist-array ( shadow-edge-list ) |
| | or SHADOW ( shadow-edge-list ) :: dist-array-list |
| dist-array | is array-name |
| | or pointer-name |
| shadow-directive | is SHADOW shadow-array-list |
| shadow-array | is array-name ( shadow-edge-list ) |
| shadow-edge | is width |
| | or low-width : high-width |
| width | is int-expr |
| low-width | is int-expr |
| high-width | is int-expr |
Constraint. The width of the low shadow edge (low-width) and the width of the high shadow edge (high-width) must be non-negative integer constant expressions.
Specifying a shadow edge width as width is equivalent to specifying width : width.
By default, both shadow edges of a distributed array have width 1 in each distributed dimension.
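The three rules above (the non-negativity constraint, the width shorthand, and the default) can be summarized in a short Python sketch. The function `normalize_shadow_edge` and its argument encoding (`None`, an integer, or a pair) are assumptions for illustration and are not part of the DVM language.

```python
def normalize_shadow_edge(spec=None):
    """Return (low_width, high_width) for one distributed dimension.

    spec is None when no edge is specified (default 1:1), an int for the
    shorthand `width`, or a (low, high) pair for `low-width : high-width`.
    """
    if spec is None:
        low, high = 1, 1
    elif isinstance(spec, int):
        low, high = spec, spec          # width is equivalent to width : width
    else:
        low, high = spec
    if low < 0 or high < 0:
        raise ValueError("shadow edge widths must be non-negative")
    return low, high


print(normalize_shadow_edge())         # default edges
print(normalize_shadow_edge(2))        # shorthand form
print(normalize_shadow_edge((1, 2)))   # explicit low : high form
```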
6.2.2 Synchronous specification of independent references of SHADOW type for a single loop
A synchronous specification is a clause of the PARALLEL directive.
| shadow-renew-clause | is SHADOW_RENEW ( renewee-list ) |
| renewee | is dist-array-name [ ( shadow-edge-list )] [ (CORNER) ] |
Constraints:
- The widths of the shadow edges to be filled with values must not exceed the maximal widths specified in the SHADOW directive.
- If the shadow edge widths are not specified, the maximal widths are used.
Executing a synchronous specification renews the shadow edges with the values of the remote variables.
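The two constraints on SHADOW_RENEW widths can be expressed as a small check. This Python sketch is only a model of the rule as stated; the function `renew_widths` and the list-of-pairs representation are hypothetical.

```python
def renew_widths(declared, requested=None):
    """Pick the edge widths to renew, one (low, high) pair per dimension.

    declared  -- maximal widths from the SHADOW directive
    requested -- widths given in SHADOW_RENEW, or None if omitted
    """
    if requested is None:
        return list(declared)           # widths omitted: use the maximal ones
    if len(requested) != len(declared):
        raise ValueError("one shadow edge is expected per dimension")
    for (low, high), (max_low, max_high) in zip(requested, declared):
        if low > max_low or high > max_high:
            raise ValueError("renewed width exceeds the SHADOW specification")
    return list(requested)


print(renew_widths([(1, 2)]))             # widths omitted in the clause
print(renew_widths([(1, 2)], [(1, 1)]))   # a narrower renewal is allowed
```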
Example 6.1. Specification of SHADOW-references without corner elements
      REAL A(100), B(100)
CDVM$ ALIGN B( I ) WITH A( I )
CDVM$ DISTRIBUTE ( BLOCK) :: A
CDVM$ SHADOW B( 1:2 )
      . . .
CDVM$ PARALLEL ( I ) ON A ( I ), SHADOW_RENEW ( B )
      DO 10 I = 2, 98
      A(I) = (B(I-1) + B(I+1) + B(I+2) ) / 3
10    CONTINUE
When the shadow edges are renewed, the maximal widths 1:2 specified in the SHADOW directive are used.
The distribution and the shadow edge renewal scheme are shown in fig. 6.1.
[Figure: local sections of the array on neighboring processors P-1, P, and P+1.]
Fig. 6.1. Distribution of array with shadow edges.
Two buffers that continuously extend the local section of the array are allocated on each processor. The width of the low shadow edge is 1 element (for B(I-1)), and the width of the high shadow edge is 2 elements (for B(I+1) and B(I+2)). If the processors exchange edge values according to the scheme in fig. 6.1 before entering the loop, the loop can be executed on each processor without replacing references to the array with references to the buffers.
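The effect of renewing shadow edges in Example 6.1 can be imitated with a sequential Python sketch. It is a simulation under stated assumptions, not a description of how the DVM run-time system works: the helpers `block_bounds`, `renew_shadow`, and `simulate` are hypothetical, the number of processors (4) is chosen arbitrarily, indices are 0-based, and every block is assumed to be at least as wide as the shadow edges.

```python
def block_bounds(n, nproc):
    """Split indices 0..n-1 into contiguous blocks (a BLOCK-like layout)."""
    size = -(-n // nproc)  # ceiling division
    return [(p * size, min(n, (p + 1) * size)) for p in range(nproc)]


def renew_shadow(sections, low, high):
    """Fill each section's shadow edges from its neighbours' owned elements.

    Every section is laid out as [low edge][owned elements][high edge].
    """
    for p, sec in enumerate(sections):
        n_local = len(sec) - low - high
        if p > 0:
            left = sections[p - 1]
            left_n = len(left) - low - high
            sec[:low] = left[left_n:left_n + low]        # last owned of left
        if p + 1 < len(sections):
            right = sections[p + 1]
            sec[low + n_local:] = right[low:low + high]  # first owned of right


def simulate(n=100, nproc=4, low=1, high=2):
    b = [float(i * i) for i in range(n)]
    # Reference result: the loop DO I = 2, 98 written with 0-based indices.
    serial = [0.0] * n
    for i in range(1, n - 2):
        serial[i] = (b[i - 1] + b[i + 1] + b[i + 2]) / 3

    bounds = block_bounds(n, nproc)
    sections = [[0.0] * low + b[s:e] + [0.0] * high for s, e in bounds]
    renew_shadow(sections, low, high)

    # Each "processor" computes its own iterations from its section only.
    parallel = [0.0] * n
    for (s, e), sec in zip(bounds, sections):
        for i in range(max(s, 1), min(e, n - 2)):
            k = i - s + low                  # position of B(i) in the section
            parallel[i] = (sec[k - 1] + sec[k + 1] + sec[k + 2]) / 3
    return serial, parallel


serial, parallel = simulate()
print(serial == parallel)
```

The loop body indexes the extended section directly, which is the point made above: once the edges are renewed, neighbouring values are reachable at `k - 1`, `k + 1`, and `k + 2` without any special handling of the buffers.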
Shadow edges of multidimensional distributed arrays can be specified for each dimension. A special case arises when a "corner" of the shadow edges must be renewed. In this case the additional CORNER parameter is needed.
Example 6.2. Specification of SHADOW-references with corner elements
      REAL A(100,100), B(100,100)
CDVM$ ALIGN B( I, J ) WITH A( I, J )
CDVM$ DISTRIBUTE A ( BLOCK,BLOCK)
      . . .
CDVM$ PARALLEL ( I, J ) ON A ( I, J ), SHADOW_RENEW ( B (CORNER))
      DO 10 I = 2, 99
      DO 10 J = 2, 99
      A(I,J) = (B(I,J+1) + B(I+1,J) + B(I+1,J+1) ) / 3
10    CONTINUE
By default, the width of the shadow edges of array B is 1 in both dimensions. Because the "corner" reference B(I+1,J+1) is present, the CORNER parameter is specified.
[Figure legend: shadow edges; sent values; internal area; corner elements.]
Fig. 6.2. Scheme of array local section with shadow edges.
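One way to read the corner case is that a reference needs corner elements when it is shifted in two or more distributed dimensions at once, as B(I+1,J+1) is in Example 6.2. The Python sketch below encodes that reading; the rule as coded, the function name `needs_corner`, and the tuple encoding of offsets are assumptions for illustration.

```python
def needs_corner(refs, distributed_dims):
    """Return True if some reference is shifted in 2+ distributed dimensions.

    refs is a list of per-dimension constant offsets, e.g. (1, 1) for
    B(I+1,J+1); distributed_dims lists the indices of distributed dimensions.
    """
    for ref in refs:
        shifted = [d for d in distributed_dims if ref[d] != 0]
        if len(shifted) >= 2:
            return True
    return False


refs = [(0, 1), (1, 0), (1, 1)]    # B(I,J+1), B(I+1,J), B(I+1,J+1)
print(needs_corner(refs, distributed_dims=(0, 1)))
print(needs_corner(refs[:2], distributed_dims=(0, 1)))
```

If only one dimension were distributed, the same diagonal reference would be shifted in a single distributed dimension and the ordinary edge would cover it.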
6.2.3 Specification of ACROSS-dependent references of SHADOW type for a single loop
Consider the following loop:
      DO 10 I = 2, N-1
      DO 10 J = 2, N-1
      A(I,J) = (A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)) / 4
10    CONTINUE
A data dependence exists between loop iterations i1 and i2 ( i1<i2 ) if both iterations refer to the same array element in a write-read or read-write pattern.
If iteration i1 writes a value and iteration i2 reads this value, then a flow dependence, or simply a dependence, exists between the iterations.
If iteration i1 reads the "old" value and iteration i2 writes the "new" value, then an anti-dependence exists between the iterations.
In both cases iteration i2 can be executed only after iteration i1.
The value i2 - i1 is called the range or length of the dependence. If for every iteration i there is a dependent iteration i + d (d is a constant), the dependence is called regular, or a dependence with constant length.
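The definitions above can be checked by brute force on a one-dimensional loop of the form A(I) = f(A(I + k)). The Python sketch below enumerates all iteration pairs i1 < i2 and records which kind of dependence links them and with what length. The function name `find_dependences` and the single-reference loop shape are simplifying assumptions.

```python
def find_dependences(first, last, read_offset):
    """Dependences of the loop  DO I = first, last:  A(I) = f(A(I + read_offset)).

    Iteration i writes A(i) and reads A(i + read_offset).
    Returns a set of (kind, length) pairs.
    """
    found = set()
    for i1 in range(first, last + 1):
        for i2 in range(i1 + 1, last + 1):
            if i2 + read_offset == i1:      # i1 writes A(i1), i2 reads it later
                found.add(("flow", i2 - i1))
            if i1 + read_offset == i2:      # i1 reads A(i2), i2 overwrites it later
                found.add(("anti", i2 - i1))
    return found


print(find_dependences(2, 9, -1))   # A(I) = f(A(I-1))
print(find_dependences(2, 9, 2))    # A(I) = f(A(I+2))
```

In both cases every dependent pair has the same length, which is what makes the dependence regular.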
A loop with regular computations and regular dependences on distributed arrays can be distributed with the PARALLEL directive using the ACROSS clause.
| across-clause | is ACROSS ( dependent-array-list ) |
| dependent-array | is dist-array-name ( dependence-list ) [ (CORNER) ] |
| dependence | is flow-dep-length : anti-dep-length |
| flow-dep-length | is int-constant |
| anti-dep-length | is int-constant |
All distributed arrays with regular data dependences are specified in the ACROSS clause. The length of the flow dependence (flow-dep-length) and the length of the anti-dependence (anti-dep-length) are specified for each dimension of the array. A length of zero means there is no data dependence.
Constraint.
- If the rank of the processor arrangement is greater than 1, each reference to the array may have only a flow dependence or only an anti-dependence. For example, the references A(I-1,J-1) and A(I+1,J+1) are allowed, but the references A(I-1,J+1) and A(I+1,J-1) are forbidden.
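As a sketch of how the dependence lengths and the constraint fit together, the Python function below builds the text of one ACROSS entry from the constant offsets of the references in a loop body and rejects references that mix flow and anti-dependence when the processor arrangement rank exceeds 1. The function `across_entry` is hypothetical, and it assumes that all listed dimensions are distributed and that loop indices increase.

```python
def across_entry(array, refs, arrangement_rank):
    """Build an ACROSS entry such as 'A(1:1,1:1)' from constant offsets.

    A negative offset (e.g. A(I-1,J)) gives a flow dependence in that
    dimension; a positive offset (e.g. A(I+1,J)) gives an anti-dependence.
    """
    if arrangement_rank > 1:
        for ref in refs:
            if any(o < 0 for o in ref) and any(o > 0 for o in ref):
                raise ValueError(f"reference {ref} mixes flow and anti-dependence")
    parts = []
    for d in range(len(refs[0])):
        flow = max((-ref[d] for ref in refs if ref[d] < 0), default=0)
        anti = max((ref[d] for ref in refs if ref[d] > 0), default=0)
        parts.append(f"{flow}:{anti}")
    return f"{array}({','.join(parts)})"


# Offsets of A(I,J-1), A(I,J+1), A(I-1,J), A(I+1,J)
refs = [(0, -1), (0, 1), (-1, 0), (1, 0)]
print(across_entry("A", refs, arrangement_rank=2))
```

For the four-point loop shown earlier in this subsection, this yields flow and anti-dependence lengths of 1 in both dimensions.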
Example 6.3. Specification of the loop with regular data dependence.
CDVM$ PARALLEL ( I, J ) ON A( I, J ) , ACROSS ( A( 1:1, 1:1 ))
      DO 10 I = 2, N-1
      DO 10 J = 2, N-1















