rv = er
L( 1 ) = e1
. .
L( n ) = en
endif
if( er ol rv ) then
rv = er
L( 1 ) = e1
. .
L( n ) = en
endif
The correspondence between statement form, Fortran operation and FDVM reduction name is given below:
Statement form | Fortran operation | FDVM reduction name |
1 | + | SUM(rv) |
1 | * | PRODUCT(rv) |
1 | .AND. | AND(rv) |
1 | .OR. | OR(rv) |
1 | .EQV. | EQV(rv) |
1 | .NEQV. | NEQV(rv) |
2,3 | | MAX(rv), MIN(rv) |
4 | | MINLOC(rv,L,n), MAXLOC(rv,L,n) |
The MAXLOC (MINLOC) operation computes the maximal (minimal) value and determines its coordinates.
Example 5.4. Specification of reduction.
S = 0
X = -1.E10
Y = 1.E10
IMIN(1) = 0
CDVM$ PARALLEL ( I ) ON A( I ) ,
CDVM$*REDUCTION (SUM(S), MAX(X), MINLOC(Y,IMIN(1),1))
DO 10 I = 1, N
S = S + A(I)
X = MAX(X, A(I))
IF(A(I) .LT. Y) THEN
Y = A(I)
IMIN(1) = I
ENDIF
10 CONTINUE
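Statement form 4 with more than one coordinate can be sketched in the same style. The program below is only an illustration under stated assumptions: the names B, BMAX, LMAX and the sizes are hypothetical, the distribution and the loop header follow the (BLOCK,*) pattern of Example 5.8, and passing LMAX(1) with n = 2 mirrors the way IMIN(1) is passed in Example 5.4.

```fortran
      PROGRAM RMAX
      PARAMETER (N = 4, M = 3)
      REAL B(N,M), BMAX
      INTEGER LMAX(2), I, J
CDVM$ DISTRIBUTE (BLOCK,*) :: B
C     sample data (hypothetical): the largest element is B(3,2) = 0
CDVM$ PARALLEL ( I ) ON B( I, * )
      DO 6 I = 1, N
         DO 5 J = 1, M
            B(I,J) = -REAL((I-3)**2 + (J-2)**2)
 5       CONTINUE
 6    CONTINUE
      BMAX = -1.E10
      LMAX(1) = 0
      LMAX(2) = 0
C     statement form 4: rv = BMAX, L = LMAX, n = 2 coordinates
CDVM$ PARALLEL ( I ) ON B( I, * ),
CDVM$* REDUCTION ( MAXLOC( BMAX, LMAX(1), 2 ) )
      DO 20 I = 1, N
         DO 10 J = 1, M
            IF (B(I,J) .GT. BMAX) THEN
               BMAX = B(I,J)
               LMAX(1) = I
               LMAX(2) = J
            ENDIF
 10      CONTINUE
 20   CONTINUE
      PRINT *, BMAX, LMAX(1), LMAX(2)
      END
```

The CDVM$ lines are comments to an ordinary Fortran compiler, so the same text can also be compiled and run sequentially to check the logic of the loop.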
5.2 Computations outside parallel loop
Computations outside a parallel loop are performed according to the own computation rule. Let the statement
IF p THEN lh = rh
be outside a parallel loop.
Here p is a logical expression,
lh is the left side of the assignment statement (a reference to a scalar or an array element),
rh is the right side of the assignment statement (an expression).
The statement is then executed on the processor where the data referenced by lh are allocated (the own( lh ) processor). All data used in the expressions p and rh must be allocated on the own( lh ) processor. Any data in p and rh that are not allocated on that processor must be specified in a remote access directive before the statement (see section 6.1.2).
If lh is a reference to a distributed array and there is a data dependence between rh and lh, then the distributed array must be replicated by the directive
REDISTRIBUTE A( *,...,* ) or REALIGN A( *,...,* )
before the statement execution.
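The replication rule can be sketched on a running sum, where each A(I) depends on A(I-1). This is a minimal illustration, not an example from the original text: the array name and size are hypothetical, and the DYNAMIC directive is an assumption about how an array is marked as redistributable, since that directive is not described in this section.

```fortran
      PROGRAM REPL
      PARAMETER (N = 8)
      REAL A(N)
      INTEGER I
CDVM$ DISTRIBUTE (BLOCK) :: A
C     assumption: an array must be declared DYNAMIC before it can
C     be redistributed; this directive is not shown in this section
CDVM$ DYNAMIC A
CDVM$ PARALLEL ( I ) ON A( I )
      DO 10 I = 1, N
         A(I) = 1.
 10   CONTINUE
C     A(I) depends on A(I-1): the loop stays sequential, and the
C     array is replicated before the own computation statements
CDVM$ REDISTRIBUTE A( * )
      DO 20 I = 2, N
         A(I) = A(I) + A(I-1)
 20   CONTINUE
      PRINT *, A(N)
      END
```

After replication every processor holds the whole array, so both sides of each assignment in the sequential loop are available locally. With A initialized to ones, the loop leaves the running sums 1, 2, ..., N in A.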
Example 5.8. Own computations.
PARAMETER (N = 100)
REAL A(N,N+1), X(N)
CDVM$ ALIGN X( I ) WITH A(I,N+1)
CDVM$ DISTRIBUTE (BLOCK,*) :: A
. . .
C back substitution of Gauss algorithm
C own computations outside the loops
C
C own computation statement
C left and right sides are on the same processor
X(N) = A(N,N+1) / A(N,N)
DO 10 J = N-1, 1, -1
CDVM$ PARALLEL ( I ) ON A (I,*)
DO 20 I = 1, J
A(I,N+1) = A(I,N+1) - A(I,J+1) * X(J+1)
20 CONTINUE
C own computations in sequential loop,
C nesting the parallel loop
X(J) = A(J,N+1) / A(J,J)
10 CONTINUE
Note that A(J,N+1) and A(J,J) are located on the processor where X(J) is allocated.
6 Remote data specification
6.1 Remote references definition
Data that are allocated on one processor and used on another are called remote data. In effect, these data are common (shared) data for the two processors. References to such data are called remote references. Consider the generalized statement:
IF (…A(inda)…) B(indb) = …C(indc)…
where
A, B, C - distributed arrays,
inda, indb, indc - index expressions.
In the DVM model this statement is executed on the own(B(indb)) processor, that is, on the processor where the element B(indb) is allocated. The references A(inda) and C(indc) are not remote references if the corresponding elements of arrays A and C are allocated on the own(B(indb)) processor. This is guaranteed only if A(inda), B(indb) and C(indc) are aligned to the same point of the alignment template. If such alignment is impossible or impractical, the references A(inda) and/or C(indc) must be specified as remote references. For multidimensional arrays this rule applies to every distributed dimension.
According to how efficiently they can be processed, remote references are divided into two types: SHADOW and REMOTE.
If arrays B and C are aligned and
indc = indb ± d (d is a positive integer constant),
then the remote reference C(indc) is of SHADOW type. A remote reference to a multidimensional array is of SHADOW type if each distributed dimension satisfies the SHADOW type definition.
Remote references that are not of SHADOW type are REMOTE type references.
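The classification can be sketched on a concrete loop. The program below is an illustration only: the array names, the size and the index expressions are hypothetical, and it relies on the SHADOW directive and the SHADOW_RENEW clause described in section 6.2. The REMOTE type reference appears only in a comment, because its specification is not covered in this part of the text.

```fortran
      PROGRAM RTYPES
      PARAMETER (N = 100)
      REAL B(N), C(N)
      INTEGER I
CDVM$ ALIGN C( I ) WITH B( I )
CDVM$ DISTRIBUTE (BLOCK) :: B
CDVM$ SHADOW C( 1:2 )
CDVM$ PARALLEL ( I ) ON C( I )
      DO 10 I = 1, N
         C(I) = REAL(I)
 10   CONTINUE
C     C(I):   aligned with B(I), not a remote reference
C     C(I-1): index differs from I by the constant 1, SHADOW type
C     C(I+2): index differs from I by the constant 2, SHADOW type
CDVM$ PARALLEL ( I ) ON B( I ), SHADOW_RENEW ( C )
      DO 20 I = 2, N-2
         B(I) = C(I) + C(I-1) + C(I+2)
 20   CONTINUE
C     a reference such as C(N+1-I) does not differ from I by a
C     constant, so it would be a REMOTE type reference
      END
```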
References to reduction variables (see 5.2.4) form a special set of remote references of REDUCTION type. These references can be used only in a parallel loop.
For all types of remote references there are two kinds of specification: synchronous and asynchronous.
A synchronous specification defines group processing of all remote references for a given statement or loop. While this processing, which requires communications, is in progress, execution of the statement or loop is suspended. An asynchronous specification allows computations and communications to overlap. It combines the remote references of several statements and loops. Special directives are used to start the processing of the references and to wait for its completion. Between these directives, other computations that do not reference the specified variables can be performed.
6.2 SHADOW type references
6.2.1 Specification of array with shadow edges
A remote reference of SHADOW type means that the remote data are processed using "shadow" edges. A shadow edge is a buffer that is a contiguous extension of the array local section in processor memory (see fig. 6.1). Consider the following statement
A( i ) = B( i + d2) + B( i – d1)
where d1, d2 are positive integer constants. If both references to array B are remote references of SHADOW type, B should be specified in a SHADOW directive as B( d1 : d2 ), where d1 is the low edge width and d2 is the high edge width. For multidimensional arrays the edges must be specified for each dimension. The shadow edge specification gives the maximal width over all remote references of SHADOW type.
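The rule about maximal widths can be sketched with several references on each side. In the hypothetical program below (names, size and offsets are assumptions chosen for illustration) the largest downward offset is 2 and the largest upward offset is 3, so the edges are declared as 2:3. The SHADOW_RENEW clause used in the loop is the one described in section 6.2.2.

```fortran
      PROGRAM SHEDGE
      PARAMETER (N = 100)
      REAL A(N), B(N)
      INTEGER I
CDVM$ ALIGN B( I ) WITH A( I )
CDVM$ DISTRIBUTE (BLOCK) :: A
C     low edge:  max(1, 2) = 2, for B(I-1) and B(I-2)
C     high edge: 3, for B(I+3)
CDVM$ SHADOW B( 2:3 )
CDVM$ PARALLEL ( I ) ON B( I )
      DO 10 I = 1, N
         B(I) = REAL(I)
 10   CONTINUE
CDVM$ PARALLEL ( I ) ON A( I ), SHADOW_RENEW ( B )
      DO 20 I = 3, N-3
         A(I) = B(I-1) + B(I-2) + B(I+3)
 20   CONTINUE
      END
```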
SHADOW directive syntax.
shadow-directive | is SHADOW dist-array ( shadow-edge-list ) |
or SHADOW ( shadow-edge-list ) :: dist-array-list |
dist-array | is array-name |
or pointer-name |
shadow-directive | is SHADOW shadow-array-list |
shadow-array | is array-name ( shadow-edge-list ) |
shadow-edge | is width |
or low-width : high-width |
width | is int-expr |
low-width | is int-expr |
high-width | is int-expr |
Constraint:
- The width of the low shadow edge (low-width) and the width of the high shadow edge (high-width) must be non-negative integer constant expressions.
Specifying a shadow edge width as width is equivalent to specifying width : width.
By default, the width of both shadow edges of a distributed array is 1 for each distributed dimension.
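The forms of the directive can be compared side by side in a declaration-only sketch. The array names and widths are hypothetical, and the (BLOCK,BLOCK) distribution is an assumption made so that both dimensions are distributed; only the SHADOW lines illustrate the syntax rules above.

```fortran
      PROGRAM SHFORM
      PARAMETER (N = 100)
      REAL A(N,N), B(N,N), C(N,N), D(N,N)
CDVM$ ALIGN B( I, J ) WITH A( I, J )
CDVM$ ALIGN C( I, J ) WITH A( I, J )
CDVM$ ALIGN D( I, J ) WITH A( I, J )
CDVM$ DISTRIBUTE (BLOCK,BLOCK) :: A
C     explicit low:high widths for each dimension
CDVM$ SHADOW B( 1:2, 0:1 )
C     single widths: ( 2, 2 ) is equivalent to ( 2:2, 2:2 )
CDVM$ SHADOW ( 2, 2 ) :: C
C     D has no SHADOW directive: by default both edges have
C     width 1 in each distributed dimension
      END
```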
6.2.2 Synchronous specification of independent references of SHADOW type for single loop
A synchronous specification is a clause of the PARALLEL directive.
shadow-renew-clause | is SHADOW_RENEW ( renewee‑list ) |
renewee | is dist-array-name [ ( shadow-edge-list )] [ (CORNER) ] |
Constraints:
- The width of the shadow edges to be filled with values must not exceed the maximal width specified in the SHADOW directive.
- If the shadow edge widths are not specified, the maximal widths are used.
Executing a synchronous specification renews the shadow edges with the values of the remote variables before the loop is executed.
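The optional edge list of a renewee can be sketched as follows. This is a hypothetical program, not one from the original text: the arrays and sizes are assumptions, and the loop is chosen so that it needs narrower edges than the declared maximum.

```fortran
      PROGRAM RENEW
      PARAMETER (N = 100)
      REAL A(N), B(N)
      INTEGER I
CDVM$ ALIGN B( I ) WITH A( I )
CDVM$ DISTRIBUTE (BLOCK) :: A
C     maximal edge widths for B
CDVM$ SHADOW B( 2:2 )
CDVM$ PARALLEL ( I ) ON B( I )
      DO 10 I = 1, N
         B(I) = REAL(I)
 10   CONTINUE
C     only B(I-1) and B(I+1) are used here, so edges of width 1:1
C     are renewed; this does not exceed the declared maximum 2:2
CDVM$ PARALLEL ( I ) ON A( I ), SHADOW_RENEW ( B( 1:1 ) )
      DO 20 I = 2, N-1
         A(I) = (B(I-1) + B(I+1)) / 2
 20   CONTINUE
      END
```

Requesting only the widths a loop actually uses keeps the amount of exchanged data smaller than renewing the full declared edges.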
Example 6.1. Specification of SHADOW-references without corner elements
REAL A(100), B(100)
CDVM$ ALIGN B( I ) WITH A( I )
CDVM$ DISTRIBUTE (BLOCK) :: A
CDVM$ SHADOW B( 1:2 )
. . .
CDVM$ PARALLEL ( I ) ON A ( I ), SHADOW_RENEW ( B )
DO 10 I = 2, 98
A(I) = (B(I-1) + B(I+1) + B(I+2) ) / 3
10 CONTINUE
When renewing shadow edges, the maximal widths 1:2 specified in SHADOW directive are used.
The distribution and the shadow edge renewal scheme are shown in fig. 6.1.
[Figure 6.1: array sections on processors P-1, P and P+1]
Fig. 6.1. Distribution of array with shadow edges.
Two buffers, which are contiguous extensions of the array local section, are allocated on each processor. The low shadow edge is 1 element wide (for B(I‑1)) and the high shadow edge is 2 elements wide (for B(I+1) and B(I+2)). If the processors exchange data according to the scheme in fig. 6.1 before entering the loop, the loop can be executed on each processor without replacing references to the array by references to the buffers.
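The last point can be checked with a purely sequential emulation. The sketch below is not DVM run-time code and makes several assumptions: four processors with blocks of 25 elements, a processor P that owns B(26:50), and a global copy of B that stands in for the neighboring processors. The local section BL is declared with bounds widened by the edge widths 1 and 2, so the loop body keeps the index expressions of Example 6.1.

```fortran
      PROGRAM EDGES
      INTEGER N, LO, HI
      PARAMETER (N = 100, LO = 26, HI = 50)
C     global array, plays the role of the other processors
      REAL B(N)
C     local section on processor P plus the low edge (1 element)
C     and the high edge (2 elements)
      REAL BL(LO-1:HI+2)
      REAL AL(LO:HI), AREF
      INTEGER I, NBAD
      DO 10 I = 1, N
         B(I) = REAL(I*I)
 10   CONTINUE
C     own elements of the local section
      DO 20 I = LO, HI
         BL(I) = B(I)
 20   CONTINUE
C     emulated exchange: low edge from P-1, high edge from P+1
      BL(LO-1) = B(LO-1)
      BL(HI+1) = B(HI+1)
      BL(HI+2) = B(HI+2)
C     same index expressions as in the original loop
      NBAD = 0
      DO 30 I = LO, HI
         AL(I) = (BL(I-1) + BL(I+1) + BL(I+2)) / 3
         AREF = (B(I-1) + B(I+1) + B(I+2)) / 3
         IF (AL(I) .NE. AREF) NBAD = NBAD + 1
 30   CONTINUE
      PRINT *, 'elements that differ: ', NBAD
      END
```

The program counts the owned elements for which the result computed from the local section with edges differs from the result computed from the whole array.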