Example 6.10. Using a named group of regular remote references.
      REAL A1(M,N1+1), A2(M1+1,N2+1), A3(M2+1,N2+1)
CDVM$ DISTRIBUTE (BLOCK,BLOCK) :: A1, A2, A3
CDVM$ REMOTE_GROUP RS
      DO 1 ITER = 1, MIT
      . . .
C     edge exchange along division line D
CDVM$ PREFETCH RS
      . . .
CDVM$ PARALLEL ( I ) ON A1(I,N1+1), REMOTE_ACCESS (RS: A2(I,2))
      DO 10 I = 1, M1
   10 A1(I,N1+1) = A2(I,2)
CDVM$ PARALLEL ( I ) ON A1(I,N1+1), REMOTE_ACCESS (RS: A3(I-M1,2))
      DO 20 I = M1+1, M
   20 A1(I,N1+1) = A3(I-M1,2)
CDVM$ PARALLEL ( I ) ON A2(I,1), REMOTE_ACCESS (RS: A1(I,N1))
      DO 30 I = 1, M1
   30 A2(I,1) = A1(I,N1)
CDVM$ PARALLEL ( I ) ON A3(I,1), REMOTE_ACCESS (RS: A1(I+M1,N1))
      DO 40 I = 1, M2
   40 A3(I,1) = A1(I+M1,N1)
      . . .
      IF (NOBLN) THEN
C     redistribution of arrays to balance the load
      . . .
CDVM$ RESET RS
      END IF
      . . .
    1 CONTINUE
6.3.4 Asynchronous copying by REMOTE type references
If a parallel loop contains only assignment statements without computations, REMOTE type references can be accessed more efficiently by asynchronous copying of distributed arrays.
6.3.4.1 Loops and copy-statements
Consider the following loop:
      DO 10 I1 = L1,H1,S1
      . . .
      DO 10 In = Ln,Hn,Sn
   10 A(f1,…,fk) = B(g1,…,gm)
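where A and B are identifiers of different distributed arrays;
$L_i$, $H_i$, $S_i$ (the lower bounds, upper bounds and steps of the loops) are loop invariants;
$f_i = a_i I_i + b_i$;
$g_j = c_j I_j + d_j$;
$a_i$, $b_i$, $c_j$, $d_j$ are integer expressions that are loop invariants (expressions whose values do not change during the loop execution).
Here $I_i$ ($I_j$) denotes the loop variable that occurs in the i-th subscript of A (the j-th subscript of B), which need not be the i-th (j-th) loop of the nest, and $L_i$, $H_i$, $S_i$ ($L_j$, $H_j$, $S_j$) are the bounds and step of that loop; a coefficient $a_i$ or $c_j$ may be zero, which gives a constant subscript.
Each loop variable $I_l$ may be used in at most one expression $f_i$ and in at most one expression $g_j$.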
The loop body may contain several statements satisfying the restrictions above. Such a loop is called a copy-loop.
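The subscript restriction can be checked mechanically. The following Python sketch is a hypothetical model, not part of FDVM: each subscript is a tuple (loop_var, a, b) standing for a*loop_var + b, with loop_var set to None for a constant subscript, and the function satisfies_copy_loop_rule (an illustrative name) tests that no loop variable occurs in more than one subscript of either array.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (loop_var, a, b) stands for the subscript a*loop_var + b;
# a constant subscript has loop_var None (equivalently a = 0).
Subscript = Tuple[Optional[str], int, int]


def satisfies_copy_loop_rule(lhs: Sequence[Subscript], rhs: Sequence[Subscript]) -> bool:
    """Return True if no loop variable occurs in more than one subscript of the
    left-hand array, nor in more than one subscript of the right-hand array."""
    for subscripts in (lhs, rhs):
        used = [var for var, a, _ in subscripts if var is not None and a != 0]
        if len(used) != len(set(used)):
            return False
    return True


# A(I1, 5, I2+1) = B(I1, I2-1): each loop variable appears once per array.
ok = satisfies_copy_loop_rule(
    [("I1", 1, 0), (None, 0, 5), ("I2", 1, 1)],
    [("I1", 1, 0), ("I2", 1, -1)],
)
# A(I1, I1) = B(I1, I2): I1 appears in two subscripts of A, so this is not a copy-loop.
bad = satisfies_copy_loop_rule(
    [("I1", 1, 0), ("I1", 1, 0)],
    [("I1", 1, 0), ("I2", 1, 0)],
)
print(ok, bad)  # True False
```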
A copy-loop can be described by one or more copy-statements of the form
$A(\alpha_1, \ldots, \alpha_k) = B(\beta_1, \ldots, \beta_m)$
where
$\alpha_i = l_i : h_i : s_i$
$\beta_j = l_j : h_j : s_j$
and $\alpha_i$, $\beta_j$ are Fortran 90 subscript triplets.
A copy-statement is similar to a Fortran 90 array section assignment statement.
Triplets can be written in compact form. The rules are given here for a triplet $\alpha_i$; they apply to $\beta_j$ in the same way.
- If the whole dimension of the array takes part in copying, then $\alpha_i = :$
- If $s_i = 1$, then $\alpha_i = l_i : h_i$
- If $l_i = h_i$, then $\alpha_i = l_i$
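A minimal Python sketch of the compact-notation rules, assuming the dimension is declared with bounds 1:extent; the function name compact_triplet is hypothetical. When a triplet selects a single index, this sketch applies the single-index rule first.

```python
def compact_triplet(l: int, h: int, s: int, extent: int) -> str:
    """Write the triplet l:h:s in compact form for a dimension declared 1:extent.

    The single-index rule is checked first, so a one-element selection is
    always written as a scalar subscript.
    """
    if l == h:
        return str(l)          # alpha = l
    if s == 1 and l == 1 and h == extent:
        return ":"             # the whole dimension takes part in copying
    if s == 1:
        return f"{l}:{h}"      # unit stride: alpha = l:h
    return f"{l}:{h}:{s}"      # general form l:h:s


print(compact_triplet(1, 4, 1, 4))    # :
print(compact_triplet(5, 5, 0, 6))    # 5
print(compact_triplet(3, 7, 1, 7))    # 3:7
print(compact_triplet(1, 9, 2, 10))   # 1:9:2
```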
For copy-loop 10 above, the triplet components are calculated as follows:
| Triplets of A | Triplets of B |
| $l_i = a_i L_i + b_i$ | $l_j = c_j L_j + d_j$ |
| $h_i = a_i H_i + b_i$ | $h_j = c_j H_j + d_j$ |
| $s_i = a_i S_i$ | $s_j = c_j S_j$ |
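The table's formulas can be applied with a small Python helper. The names triplet and triplet_values are hypothetical, and the helper assumes that H - L is a multiple of S, so that the loop actually reaches its upper bound. The check compares the indices selected by the computed triplet with the subscript values the loop would generate.

```python
from typing import List, Tuple


def triplet(a: int, b: int, lo: int, hi: int, step: int) -> Tuple[int, int, int]:
    """Triplet (l, h, s) of the subscript a*I + b for DO I = lo, hi, step.

    Uses l = a*lo + b, h = a*hi + b, s = a*step, as in the table above.
    Assumes hi - lo is a multiple of step, so the loop really reaches hi.
    """
    return a * lo + b, a * hi + b, a * step


def triplet_values(l: int, h: int, s: int) -> List[int]:
    """Indices selected by a triplet with nonzero stride s."""
    return list(range(l, h + (1 if s > 0 else -1), s))


# Subscript 3*I - 1 in DO I = 2, 10, 2
t = triplet(3, -1, 2, 10, 2)
print(t)                                      # (5, 29, 6)
print(triplet_values(*t))                     # [5, 11, 17, 23, 29]
print([3 * i - 1 for i in range(2, 11, 2)])   # [5, 11, 17, 23, 29]

# Negative coefficient: subscript 11 - I in DO I = 1, 10 gives a reversed section
print(triplet(-1, 11, 1, 10, 1))              # (10, 1, -1)
```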
Consider the following copy-loop
      REAL A(N1,N2,N3), B(N1,N3)
      DO 10 I1 = 1, N1
      DO 10 I2 = 2, N3-1
   10 A(I1, 5, I2+1) = B(I1, I2-1)
The following copy-statement corresponds to this loop:
A( :, 5, 3:N3 ) = B( :, 1:N3-2 )
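To see that the copy-statement describes the same assignments as the loop, both can be executed on a small dictionary-based model of the arrays (keys are 1-based index tuples). This Python sketch is illustrative only: the sizes N1 = 4, N3 = 7 and the contents of B are arbitrary, and the two sections are paired in Fortran array element order (first subscript varying fastest). Because A and B are different arrays, the order in which elements are assigned does not affect the result.

```python
def run_copy_loop(n1, n3, b):
    """DO 10 I1 = 1,N1 / DO 10 I2 = 2,N3-1 / 10 A(I1,5,I2+1) = B(I1,I2-1)"""
    a = {}
    for i1 in range(1, n1 + 1):
        for i2 in range(2, n3):            # I2 = 2, ..., N3-1
            a[(i1, 5, i2 + 1)] = b[(i1, i2 - 1)]
    return a


def run_copy_statement(n1, n3, b):
    """A( :, 5, 3:N3 ) = B( :, 1:N3-2 ), pairing elements in array element order."""
    lhs = [(i, 5, k) for k in range(3, n3 + 1) for i in range(1, n1 + 1)]
    rhs = [(i, k) for k in range(1, n3 - 1) for i in range(1, n1 + 1)]
    assert len(lhs) == len(rhs)            # the two sections are conformable
    return {dst: b[src] for dst, src in zip(lhs, rhs)}


N1, N3 = 4, 7                              # arbitrary sample sizes
B = {(i, k): 10 * i + k for i in range(1, N1 + 1) for k in range(1, N3 + 1)}
print(run_copy_loop(N1, N3, B) == run_copy_statement(N1, N3, B))  # True
```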
6.3.4.2 Asynchronous copying directives
Asynchronous copying makes it possible to overlap data transfer between processors with the execution of other statements.
Asynchronous copying is specified by a combination of a directive that starts copying (ASYNCHRONOUS ID) and a directive that waits for copying completion (ASYNCWAIT ID). The two directives are matched by the same identifier ID.
6.3.4.2.1 ASYNCID directive
The ASYNCID directive declares an individual identifier for each pair of asynchronous copying directives.
Syntax:
asyncid-directive    is ASYNCID async-name-list
6.3.4.2.2 F90 directive
The F90 directive is a prefix for each copy-statement.
Syntax:
f90-directive        is F90 copy-statement
copy-statement       is array-section = array-section
array-section        is array-name [( section-subscript-list )]
section-subscript    is subscript
                     or subscript-triplet
subscript-triplet    is [ subscript ] : [ subscript ] [ : stride ]
subscript            is int-expr
stride               is int-expr
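The following Python sketch expands a section-subscript written according to this grammar into the indices it selects, using the Fortran 90 defaults: an omitted lower or upper bound is the declared bound of the dimension, and an omitted stride is 1. It accepts integer literals only (int-expr operands such as N3 are not handled), and the name expand_triplet is hypothetical.

```python
from typing import List


def expand_triplet(text: str, lower: int, upper: int) -> List[int]:
    """Indices selected by a section-subscript in a dimension declared lower:upper.

    Handles 'subscript' and '[subscript] : [subscript] [: stride]' with integer
    literals only; omitted bounds default to the declared bounds, stride to 1.
    """
    parts = [p.strip() for p in text.split(":")]
    if len(parts) == 1:                       # plain subscript selects one index
        return [int(parts[0])]
    if len(parts) > 3 or (len(parts) == 3 and not parts[2]):
        raise ValueError(f"not a subscript-triplet: {text!r}")
    lo = int(parts[0]) if parts[0] else lower
    hi = int(parts[1]) if parts[1] else upper
    stride = int(parts[2]) if len(parts) == 3 else 1
    if stride == 0:
        raise ValueError("stride must be nonzero")
    return list(range(lo, hi + (1 if stride > 0 else -1), stride))


print(expand_triplet(":", 1, 4))        # [1, 2, 3, 4]
print(expand_triplet("3:7", 1, 7))      # [3, 4, 5, 6, 7]
print(expand_triplet("5", 1, 6))        # [5]
print(expand_triplet("::2", 1, 5))      # [1, 3, 5]
print(expand_triplet("9:1:-4", 1, 10))  # [9, 5, 1]
```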
6.3.4.2.3 ASYNCHRONOUS and END ASYNCHRONOUS directives
The ASYNCHRONOUS and END ASYNCHRONOUS directives delimit a block construct.
Syntax:
asynchronous-construct       is asynchronous-directive
                                   f90-directive [ f90-directive ] …
                                   copy-loop [ copy-loop ] …
                                end-asynchronous-directive
asynchronous-directive       is ASYNCHRONOUS async-name
end-asynchronous-directive   is END ASYNCHRONOUS
Every assignment statement in the copy-loops must be described by an F90 directive with the corresponding copy-statement.
6.3.4.2.4 ASYNCWAIT directive
Syntax:
asyncwait-directive    is ASYNCWAIT async-name
The example from section 6.3.4.1 can be specified as asynchronous copying as follows.
CDVM$ ASYNCID TR
      REAL A(N1,N2,N3), B(N1,N3)
      . . .
CDVM$ ASYNCHRONOUS TR
CDVM$ F90 A( :, 5, 3:N3 ) = B( :, 1:N3-2 )
      DO 10 I1 = 1, N1
      DO 10 I2 = 2, N3-1
   10 A(I1,5,I2+1) = B(I1,I2-1)
CDVM$ END ASYNCHRONOUS
      . . .
C     sequence of statements that are executed
C     while the data transfer is in progress
      . . .
CDVM$ ASYNCWAIT TR
6.4 REDUCTION type references
6.4.1 Synchronous specification of REDUCTION type references
If the REDUCTION specification of a parallel loop contains no group name, the specification is synchronous and is executed as follows.
- Local reduction calculation. During the loop execution, each processor calculates the local value of the reduction for the part of the data allocated to it.
- Global reduction calculation. After the loop completes, the inter-processor reduction of the local values is calculated automatically. The resulting value is assigned to the reduction variable on each processor.
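The two steps can be simulated in Python. The sketch assumes nprocs processors that each own one contiguous block of the data (a simplified stand-in for BLOCK distribution); block_partition and synchronous_reduction are hypothetical names.

```python
import operator
from functools import reduce
from typing import Callable, List, Sequence, Tuple


def block_partition(data: Sequence, nprocs: int) -> List[Sequence]:
    """Split data into nprocs contiguous blocks (a simplified BLOCK distribution)."""
    size = -(-len(data) // nprocs)               # ceiling division
    return [data[p * size:(p + 1) * size] for p in range(nprocs)]


def synchronous_reduction(data, nprocs, op: Callable, identity) -> Tuple[list, list]:
    blocks = block_partition(data, nprocs)
    # Step 1 (local reduction): each processor reduces only its own block.
    local = [reduce(op, blk, identity) for blk in blocks]
    # Step 2 (global reduction): combine the local values after the loop;
    # every processor receives the same result.
    result = reduce(op, local, identity)
    return [result] * nprocs, local


print(synchronous_reduction(list(range(1, 11)), 3, operator.add, 0))
# ([55, 55, 55], [10, 26, 19])
print(synchronous_reduction([3, -7, 2, 9, -1], 2, max, float("-inf")))
# ([9, 9], [3, 9])
```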
6.4.2 Asynchronous specification of REDUCTION type references
Asynchronous specification makes it possible to:
- join into one group the reduction variables calculated in different loops;
- overlap the execution of the global group reduction with other computations.
For asynchronous specification, in addition to the REDUCTION specification (with a group name), the following directives are required:
reduction-group-directive    is REDUCTION_GROUP reduction-group-name-list
reduction-start-directive    is REDUCTION_START reduction-group-name
reduction-wait-directive     is REDUCTION_WAIT reduction-group-name
A typical sequence of asynchronous REDUCTION specifications is the following.
CDVM$ REDUCTION_GROUP RD
      . . .
CDVM$ PARALLEL . . . , REDUCTION (RD : d1)
C     local reduction d1
      . . .
CDVM$ PARALLEL . . . , REDUCTION (RD : dn)
C     local reduction dn
      . . .
CDVM$ REDUCTION_START RD
C     beginning of global reduction d1 ... dn
      . . .
CDVM$ REDUCTION_WAIT RD
C     end of global reduction d1 ... dn
Constraints:
- Until the REDUCTION_START directive is executed, the reduction variables of the group may be used only in reduction statements of parallel loops.
- The REDUCTION_START and REDUCTION_WAIT directives must be executed after the completion of the loop (or loops) in which the local values of the reduction variables were calculated. Between these two directives, only statements that do not use the values of the reduction variables are allowed.
- The REDUCTION_WAIT directive deletes the reduction group.
Example 6.11. Asynchronous specification of REDUCTION type references.
CDVM$ DISTRIBUTE A (BLOCK)
CDVM$ ALIGN B( I ) WITH A( I )
CDVM$ REDUCTION_GROUP RD
      . . .
      S = 0.
CDVM$ PARALLEL ( I ) ON A( I ),
CDVM$* REDUCTION (RD : SUM(S))
      DO 10 I = 1, N
   10 S = S + A(I)
      X = 0.
CDVM$ PARALLEL ( I ) ON B( I ),
CDVM$* REDUCTION (RD : MAX(X))
      DO 20 I = 1, N
   20 X = MAX(X, ABS(B(I)))
CDVM$ REDUCTION_START RD
C     beginning of global reduction SUM(S) and MAX(X)
CDVM$ PARALLEL ( I ) ON A( I )
      DO 30 I = 1, N
   30 A(I) = A(I) + B(I)
CDVM$ REDUCTION_WAIT RD
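The behaviour of Example 6.11 can be modeled with a small Python class. ReductionGroup and its methods are hypothetical analogues of REDUCTION_GROUP, REDUCTION_START and REDUCTION_WAIT, not a DVM API; the two-processor block split and the array values are assumptions for illustration. In the model, start() takes a snapshot of the local values, so updating A afterwards (as loop 30 does) does not affect S, and wait() empties the group, mirroring the rule that REDUCTION_WAIT deletes it.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List, Optional, Tuple


class ReductionGroup:
    """Hypothetical model of a reduction group (REDUCTION_GROUP RD)."""

    def __init__(self) -> None:
        self._locals: Dict[str, Tuple[Callable, List]] = {}
        self._executor: Optional[ThreadPoolExecutor] = None
        self._future: Optional[Future] = None

    def add(self, name: str, op: Callable, local_values: List) -> None:
        """A loop with REDUCTION (RD : ...) contributes its per-processor local values."""
        self._locals[name] = (op, list(local_values))

    def start(self) -> None:
        """REDUCTION_START RD: begin combining the local values in the background."""
        snapshot = dict(self._locals)
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future = self._executor.submit(
            lambda: {n: reduce(op, vals) for n, (op, vals) in snapshot.items()})

    def wait(self) -> Dict[str, object]:
        """REDUCTION_WAIT RD: return the global values and delete the group."""
        result = self._future.result()
        self._executor.shutdown()
        self._locals, self._executor, self._future = {}, None, None
        return result


A = [1.0, -2.0, 3.0, 5.0]
B = [0.5, -6.0, 2.0, 1.5]
blocks = [(0, 2), (2, 4)]                 # assumed: two processors, two elements each

rd = ReductionGroup()
rd.add("S", lambda x, y: x + y, [sum(A[lo:hi]) for lo, hi in blocks])     # loop 10
rd.add("X", max, [max(abs(v) for v in B[lo:hi]) for lo, hi in blocks])    # loop 20
rd.start()
A = [a + b for a, b in zip(A, B)]         # loop 30: uses neither S nor X
values = rd.wait()
print(values)                             # {'S': 7.0, 'X': 6.0}
print(A)                                  # [1.5, -8.0, 5.0, 6.5]
```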