Fig. 6.4. Pipeline execution
[Figure: axes i and j, label t2]
Fig. 6.5. Parallelization by hyper-planes of virtual processor arrangement.
6.2.5 Asynchronous specification of independent references of SHADOW type
Updating the values of shadow edges is an indivisible (synchronous) exchange operation for an unnamed group of distributed arrays. The operation can be divided into two operations:
- starting the exchange;
- waiting for the values.
While waiting for the shadow edge values, other computations can be performed; in particular, the internal area of the local array section can be computed.
The following directives describe the asynchronous renewal of shadow edges for a named group of distributed arrays.
Creation of a group.
shadow-group-directive   is   SHADOW_GROUP shadow-group-name
Start of shadow edge renewal.
shadow-start-directive   is   SHADOW_START shadow-group-name
Waiting for shadow edge values.
shadow-wait-directive    is   SHADOW_WAIT shadow-group-name
The SHADOW_START directive must be executed after the SHADOW_GROUP directive. Once SHADOW_GROUP has been executed, the SHADOW_START and SHADOW_WAIT directives can be executed many times. Updated values of the shadow edges may be used only after the SHADOW_WAIT directive has been executed.
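The ordering rule above can be sketched as a small program. This is an illustrative sketch only, not taken from the DVM documentation and not checked with a DVM compiler: the program name SHSEQ, the group name GA, the array E and the initial values are assumptions. It uses only directive forms that appear in this section. An ordinary Fortran compiler treats the CDVM$ lines as comments and runs the program sequentially.

```fortran
      PROGRAM SHSEQ
      PARAMETER (N = 100)
      REAL A(N,N), C(N,N), E(N,N), X
CDVM$ ALIGN (I,J) WITH C(I,J) :: A, E
CDVM$ DISTRIBUTE (BLOCK,BLOCK) :: C
C     each processor fills its own part of A (assumed test data)
CDVM$ PARALLEL (I,J) ON A(I,J)
      DO 10 I = 1, N
      DO 10 J = 1, N
        A(I,J) = I + J
 10   CONTINUE
C     the group is created once, before the first SHADOW_START
CDVM$ SHADOW_GROUP GA ( A )
C     start the exchange of the shadow edges of A
CDVM$ SHADOW_START GA
C     overlapped work: only own elements of A are read here
CDVM$ PARALLEL (I,J) ON E(I,J)
      DO 20 I = 1, N
      DO 20 J = 1, N
        E(I,J) = 2. * A(I,J)
 20   CONTINUE
C     after the wait the shadow edges of A hold updated values
CDVM$ SHADOW_WAIT GA
CDVM$ PARALLEL (I,J) ON C(I,J)
      DO 30 I = 2, N-1
      DO 30 J = 2, N-1
        C(I,J) = (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1)) / 4
 30   CONTINUE
C     read one element into a replicated scalar for output
CDVM$ REMOTE_ACCESS ( C(50,50) )
      X = C(50,50)
      PRINT *, X
      END
```

The loop labelled 20 stands for any computation that does not read the shadow edges of A. Moving the loop labelled 30 before SHADOW_WAIT would violate the rule that updated shadow edge values may be used only after the wait.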
A special case is the use of the SHADOW_START and SHADOW_WAIT directives as the shadow-renew-clause of a parallel loop.
shadow-renew-clause   is   . . .
                      or   shadow-start-directive
                      or   shadow-wait-directive
If the specification contains SHADOW_START, each processor first computes, ahead of the rest of the loop, the values that are sent to the shadow edges. Then the shadow edges are renewed and the computation on the internal area of the local array section is done (see fig. 6.2).
If the specification contains SHADOW_WAIT, the computation on the internal area of the local array section is performed first. After the wait for the new shadow edge values has completed, the calculations that use these values are performed.
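The two clause forms can be combined in one iterative scheme. The sketch below is an assumption-laden illustration and has not been checked with a DVM compiler: the program name SHCLS, the group name GA, the iteration count ITMAX and the initial values are hypothetical. The first loop produces A and starts the exchange; the second loop consumes the shadow edges of A.

```fortran
      PROGRAM SHCLS
      PARAMETER (N = 100, ITMAX = 10)
      REAL A(N,N), B(N,N), X
CDVM$ ALIGN (I,J) WITH B(I,J) :: A
CDVM$ DISTRIBUTE (BLOCK,BLOCK) :: B
C     assumed initial data
CDVM$ PARALLEL (I,J) ON B(I,J)
      DO 10 I = 1, N
      DO 10 J = 1, N
        A(I,J) = 0.
        B(I,J) = I + J
 10   CONTINUE
CDVM$ SHADOW_GROUP GA ( A )
      DO 40 IT = 1, ITMAX
C       values sent to the shadow edges are computed first, then
C       the exchange is started and the internal area is computed
CDVM$ PARALLEL (I,J) ON A(I,J), SHADOW_START GA
        DO 20 I = 2, N-1
        DO 20 J = 2, N-1
          A(I,J) = B(I,J)
 20     CONTINUE
C       the internal area is computed first; elements that read
C       shadow edges of A are computed after the wait completes
CDVM$ PARALLEL (I,J) ON B(I,J), SHADOW_WAIT GA
        DO 30 I = 2, N-1
        DO 30 J = 2, N-1
          B(I,J) = (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1)) / 4
 30     CONTINUE
 40   CONTINUE
CDVM$ REMOTE_ACCESS ( B(50,50) )
      X = B(50,50)
      PRINT *, X
      END
```

Each pass of the outer loop executes SHADOW_START and SHADOW_WAIT once for the same group, which is allowed because the group was created before the loop.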
Example 6.8. Overlapping computations and shadow edges updating.
      REAL A(100,100), B(100,100), C(100,100), D(100,100)
CDVM$ ALIGN (I,J) WITH C(I,J) :: A, B, D
CDVM$ DISTRIBUTE (BLOCK,BLOCK) :: C
      . . .
CDVM$ SHADOW_GROUP AB ( A, B )
      . . .
CDVM$ SHADOW_START AB
      . . .
CDVM$ PARALLEL (I,J) ON C(I,J), SHADOW_WAIT AB
      DO 10 I = 2, 99
      DO 10 J = 2, 99
        C(I,J) = (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1)) / 4
        D(I,J) = (B(I-1,J) + B(I+1,J) + B(I,J-1) + B(I,J+1)) / 4
 10   CONTINUE
The shadow edge width of the distributed arrays is 1 element in each dimension. Since the SHADOW_WAIT directive is specified in the parallel loop directive, the order of execution of the loop iterations is changed. First, the computations on the internal area of each local array section are performed. Then the wait for the updated shadow edge values is performed. The loop execution is completed by computing the values that are sent to the shadow edges.
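The changed iteration order can be pictured for a single processor with plain Fortran. This is a sequential model only, not what the DVM run-time system actually executes: the local block bounds ILO, IHI, JLO, JHI are hypothetical, and the counter array CNT merely records how often each element is computed.

```fortran
      PROGRAM ORDER
C     Hypothetical local block of one processor: rows ILO..IHI,
C     columns JLO..JHI (assumed bounds, shadow edge width 1).
      INTEGER ILO, IHI, JLO, JHI
      PARAMETER (ILO = 26, IHI = 50, JLO = 51, JHI = 75)
      INTEGER CNT(ILO:IHI,JLO:JHI), I, J, NBAD
      DO 10 I = ILO, IHI
      DO 10 J = JLO, JHI
        CNT(I,J) = 0
 10   CONTINUE
C     Stage 1: internal area; no shadow elements are read.
      DO 20 I = ILO+1, IHI-1
      DO 20 J = JLO+1, JHI-1
        CNT(I,J) = CNT(I,J) + 1
 20   CONTINUE
C     Stage 2: the wait for shadow edge values would happen here.
C     Stage 3: elements on the border of the local block.
      DO 30 I = ILO, IHI
      DO 30 J = JLO, JHI
        IF (I.EQ.ILO .OR. I.EQ.IHI .OR.
     *      J.EQ.JLO .OR. J.EQ.JHI) CNT(I,J) = CNT(I,J) + 1
 30   CONTINUE
C     Count the elements that were not computed exactly once.
      NBAD = 0
      DO 40 I = ILO, IHI
      DO 40 J = JLO, JHI
        IF (CNT(I,J) .NE. 1) NBAD = NBAD + 1
 40   CONTINUE
      PRINT *, 'elements not computed exactly once:', NBAD
      END
```

The reported count is zero: the internal area and the border together cover the local block without overlap, so reordering the iterations does not change which elements are computed.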
6.3 REMOTE type references
6.3.1 REMOTE_ACCESS directive
Remote references of REMOTE type are specified by the REMOTE_ACCESS directive.
remote-access-directive   is   REMOTE_ACCESS ( [ remote-group-name : ] regular-reference-list )
regular-reference         is   dist-array-name [ ( regular-subscript-list ) ]
regular-subscript         is   int-expr
                          or   do-variable-use
                          or   :
remote-access-clause      is   remote-access-directive
The REMOTE_ACCESS directive can appear as a separate directive (its scope is the statement that follows) or as a clause of the PARALLEL directive (its scope is the parallel loop body).
If a remote reference is specified as an array name without an index list, all references to the array in the parallel loop (or statement) are remote references of REMOTE type.
Consider a remote reference to a multi-dimensional array A( ind1, ind2,…,indk ), where indj is the index expression in the j-th dimension.
The index expression is specified unchanged in the REMOTE_ACCESS directive if both of the following hold:
- the j-th dimension is a distributed dimension;
- indj = a * i + b, where i is a loop variable and a and b are not updated during loop execution (invariants).
In all other cases the symbol “:” (the whole dimension) is specified instead of indj in the REMOTE_ACCESS directive.
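The rule can be applied mechanically, dimension by dimension. The sketch below is an illustration under assumptions and has not been checked with a DVM compiler: the program name RMSUB, the subscripts 2*J and J*J and the initial data are hypothetical. With the distribution (*,BLOCK), dimension 1 is not distributed and is therefore always written as ":".

```fortran
      PROGRAM RMSUB
      PARAMETER (N = 100)
      REAL A(N,N), B(N,N), X
CDVM$ DISTRIBUTE (*,BLOCK) :: A
CDVM$ ALIGN B(I,J) WITH A(I,J)
C     assumed initial data
CDVM$ PARALLEL (I,J) ON A(I,J)
      DO 10 I = 1, N
      DO 10 J = 1, N
        A(I,J) = 0.
        B(I,J) = I + J
 10   CONTINUE
C     reference B(I,2*J):
C       dimension 1 is not distributed            -> ":"
C       dimension 2 is distributed, 2*J = a*i + b -> kept as 2*J
CDVM$ PARALLEL (I,J) ON A(I,J), REMOTE_ACCESS ( B(:,2*J) )
      DO 20 I = 1, N
      DO 20 J = 1, N/2
        A(I,J) = B(I,2*J)
 20   CONTINUE
C     reference B(I,J*J):
C       dimension 1 is not distributed            -> ":"
C       dimension 2 is distributed, J*J nonlinear -> ":"
CDVM$ PARALLEL (I,J) ON A(I,J), REMOTE_ACCESS ( B(:,:) )
      DO 30 I = 1, N
      DO 30 J = 1, 10
        A(I,J) = B(I,J*J)
 30   CONTINUE
CDVM$ REMOTE_ACCESS ( A(1,1) )
      X = A(1,1)
      PRINT *, X
      END
```

Keeping 2*J in the directive describes exactly which column each iteration needs, while ":" in a distributed dimension requests the whole dimension.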
6.3.2 Synchronous specification of REMOTE type references
If a group name (remote-group-name) is not specified in the REMOTE_ACCESS directive, the directive is executed in synchronous mode. Within the statement or parallel loop that follows, the compiler replaces all remote references by references to a buffer. The values of the remote data are transferred before the statement or the loop is executed.
Example 6.9. Synchronous specification of REMOTE type references.
      DIMENSION A(100,100), B(100,100)
CDVM$ DISTRIBUTE (*,BLOCK) :: A
CDVM$ ALIGN B(I,J) WITH A(I,J)
      . . .
CDVM$ REMOTE_ACCESS ( A(50,50) )
C     replacing reference A(50,50) by reference to buffer
C     sending value A(50,50) to all the processors
 1    X = A(50,50)
      . . .
CDVM$ REMOTE_ACCESS ( B(100,100) )
C     sending value B(100,100) to the buffer of processor own(A(1,1))
 2    A(1,1) = B(100,100)
      . . .
CDVM$ PARALLEL (I,J) ON A(I,J) , REMOTE_ACCESS ( B(:,N) )
C     sending values B(:,N) to processors own(A(:,J))
 3    DO 10 I = 1, 100
      DO 10 J = 1, 100
 10   A(I,J) = B(I,J) + B(I,N)
The first two REMOTE_ACCESS directives specify remote references for separate statements. The REMOTE_ACCESS directive in the parallel loop specifies remote data (a matrix column) for all the processors onto which array A is mapped.
6.3.3 Asynchronous specification of REMOTE type references
If a group name (remote-group-name) is specified in the REMOTE_ACCESS directive, the directive is executed in asynchronous mode. This mode requires the following additional directives.
Group name definition.
remote-group-directive   is   REMOTE_GROUP remote-group-name-list
An identifier defined in this directive can be used only in REMOTE_ACCESS, PREFETCH and RESET directives. A remote group is a global object; its scope is the whole program.
prefetch-directive       is   PREFETCH remote-group-name
reset-directive          is   RESET remote-group-name
Consider the following typical sequence of asynchronous specification of REMOTE type references:
CDVM$ REMOTE_GROUP RS
 10   . . .
CDVM$ PREFETCH RS
      . . .
C     calculations, where remote references r1,…,rn don't take part
      . . .
CDVM$ PARALLEL . . . , REMOTE_ACCESS (RS : r1)
      . . .
CDVM$ REMOTE_ACCESS (RS : ri)
      . . .
CDVM$ PARALLEL . . . , REMOTE_ACCESS (RS : rn)
      . . .
      IF( P ) GO TO 10
When this sequence of statements is executed for the first time, the PREFETCH directive is not executed, and the REMOTE_ACCESS directives are executed in the usual synchronous mode. Meanwhile the references are accumulated in the variable RS. After the whole sequence of REMOTE_ACCESS directives has been executed, the value of RS is the union of the subgroups of remote references r1, ..., rn.
When the sequence is executed for the second and subsequent times, the PREFETCH directive sends the remote data in advance for all the references contained in RS. Between the PREFETCH directive and the first REMOTE_ACCESS directive with the same group name, other computations can be performed, overlapping the wait for the remote references to be processed. In this case the REMOTE_ACCESS directives do not cause any data transfer.
Constraints:
- Repeated execution of the PREFETCH directive is correct only if the characteristics of the remote reference group (the loop parameters, the array distributions and the values of index expressions in remote references) have not changed;
- The PREFETCH directive can serve several loops (several REMOTE_ACCESS directives) only if there are no data dependencies for the distributed arrays specified in the REMOTE_ACCESS directives.
If the characteristics of the remote reference group have changed, the group must be assigned the undefined value with the RESET directive. The remote reference group is then accumulated anew.
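A case where RESET is needed can be sketched as follows. The program is a hypothetical illustration, not checked with a DVM compiler: the program name RMRST, the column index K, the iteration count ITMAX and the change of K every fifth iteration are assumptions chosen only so that an index expression value in the remote reference changes between passes.

```fortran
      PROGRAM RMRST
      PARAMETER (N = 100, ITMAX = 20)
      REAL A(N,N), B(N,N), X
      INTEGER K
CDVM$ DISTRIBUTE (*,BLOCK) :: A
CDVM$ ALIGN B(I,J) WITH A(I,J)
CDVM$ REMOTE_GROUP RS
C     assumed initial data
CDVM$ PARALLEL (I,J) ON A(I,J)
      DO 10 I = 1, N
      DO 10 J = 1, N
        A(I,J) = 0.
        B(I,J) = I + J
 10   CONTINUE
      K = 1
      DO 30 IT = 1, ITMAX
C       no effect while RS is undefined (first pass and the first
C       pass after RESET); otherwise column K is sent in advance
CDVM$ PREFETCH RS
C       computations that do not use B(:,K) could be placed here
CDVM$ PARALLEL (I,J) ON A(I,J), REMOTE_ACCESS ( RS : B(:,K) )
        DO 20 I = 1, N
        DO 20 J = 1, N
          A(I,J) = A(I,J) + B(I,K)
 20     CONTINUE
        IF (MOD(IT,5) .EQ. 0) THEN
C         the value of the index expression K changes, so the
C         accumulated group is no longer valid and must be reset
          K = K + 1
CDVM$ RESET RS
        ENDIF
 30   CONTINUE
CDVM$ REMOTE_ACCESS ( A(1,1) )
      X = A(1,1)
      PRINT *, X
      END
```

Without the RESET directive, the PREFETCH on the next pass would send the data for the old value of K, which is the situation the first constraint rules out.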
Consider the following fragment of a multi-block problem. The simulation area is split into 3 blocks, as shown in fig. 6.6.
[Figure: simulation area with labels M, N1, A1, D, N2, A2, A3, M1, M2]
Fig. 6.6. Splitting simulation area.