FDVM2 (1158347), страница 5
Текст из файла (страница 5)
DO 10 I = 1, N
DO 10 J = 1, N
X = B(I,J) + C(I,J)
A(I,J) = X
10 CONTINUE
5.2.4. Reduction Operations and Variables. REDUCTION Clause
Programs often contain loops with so called reduction operations: array elements are accumulated in some variable, minimum or maximum value of them is determined. Iterations of such loop may be distributed also, if to use the REDUCTION clause.
| reduction-clause | is REDUCTION ( reduction-op-list ) |
| reduction-op | is reduction-op-name ( reduction-variable ) |
| or reduction-loc-name ( reduction-variable , int-variable-list) |
| reduction-variable | is array-name |
| or scalar-variable-name |
| reduction-op-name | is SUM |
| or PRODUCT | |
| or MAX | |
| or MIN | |
| or AND | |
| or OR | |
| or EQV | |
| or NEQV |
| reduction-loc-name | is MAXLOC |
| or MINLOC |
Distributed arrays may not be used as reduction variables. Reduction variables are calculated and used only inside the loop in statements of a certain type: the reduction statements.
Let us introduce some notation.
| rv | - is a reduction variable |
| er | - is an expression that does not contains the reduction variable; |
| Ik | - integer variable |
| op | - one of the following Fortran operations: +, -, .OR., .AND., .EQV., .NEQV. |
| ol | - one of the following Fortran operations:: .GE., .GT., .LE., .LT. |
| f | - max or min function |
Reduction statement in the loop body is the statement of one of the following forms:
1) rv = rv op er
rv = er op rv
2) rv = f( rv, er )
rv = f( er, rv )
3) if( rv ol er ) rv = er
if( er ol rv ) rv = er
4) if( rv ol er ) then
rv = er
I1 = e1
. . .
In = en
endif
if( er ol rv ) then
rv = er
I1 = e1
. . .
In = en
endif
The correspondence between statement form, Fortran operation and FDVM reduction name is given below:
| Statement form | Fortran operation | FDVM reduction name |
| 1 | + | SUM(rv) |
| 1 | * | PRODUCT(rv) |
| 1 | .AND. | AND(rv) |
| 1 | .OR. | OR(rv) |
| 1 | .EQV. | EQV(rv) |
| 1 | .NEQV. | NEQV(rv) |
| 2,3 | MAX(rv) | |
| MIN(rv) | ||
| 4 | MINLOC(rv, I1,...,In) | |
| MAXLOC(rv, I1,...,In) |
Example 5.4. Specification of reduction.
S = 0
X = A(1)
Y = A(1)
MINI = 1
CDVM$ PARALLEL ( I ) ON A( I ) ,
CDVM$ REDUCTION(SUM(S), MAX(X), MINLOC(Y,MINI))
DO 10 I = 1, N
S = S + A(I)
X = MAX(X, A(I))
IF(A(I) .LT. Y) THEN
Y = A(I)
MINI = I
ENDIF
10 CONTINUE
5.2.5 Distribution of the Loop with Regular Data Dependence. ACROSS Clause
Consider the following loop
DO 10 I = 2, N-1
DO 10 J = 2, N-1
A(I,J) = (A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)) / 4
10 CONTINUE
Data dependence exists between loop index i1 and i2 ( i1<i2 ), if both these iterations refer to the same array element by write-read or read-write scheme.
If iteration i1 writes a value and iteration i2 reads this value, then flow dependence, or simply dependence exists between the iterations.
If iteration i1 reads the "old" value and iteration i2 writes the "new" value, then antidependence i1 i2 exists between the iterations.
In both cases iteration i2 can be executed only after iteration i1.
The value i2 - i1 is called a range or length of dependence. If for any iteration i dependent iteration i + d (d is constant) exists, then the dependence is called regular one, or dependence with constant length.
The loop with regular computations, with regular dependencies on distributed arrays, can be distributed with directive PARALLEL, using specification ACROSS.
| across-clause | is ACROSS ( dependent-array-list ) |
| dependent-array | is dist-array-name ( dependence-list ) |
| dependence | is flow-dep-length : anti-dep-length |
| flow-dep-length | is int-constant |
| anti-dep-length | is int-constant |
All the distributed arrays with regular data dependence are specified in specification ACROSS. The length of flow dependence (flow-dep_length) and the length of antidependence (anti-dep-length) are specified for each dimension of the array. There is no data dependence, if length is equal to zero.
Example 5.5. Distribution of the loop with regular data dependence.
CDVM$ PARALLEL ( I, J ) ON A( I, J ) , ACROSS ( A( 1:1, 1:1 ))
DO 10 I = 2, N-1
DO 10 J = 2, N-1
A(I,J) = (A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)) / 4
10 CONTINUE
Flow and anti- dependencies of length 1 exist for all dimensions of the array A.
5.3. Distribution of Loop Iterations with Irregular Computations
5.3.1. Model of Irregular Computations
Irregular computations dominate in such applications, as transformation of sparse matrixes, calculations on irregular (non-structured) grids, simulation of particle interaction (problem of N bodies) and so on.
Represent model of irregular computation as a graph G(V,E), where V is a set of graph nodes, and E is a set of edges. Graph nodes corresponds to data and graph edges corresponds to computations. The edge Ei, connecting nodes Vi1 and Vi2, means, that when computing data in one node, data from the second node are used.
Introduce a numeration of graph nodes and represent nodes as one-dimensional array V(NV), where NV is a number of graph nodes. Two models of realization of irregular computations are possible: graph edges covering and graph nodes covering. In data parallelism model graph nodes covering is most optimal algorithm [4].
To implement this model represent edges as index matrix ME(NV, MAXE+1), where MAXE is the maximal number of edges, originated from one graph node. The element ME(i,1) contains the number of edges, originated from i-th node. The elements M(i,j) for 1<j ME(i,1) contains node numbers of the edges, originated from i-th node.
The following program fragment represents a loop of graph node covering.
Example 5.6. Graph node covering.
REAL V1(NV), V2(NV)
INTEGER ME(NV,MAXE+1)
DO 10 I = 1, NV
DO 10 J = 2, ME(I,1)
V1(I) = V1(I) + V2(ME(I,J))
10 CONTINUE
5.3.2. Distributed Loop with Irregular Computations
Distributed loop with irregular computations must satisfy to the following requirements:
-
directive PARALLEL is applied only to one loop header (one-dimensional loop);
-
there is no data dependence except reduction dependence;
-
distributed arrays, used in the loop, have only one distributed dimension;
-
a loop iteration with index I indexes only distributed dimensions by the following expressions:
a*I + b, where a and b is not calculated in the loop (invariants)
ME(I,J), where ME is index matrix, and J is sequential loop index
-
n where n = ME(I,J)
-
there are no I/O statements, REALIGN and REDISTRIBUTE directives in the loop body.
To distribute the loop with irregular computations the PARALLEL directive with INDIRECT_ACCESS clause (see section 6.2) is used.
Example 5.7. Distribution of the loop with irregular computations.
REAL V1(NV), V2(NV)
INTEGER ME(NV,MAXE+1)
CHPF$ ALIGN V2( I ) WITH V1( I )
CHPF$ ALIGN ME( I, * ) WITH V1( I )
CHPF$ DISTRIBUTE V1 ( BLOCK )
. . .
CDVM$ PARALLEL ( I ) ON V1 ( I ), INDIRECT_ACCESS ( V2( ME( I, J )))
DO 10 I = 1, NV
DO 10 J = 2, ME(I,1)
K = M(I,J)
V1(I) = V1(I) + V2(ME(I,J))
V2(I) = V2(I) + V1(K)
10 CONTINUE
5.4. Non-Distributed (Replicated) Computations
Define the statement of own computations as follows
A(k1,...,kn) = e
| where A(k1,...,kn) | - distributed array element, kj is index expression, which doesn't contain variables of distributed loops. |
| e | - expression upon data, localized on the processor, where element A(k1,...,kn) is allocated |
The statement of own computations must be specified by directive OWN.
| own-directive | is OWN |
The statement will be executed only on the processor, where element A(k1,...,kn) is allocated.
The computations, except own computation statements, not satisfying to distributed loop requirements, are performed on all the processors of the current subsystem. It is necessary to distinguish the following kinds of replicated computations.
-
Sequential loop, nested into the distributed loop, may index replicated data and local dimensions of distributed arrays only.
-
Sequential loop, nesting the distributed loop, may index
inside the distributed loop - replicated data and local dimensions of distributed arrays only
outside the distributed loop - replicated data and elements of distributed arrays in own
computation statements.
-
Sequential loop outside the distributed loop can index replicated data only. If a distributed array is used in the sequential loop, it is necessary to replicate it before the loop execution by REDISTRIBUTE А( *,...,* ) or REALIGN А( *,...,* ) directive. When leaving the loop a user can to restore necessary distribution of the arrays by REDISTRIBUTE and REALIGN directives.
-
Only replicated data and elements of distributed arrays in own computation statements may be used in sequential computations between upper level loops.
Thus statements of own computations may be used in sequential computations outside the distributed loop.















