Then the arrays to be distributed over processors (distributed data) are determined. These arrays are specified by data mapping directives (section 4). The other variables (distributed by default) are mapped with one copy on each processor (replicated data). The value of a replicated variable must be the same on all the processors concerned. The only exception is variables in parallel constructs (see sections 5.1.3, 5.1.4 and 7.5).
The FDVM model defines two levels of parallelism:
- data parallelism on a section of the processor arrangement;
- task parallelism: independent computations on sections of the processor arrangement.
Data parallelism is implemented by distributing tightly nested loops over the processors (section 5). Each iteration of such a loop is executed entirely on one processor. Statements located outside a parallel loop are executed according to the own computation rule (section 5.2).
Task parallelism is implemented by distributing data and independent computations over sections of the processor arrangement (section 7).
When calculating the value of an own variable, a processor may need the values of both own and other (remote) variables. All remote variables must be specified in remote data access directives (section 6).
2.2 Syntax of FDVM directives
The syntax of FDVM directives is described using a Backus-Naur form and the following notations:
is | by definition
or | an alternative construct
[ ] | encloses an optional construct
[ ]… | encloses an optionally repeated construct, which may occur zero or more times
x-list | x [ , x ]…
Syntax of the directive.
directive-line | is CDVM$ dvm-directive
 | or *DVM$ dvm-directive
dvm-directive | is specification-directive
 | or executable-directive
specification-directive | is processors-directive
 | or align-directive
 | or distribute-directive
 | or template-directive
 | or pointer-directive
 | or shadow-directive
 | or dynamic-directive
 | or inherit-directive
 | or remote-group-directive
 | or reduction-group-directive
 | or task-directive
 | or heap-directive
 | or asyncid-directive
executable-directive | is realign-directive
 | or redistribute-directive
 | or parallel-directive
 | or remote-access-directive
 | or shadow-group-directive
 | or shadow-start-directive
 | or shadow-wait-directive
 | or reduction-start-directive
 | or reduction-wait-directive
 | or new-value-directive
 | or prefetch-directive
 | or reset-directive
 | or parallel-task-loop-directive
 | or map-directive
 | or task-region-directive
 | or end-task-region-directive
 | or on-directive
 | or end-on-directive
 | or f90-directive
 | or asynchronous-directive
 | or end-asynchronous-directive
 | or asyncwait-directive
Constraints:
- A directive-line follows the rules of fixed form comment lines.
- A specification-directive may appear only where a specification statement may appear.
- An executable-directive may appear only where an executable statement may appear.
- Any expression included in a specification directive must be a specification expression (see Annex 1, s.2.4).
No statements may be interspersed within a continued directive. A directive line must not appear within a continued statement. An example of a directive continuation follows. Note that column 6 must be blank, except when signifying continuation.
CDVM$ ALIGN SPACE1(I,J,K)
CDVM$* WITH SPACE(J,K,I)
3 Virtual processor arrangements. PROCESSORS directive
The PROCESSORS directive declares one or more rectangular virtual processor arrangements.
Syntax.
processors-directive | is PROCESSORS processors-decl-list
processors-decl | is processors-name ( explicit-shape-spec-list )
explicit-shape-spec | is [ lower-bound : ] upper-bound
lower-bound | is int-expr
upper-bound | is int-expr
The intrinsic function NUMBER_OF_PROCESSORS( ) can be used to determine the number of real processors provided to the program.
Several virtual processor arrangements of different shapes can be used, provided that the number of processors in every arrangement is equal to the value of NUMBER_OF_PROCESSORS( ). If two virtual processor arrangements have the same shape, corresponding elements of the arrangements refer to the same virtual processor.
Example 3.1. Declaration of virtual processor arrangements.
CDVM$ PROCESSORS P( N )
CDVM$ PROCESSORS Q( NUMBER_OF_PROCESSORS( ) ),
CDVM$* R(2, NUMBER_OF_PROCESSORS( )/2)
The value of N must be equal to the value of the function NUMBER_OF_PROCESSORS( ).
Processor arrangements are local objects of a procedure. Data arrays with the COMMON or SAVE attribute can be mapped onto a local processor arrangement only if that arrangement has the same shape every time the procedure is called.
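The size rule above can be checked by hand for the shapes of Example 3.1. The following is a minimal serial sketch, not part of FDVM: the constant NP is a hypothetical stand-in for NUMBER_OF_PROCESSORS( ), which an ordinary Fortran compiler does not provide, and the value 6 is chosen only for illustration.

```fortran
      PROGRAM SHAPES
C     NP is a hypothetical stand-in for NUMBER_OF_PROCESSORS( ).
      INTEGER NP, NQ, NR
      PARAMETER (NP = 6)
C     Q( NP ) is one-dimensional and holds NP processors.
      NQ = NP
C     R( 2, NP/2 ) holds 2*(NP/2) processors. NP/2 is an
C     integer division, so the product equals NP only when
C     NP is even.
      NR = 2 * (NP / 2)
      PRINT *, 'Q size:', NQ, ' R size:', NR
      IF (NQ .EQ. NP .AND. NR .EQ. NP) THEN
         PRINT *, 'both shapes match NP =', NP
      ELSE
         PRINT *, 'shape mismatch for NP =', NP
      END IF
      END
```

With an odd NP the arrangement R would contain NP-1 processors, so under the rule above the declaration of R in Example 3.1 fits only an even number of processors.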
4 Data mapping
FDVM supports distribution by blocks (equal and non-equal), inherited distribution, dynamic array distribution and distribution via alignment.
4.1 DISTRIBUTE and REDISTRIBUTE directives
Syntax.
distribute-directive | is dist-action distributee dist-directive-stuff
 | or dist-action [ dist-directive-stuff ] :: distributee-list
dist-action | is DISTRIBUTE
 | or REDISTRIBUTE
dist-directive-stuff | is dist-format-list [ dist-onto-clause ]
distributee | is array-name
dist-format | is BLOCK
 | or GEN_BLOCK ( block-size-array )
 | or WGT_BLOCK ( block-weight-array , nblock )
 | or *
dist-onto-clause | is ONTO dist-target
dist-target | is processors-name [( processors-section-subscript-list )]
processors-section-subscript | is [ subscript ] : [ subscript ]
subscript | is int-expr
nblock | is int-expr
block-size-array | is array-name
block-weight-array | is array-name
Constraints:
- The length of dist-format-list must be equal to the rank of each distributee to which it applies. That is, a distribution format must be specified for every array dimension.
- The number of distributed dimensions of the array (those whose format is not specified as *) must be equal to the number of dimensions of dist-target.
- The array named as block-size-array in a GEN_BLOCK specification must be a one-dimensional integer array whose size is equal to the size of the corresponding dimension of the processor arrangement and whose element values sum to the size of the distributed dimension.
- The array named as block-weight-array in a WGT_BLOCK specification must be a one-dimensional array of type DOUBLE PRECISION.
- Either the GEN_BLOCK format or the WGT_BLOCK format may appear in a dist-format-list, but not both.
- The REDISTRIBUTE directive can be applied only to arrays with the DYNAMIC attribute.
- dist-directive-stuff can be omitted only in a DISTRIBUTE directive. In that case the distributed array can be used only after a REDISTRIBUTE directive has been executed.
The ONTO clause specifies a virtual processor arrangement or a section of one. If the ONTO clause is omitted, the array is distributed over the base virtual processor arrangement, which is a parameter of program startup. When a REDISTRIBUTE directive without an ONTO clause is executed in an ON-block, the array is distributed over the section of the processor arrangement of this ON-block (see section 7).
Several arrays (A1, A2,…) can be distributed in the same way by a single directive of the form:
CDVM$ DISTRIBUTE dist-directive-stuff :: A1, A2, …
In that case the arrays must have the same rank, but their dimensions may have different sizes.
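The two directive forms and the rank constraints can be illustrated by a short sketch built only from the syntax given above. The array names, sizes and processor shapes are hypothetical, and the program is assumed to be started on 4 processors so that both arrangements have NUMBER_OF_PROCESSORS( ) elements. For an ordinary Fortran compiler the directives are comment lines.

```fortran
      PROGRAM DISTEX
C     Hypothetical example: assumes a run on 4 processors.
CDVM$ PROCESSORS R( 2, 2 ), Q( 4 )
      REAL A( 100, 100 ), B( 100, 100 ), C( 200, 50 )
C     Rank 2, two formats, both dimensions distributed:
C     the target R has two dimensions.
CDVM$ DISTRIBUTE A( BLOCK, BLOCK ) ONTO R
C     Rank 2, two formats, one distributed dimension (the
C     dimension marked * is not distributed): the target Q
C     has one dimension. B and C share the rank, not the size.
CDVM$ DISTRIBUTE ( BLOCK, * ) ONTO Q :: B, C
      END
```

In both directives the format list has as many entries as the arrays have dimensions, and the count of non-* formats matches the rank of the arrangement named after ONTO.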
Let us consider distribution formats for one dimension of the array (one-dimensional array A(N)) and for one dimension of the processor arrangement (one-dimensional array R(P)). Multi-dimensional distributions are considered in section 4.1.5.
4.1.1 BLOCK format
A block of (N-1)/P + 1 elements (integer division) is allocated on each processor; the last non-empty block may be shorter. For some ratios of N to P, the last few processors may contain no array elements at all.
Example 4.1. Distribution by BLOCK format.
CDVM$ PROCESSORS R( 4 )
      REAL A (12), B(13), C(11)
CDVM$ DISTRIBUTE A (BLOCK) ONTO R
CDVM$ DISTRIBUTE (BLOCK) ONTO R :: B
CDVM$ DISTRIBUTE C (BLOCK)
Array elements allocated on each processor:
processor | A | B | C
R(1) | 1, 2, 3 | 1, 2, 3, 4 | 1, 2, 3
R(2) | 4, 5, 6 | 5, 6, 7, 8 | 4, 5, 6
R(3) | 7, 8, 9 | 9, 10, 11, 12 | 7, 8, 9
R(4) | 10, 11, 12 | 13 | 10, 11
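The allocation in Example 4.1 can be reproduced with ordinary serial Fortran. The sketch below is an illustration only, not FDVM library code: it applies the block size (N-1)/P + 1 and assumes that blocks are assigned to processors contiguously in index order, as the example shows. The program and variable names are hypothetical.

```fortran
      PROGRAM BLKDST
C     Serial sketch: index ranges of a BLOCK distribution of
C     N elements over P processors (sizes from Example 4.1).
      INTEGER P, N(3), K, IP, BS, LO, HI
      CHARACTER*1 NAME(3)
      PARAMETER (P = 4)
      DATA N /12, 13, 11/
      DATA NAME /'A', 'B', 'C'/
      DO 20 K = 1, 3
C        Block size (N-1)/P + 1, using integer division.
         BS = (N(K) - 1) / P + 1
         DO 10 IP = 1, P
C           Processor IP holds elements LO..HI, cut off at N.
            LO = (IP - 1) * BS + 1
            HI = MIN(IP * BS, N(K))
            IF (LO .LE. HI) THEN
               PRINT *, NAME(K), ' on R(', IP, '): ', LO, ' to ', HI
            ELSE
               PRINT *, NAME(K), ' on R(', IP, '): no elements'
            END IF
   10    CONTINUE
   20 CONTINUE
      END
```

The ELSE branch covers the case mentioned above in which trailing processors receive nothing, for instance 9 elements on 4 processors, where the block size is 3 and the fourth processor stays empty.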
4.1.2 GEN_BLOCK format
Distribution by blocks of different sizes makes it possible to balance processor load for algorithms that perform different amounts of computation on different parts of an array.
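The constraint on block-size-array from section 4.1 (one element per processor, elements summing to the size of the distributed dimension) can be checked before an array is distributed. The following serial sketch uses hypothetical sizes (12 elements, 4 processors, blocks of 2, 4, 4 and 2) and assumes that blocks are laid out contiguously in processor order; it is not FDVM runtime code.

```fortran
      PROGRAM GENBLK
C     Hypothetical sizes: 12 elements over 4 processors.
      INTEGER P, N
      PARAMETER (P = 4, N = 12)
      INTEGER BS(P), IP, LO, TOTAL
C     BS(IP) is the block size intended for processor IP.
      DATA BS /2, 4, 4, 2/
C     The block sizes must sum to the dimension size N.
      TOTAL = 0
      DO 10 IP = 1, P
         TOTAL = TOTAL + BS(IP)
   10 CONTINUE
      IF (TOTAL .NE. N) THEN
         PRINT *, 'invalid: block sizes sum to', TOTAL
         STOP
      END IF
C     Assumed layout: consecutive blocks in processor order.
      LO = 1
      DO 20 IP = 1, P
         PRINT *, 'R(', IP, '): ', LO, ' to ', LO + BS(IP) - 1
         LO = LO + BS(IP)
   20 CONTINUE
      END
```

An array BS that passes this check has the form required for use as GEN_BLOCK( BS ) in a dist-format-list for a 12-element dimension mapped onto a 4-processor dimension.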