DVM(REDUCTION_GROUP) void * RG;
…
DVM(CREATE_REDUCTION_GROUP RG : SUM(S), MAX(X), MINLOC(Y,MINI));
S = 0;
X = A[1];
Y = A[1];
MINI = 1;
DVM(PARALLEL [I] ON A[I] )
FOR(I, N)
{ S = S + A[I];
X = max(X, A[I]);
if (A[I] < Y) { Y = A[I]; MINI = I; }
}
DVM(REDUCTION_START RG);
DVM(PARALLEL [I] ON B[I])
FOR( I, N) B[I] = C[I] + A[I];
DVM(REDUCTION_WAIT RG);
printf(" %f %f %f %d\n", S, X, Y, MINI);
While the reduction group is being executed, the values of the elements of array B are computed.
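The effect of the three reductions in this fragment can be sketched in plain C, without any DVM directives. The sketch below is only an illustration: it emulates `nproc` processors sequentially, assumes that each one owns a contiguous block of the array, and combines the partial results at the end. The names `Reduction`, `reduce_block`, `combine` and `reduce_all` are hypothetical and are not part of C-DVM.

```c
#include <stddef.h>

/* Results of the three reductions from the fragment above. */
typedef struct {
    double sum;      /* SUM(S) */
    double max;      /* MAX(X) */
    double min;      /* MINLOC(Y, MINI): the minimal value ... */
    size_t min_idx;  /* ... and the index where it was found */
} Reduction;

/* Partial reduction over the local block a[lo..hi-1] of one processor. */
static Reduction reduce_block(const double *a, size_t lo, size_t hi)
{
    Reduction r = { 0.0, a[lo], a[lo], lo };
    for (size_t i = lo; i < hi; i++) {
        r.sum += a[i];
        if (a[i] > r.max) r.max = a[i];
        if (a[i] < r.min) { r.min = a[i]; r.min_idx = i; }
    }
    return r;
}

/* Combine two partial results; on equal minima the earlier index is kept. */
static Reduction combine(Reduction x, Reduction y)
{
    Reduction r;
    r.sum = x.sum + y.sum;
    r.max = x.max > y.max ? x.max : y.max;
    if (y.min < x.min) { r.min = y.min; r.min_idx = y.min_idx; }
    else               { r.min = x.min; r.min_idx = x.min_idx; }
    return r;
}

/* Emulate nproc processors, each owning one contiguous block of a[0..n-1].
   Assumption: 1 <= nproc <= n. */
Reduction reduce_all(const double *a, size_t n, size_t nproc)
{
    size_t block = (n + nproc - 1) / nproc;   /* block size, rounded up */
    Reduction total = reduce_block(a, 0, block < n ? block : n);
    for (size_t p = 1; p < nproc; p++) {
        size_t lo = p * block;
        if (lo >= n) break;                   /* no elements left for this processor */
        size_t hi = lo + block < n ? lo + block : n;
        total = combine(total, reduce_block(a, lo, hi));
    }
    return total;
}
```

Calling `reduce_all(a, n, nproc)` returns the same sum, maximum and minimum for any valid `nproc`; only the order in which the partial results are combined changes.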
7. Task parallelism
The DVM parallelism model combines data parallelism and task parallelism.
Data parallelism is implemented by distributing arrays and loop iterations over a virtual processor subsystem. A virtual processor subsystem can comprise the whole processor arrangement or a section of it.
Task parallelism is implemented by independent computations on disjoint sections of the processor arrangement.
Let us call the set of virtual processors on which a procedure is executed the current virtual processor system. For the main procedure, the current system consists of the whole set of virtual processors.
A separate task group is defined by the following directives:
- Description of a task array (TASK directive).
- Mapping of the task array onto sections of the processor arrangement (MAP directive).
- Redistribution of arrays over tasks (REDISTRIBUTE directive).
- Distribution of computations (blocks of statements or iterations of a distributed loop) over tasks (TASK_REGION construct).
Several task groups can be described in a procedure. Nested tasks are not allowed.
7.1. Description of task group
A task group is described by the following directive:
task-directive ::= TASK
The task group description defines a one-dimensional array that will hold references to sections of the processor arrangement.
7.2. Mapping tasks on processors. MAP directive
A task is mapped onto a section of the processor arrangement by the following directive:
map-directive ::= MAP task-name [ task-index ]
                  ONTO processors-name [ section-subscript-string ]
The tasks of the same array must be mapped onto disjoint sections of the processor arrangement. Several tasks can be mapped onto the same section.
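One possible reading of the two rules above is that the sections used by the tasks of one array may coincide but must never overlap partially. The plain C sketch below checks a mapping under that assumption. The reading itself, the `Section` type and the function names are illustrative assumptions, not something defined by C-DVM.

```c
#include <stdbool.h>
#include <stddef.h>

/* A section lo..hi (bounds inclusive) of a one-dimensional processor arrangement. */
typedef struct { int lo; int hi; } Section;

/* True if the two sections share at least one processor. */
static bool sections_overlap(Section a, Section b)
{
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* True if the two sections consist of exactly the same processors. */
static bool sections_equal(Section a, Section b)
{
    return a.lo == b.lo && a.hi == b.hi;
}

/* Assumed rule: any two tasks are mapped either onto the same section
   or onto disjoint sections; a partial overlap is rejected. */
bool mapping_is_consistent(const Section *task_map, size_t ntasks)
{
    for (size_t i = 0; i < ntasks; i++)
        for (size_t j = i + 1; j < ntasks; j++)
            if (sections_overlap(task_map[i], task_map[j]) &&
                !sections_equal(task_map[i], task_map[j]))
                return false;
    return true;
}
```

Under this reading, a mapping of three tasks onto the sections 0..1, 2..3 and 0..1 is accepted, while 0..2 together with 2..3 is rejected because processor 2 would belong to two different sections.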
7.3. Array distribution on tasks
Arrays are distributed over tasks by the REDISTRIBUTE directive with the following extension:
dist-target ::= . . .
              | task-name [ task-index ]
The array is distributed over the processor arrangement section assigned to the specified task.
7.4. Distribution of computations. TASK_REGION directive
The distribution of statement blocks over tasks is described by the TASK_REGION construct:
block-task-region ::= DVM( task-region-directive ) {
                          on-block ...
                      }
task-region-directive ::= TASK_REGION task-name
on-block ::= DVM( on-directive ) operator
on-directive ::= ON task-name [ task-index ]
A task region and each on-block are sequences of statements with a single entry (the first statement) and a single exit (after the last statement). For statement blocks, the TASK_REGION construct is semantically equivalent to the parallel sections construct of shared-memory systems. The difference is that a statement block can itself be executed on several processors under the data parallelism model.
The iterations of a distributed loop are distributed over tasks by the following construct:
loop-task-region ::= DVM( task-region-directive ) {
                         parallel-task-loop
                     }
parallel-task-loop ::= DVM( parallel-task-loop-directive )
                       do-loop
parallel-task-loop-directive ::= PARALLEL [ do-variable ] ON task-name [ do-variable ]
The unit of distributed computation is an iteration of a one-dimensional distributed loop. Unlike an ordinary distributed loop, each iteration is distributed onto a processor arrangement section, which is defined by a reference to an element of the task array.
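How a task loop spreads its iterations can be sketched from the point of view of a single processor: iteration i is executed only by the processors of the section that task i is mapped onto. The plain C sketch below lists the iterations in which a given processor takes part. The `Section` type, the function name and the use of a processor number `rank` are assumptions made for illustration only.

```c
#include <stddef.h>

/* A section lo..hi (bounds inclusive) of a one-dimensional processor arrangement. */
typedef struct { int lo; int hi; } Section;

/* Store in out[] the iterations of a loop "PARALLEL [i] ON T[i]" in which
   processor `rank` takes part, and return how many there are.
   task_map[i] is the section task i is mapped onto;
   out must have room for ntasks entries. */
size_t iterations_for_rank(const Section *task_map, size_t ntasks,
                           int rank, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < ntasks; i++)
        if (task_map[i].lo <= rank && rank <= task_map[i].hi)
            out[count++] = i;   /* rank belongs to the section of task i */
    return count;
}
```

With tasks mapped onto the sections 0..1, 2..3 and 0..1, processor 1 takes part in iterations 0 and 2, processor 3 only in iteration 1, and a processor outside all sections in none.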
7.5. Data localization in tasks
A task is an on-block in the static case or a loop iteration in the dynamic case. The tasks of one group are subject to the following constraints on data:
- there are no data dependencies;
- all used and computed data are allocated (localized) on the processor arrangement section of the given task;
- a task cannot change the distribution of an array that was distributed before entering the task;
- there is no input/output;
- a task can update only the values of arrays distributed on its section and of variables local to the block.
7.6. Fragment of static multi-block problem
The program fragment below implements the 3-block problem (fig. 6.2).
DVM (PROCESSORS) void *P[NUMBER_OF_PROCESSORS( )];
/* arrays A1, A2, A3 - the function values on the previous iteration */
/* arrays B1, B2, B3 - the function values on the current iteration */
DVM(DISTRIBUTE) float A1[M][N1+1], A2[M1+1][N2+1], A3[M2+1][N2+1];
DVM(ALIGN WITH A1) float B1[M][N1+1];
DVM(ALIGN WITH A2) float B2[M1+1][N2+1];
DVM(ALIGN WITH A3) float B3[M2+1][N2+1];
/* description of task array */
DVM(TASK) void * MB[3];
DVM ( REMOTE_GROUP) void * RS;
. . .
/* distribution of tasks on processor arrangement sections and */
/* distribution of arrays on tasks */
/* (each section contains a third of all the processors) */
NP = NUMBER_OF_PROCESSORS( ) / 3;
DVM(MAP MB[1] ONTO P[0 : NP-1]);
DVM(REDISTRIBUTE A1[][BLOCK] ONTO MB[1]);
DVM(MAP MB[2] ONTO P[NP : 2*NP-1]);
DVM(REDISTRIBUTE A2[][BLOCK] ONTO MB[2]);
DVM(MAP MB[3] ONTO P[2*NP : 3*NP-1]);
DVM(REDISTRIBUTE A3[][BLOCK] ONTO MB[3]);
. . .
FOR(IT,MAXIT)
{ . . .
DVM ( PREFETCH RS);
/* exchanging edges of adjacent blocks */
. . .
/* distribution of computations (statement blocks) on tasks */
DVM ( TASK_REGION MB)
{
DVM(ON MB[1]) JACOBY( A1, B1, M, N1+1 );
DVM(ON MB[2]) JACOBY( A2, B2, M1+1, N2+1 );
DVM(ON MB[3]) JACOBY( A3, B3, M2+1, N2+1 );
} /* TASK_REGION */
} /* FOR */
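The section bounds used in the MAP directives of this fragment follow a simple pattern: with NP processors per section, section k covers the processors k*NP to (k+1)*NP-1. The plain C sketch below reproduces that arithmetic for an arbitrary number of tasks. The `Section` type and the function `split_sections` are hypothetical helpers, and the sections are numbered from 0 here.

```c
/* A section lo..hi (bounds inclusive) of a one-dimensional processor arrangement. */
typedef struct { int lo; int hi; } Section;

/* Split `total` processors into `ntasks` sections of equal size, in the same
   way as the fragment does with NP = NUMBER_OF_PROCESSORS() / 3.
   Returns the section size. Processors beyond ntasks * size remain unused;
   if total < ntasks the size is 0 and every section is empty (hi < lo). */
int split_sections(int total, int ntasks, Section *out)
{
    int np = total / ntasks;              /* integer division, as in the fragment */
    for (int k = 0; k < ntasks; k++) {
        out[k].lo = k * np;
        out[k].hi = (k + 1) * np - 1;
    }
    return np;
}
```

For 7 processors and 3 tasks this gives sections of 2 processors each (0..1, 2..3, 4..5), and the seventh processor takes no part in the task region.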
7.7. Fragment of dynamic multi-block problem
Let us consider a program fragment that is tuned at run time to the number of blocks and to the size of each block.
#define NA 20 /* NA - maximal number of blocks */
DVM (PROCESSORS) void * R[NUMBER_OF_PROCESSORS( )];
int SIZE[2][NA]; /* sizes of dynamic arrays */
/* arrays of pointers for A and B */
DVM ( * DISTRIBUTE ) float * PA[NA];
DVM ( * ALIGN) float *PB[NA];
DVM(TASK) void * PT[NA];
. . .
NP = NUMBER_OF_PROCESSORS( );
/* distribution of arrays on tasks */
/* dynamic allocation of the arrays and execution of postponed directives */
/* DISTRIBUTE and ALIGN */
IP = 0;
FOR(i,NA)
{
DVM(MAP PT[i] ONTO R[IP : IP+1]);
PA[i] = malloc(SIZE[0][i]*SIZE[1][i]*sizeof(float));
DVM(REDISTRIBUTE (PA[i])[][BLOCK] ONTO PT[i]);
PB[i] = malloc(SIZE[0][i]*SIZE[1][i]*sizeof(float));
DVM(REALIGN (PB[i])[I][J] WITH (PA[i])[I][J]);
IP = IP + 2;
if ( IP + 1 >= NP ) IP = 0; /* wrap around when the next pair of processors does not fit */
} /* FOR i */
. . .
/* distribution of computations on tasks */
DVM(TASK_REGION PT) {
DVM(PARALLEL [i] ON PT[i])
FOR(i,NA)
JACOBY( PA[i], PB[i], SIZE[0][i], SIZE[1][i] );
} /* TASK_REGION */
The arrays (blocks) are distributed cyclically over sections of two processors each. If NA > NP/2, several arrays are distributed on some sections. Loop iterations distributed on the same section are executed one after another, each of them under the data parallelism model.
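The cyclic assignment of blocks to sections can be written as a closed formula. The plain C sketch below assumes zero-based processor numbers, sections of exactly two processors, and a wrap back to the first section as soon as the next pair of processors no longer fits. The function names are hypothetical.

```c
/* First processor of the two-processor section that block i is mapped onto,
   given np processors in total. Assumption: np >= 2. */
int section_start(int i, int np)
{
    int nsections = np / 2;        /* number of complete two-processor sections */
    return 2 * (i % nsections);    /* blocks cycle over the sections */
}

/* Number of blocks, out of na, that share section number s
   (s = 0 .. np/2 - 1) under the same cyclic assignment. */
int blocks_on_section(int s, int na, int np)
{
    int nsections = np / 2;
    return na / nsections + (s < na % nsections ? 1 : 0);
}
```

For example, with np = 6 there are three sections, so blocks 0, 1, 2, 3 start at processors 0, 2, 4, 0; with na = 20 the three sections receive 7, 7 and 6 blocks respectively.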
8. Procedures
Procedure call inside distributed loop.
A procedure called inside a distributed loop must not have side effects or contain interprocessor exchanges (a pure procedure). As a consequence, a pure procedure does not contain:
- input/output statements;
- DVM directives.
Procedure call outside distributed loop.
If the actual argument is an explicitly mapped array (distributed by DISTRIBUTE or ALIGN), it should be passed without changing its shape. This means that the actual argument is a pointer to the beginning of the array, and that the actual argument and the corresponding formal argument have the same configuration.
Formal parameters.
If an actual parameter of a procedure may be a distributed array, then the corresponding formal parameter must be specified in the following way:
- if the actual parameter is a distributed array with known distribution rules, then
DVM(*DISTRIBUTE rd1...rdN) formal-parameter-declaration;
- if the actual parameter is an arbitrary distributed array of a given dimension, then
DVM(*DISTRIBUTE[*]...[*]) formal-parameter-declaration;
- if the actual parameter is an arbitrary distributed array or a usual C array, then
DVM(*) formal-parameter-declaration.
Such formal parameters may be used only in unformatted I/O operations with the whole array.
Note. A similar specification must be used for the headers of functions returning distributed arrays, as well as for pointers to arrays whose values are set not when the array is created but by assigning the values of other pointers.
Local arrays.
Local arrays of a procedure can be distributed by the DISTRIBUTE and ALIGN directives. A local array can be aligned with a formal parameter. The DISTRIBUTE directive distributes a local array over the processor subsystem on which the procedure was called (the current subsystem). If a processor arrangement section is specified in the DISTRIBUTE directive, then its number of processors must equal the number of processors of the current subsystem. The number of processors of the current subsystem is returned by the built-in function ACTIVE_NUM_PROC().
Example 9.1. Distribution of the local arrays and formal arguments.
void dist(
/* explicit distribution of formal argument */
DVM(*DISTRIBUTE [][BLOCK]) float * A /* N*N */,
/* aligned formal argument */
DVM(*ALIGN [i][j] WITH A[i][j]) float *B /* N*N */,
/* inherited distribution of the formal argument */
DVM(*) float * C /* N*N */ ,
int N)















