The function dopl_ determines whether all parts of the parallel loop, into which the loop was divided by the functions exfrst_ or imlast_, have been executed. On each subsequent call of dopl_, the Run-Time System corrects the loop parameters in the arrays OutInitIndexArray, OutLastIndexArray and OutStepArray, defined as output arrays in the function mappl_.
The function returns the following values:
| 0 | | the execution of all parts of the parallel loop is completed; |
| 1 | | the execution of the parallel loop has to be continued (the information about next iteration portion is put in OutInitIndexArray, OutLastIndexArray and OutStepArray arrays). |
When the order of iteration execution is changed, the dopl_ function sequentially returns 2*n+1 positive values (see section 9.3). If the loop execution is not partitioned by the exfrst_ and imlast_ functions, dopl_ returns a non-zero value only once. In that case the information that the mappl_ function put into the OutInitIndexArray, OutLastIndexArray and OutStepArray arrays, which corresponds to the local part of the loop, is not changed.
Note that splitting the loop into parts can also be caused by irregular mapping of the pattern space onto the processor system. In this case the function dopl_ also corrects the current loop parameters each time control exits a regular interval, and checks whether all intervals have been completed.
Invoking the dopl_ function is allowed only for a mapped parallel loop.
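The calling pattern described above can be sketched as a small simulation. This is not the Run-Time System itself: MockLoop, mock_dopl and sum_by_portions are hypothetical names introduced only for illustration. The sketch assumes a one-dimensional loop with a positive step that is split into portions of a fixed size. The mock returns the portion bounds through output parameters, whereas the real dopl_ updates the arrays passed to mappl_.

```c
/* Hypothetical stand-in for a mapped one-dimensional parallel loop.
   In the real library this state is kept inside the Run-Time System. */
typedef struct {
    long first;    /* first index of the local part of the loop     */
    long last;     /* last index of the local part of the loop      */
    long step;     /* index step (assumed positive in this sketch)  */
    long portion;  /* assumed number of iterations in one part      */
    long next;     /* first index of the part to be handed out next */
} MockLoop;

/* Mimics the documented contract of dopl_: returns 1 and stores the
   parameters of the next part, or returns 0 when all parts are done. */
static long mock_dopl(MockLoop *loop, long *init, long *last, long *step)
{
    if (loop->next > loop->last)
        return 0;                       /* all parts are completed */
    *init = loop->next;
    *last = loop->next + (loop->portion - 1) * loop->step;
    if (*last > loop->last)
        *last = loop->last;             /* the final part may be shorter */
    *step = loop->step;
    loop->next = *last + loop->step;
    return 1;                           /* the loop has to be continued */
}

/* Sums the loop indices part by part and counts the parts executed. */
long sum_by_portions(long first, long last, long step, long portion,
                     long *parts)
{
    MockLoop loop = { first, last, step, portion, first };
    long init, end, st, sum = 0;

    *parts = 0;
    while (mock_dopl(&loop, &init, &end, &st)) {
        for (long i = init; i <= end; i += st)
            sum += i;                   /* the loop body would go here */
        ++*parts;
    }
    return sum;
}
```

The outer while loop plays the role of the repeated dopl_ calls, and the inner for loop executes one portion with the current parameters.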
9.5 Terminating parallel loop
long endpl_ (LoopRef *LoopRefPtr);
*LoopRefPtr - reference to the parallel loop.
The function endpl_ completes the parallel loop execution and forces the parallel branches to merge into the parental one. The Run-Time System automatically deletes all objects created inside the parallel loop, except for static objects (that is, the parallel loop is a program block beginning with the call of the function crtpl_, see section 8).
The Run-Time System allows a program to be terminated by the lexit_ function (see section 2) during the execution of any parallel loop without first terminating the loop by the endpl_ function.
When control exits from the parallel loop, the reference to the loop becomes undefined, so it cannot be used in any Run-Time System call.
The function returns zero.
9.6 Specifying information about data dependence between parallel loop iterations
Partitioning parallel loop execution into parts (required for asynchronous renewal of distributed array shadow edges, dynamic redistribution of computations over processors and so on) changes the execution order of the loop iterations, which can lead to wrong computations if data dependences exist. To partition the loop iterations into portions correctly, the Run-Time System needs information about all points of the form (I1+D1, ... , In+Dn) in which the values of the computed variable are required to calculate its value at the point (I1, ... , In) (n is the loop rank, all Di are integers). It is assumed that at least two values of Di are non-zero (the Run-Time System doesn't use information about points in which the number of non-zero Di is less than or equal to one).
The information about each point of the form described above is passed to the Run-Time System by a separate call of the function
| long pldpnd_( | LoopRef | *LoopRefPtr, |
| *LoopRefPtr | | reference to parallel loop |
| DependCodeArray | | array whose i-th element is the code of the data dependence between the loop iterations for the (i+1)-th dimension of the loop. |
The data dependence code specified in the i-th element of the DependCodeArray array (below, Di+1 denotes the offset D with index i+1) can be:
| 0 | | there is no data dependence (Di+1 is zero); |
| 1 | | data anti-dependence (Di+1 is positive); |
| 2 | | data dependence (Di+1 is negative). |
To inform the Run-Time System about several points, the pldpnd_ function is called several times. All these calls must precede the mapping of the parallel loop by the mappl_ function (see section 9.2).
The absence of pldpnd_ calls before the loop mapping means that there are no data dependences between the loop iterations that restrict the execution of the loop by parts.
The function returns zero.
Note. If the loop execution is partitioned into parts along some dimension, this dimension is called split. When selecting split dimensions, the Run-Time System gives higher priority to dimensions with larger numbers (their index variables change more quickly) and applies the following criterion: the right side of an assignment statement can't contain a point whose set of split dimensions contains two or more dimensions with multi-directional data dependences (for dimensions with the same signs of index variable steps) or with uni-directional data dependences (for dimensions with different signs of index variable steps).
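The encoding of a dependence point into DependCodeArray can be illustrated by a short sketch. The helper encode_dependence is hypothetical and is not part of the Run-Time System. The element type long for the code array is also an assumption, since the full prototype of pldpnd_ is not reproduced here. In the zero-based C arrays, d[i] holds the offset for the (i+1)-th loop dimension.

```c
/* Hypothetical helper: converts the offsets d[0..rank-1] of a point
   (I1+D1, ..., In+Dn) into dependence codes using the documented encoding:
   0 - no dependence, 1 - anti-dependence (positive offset),
   2 - dependence (negative offset).
   Returns the number of non-zero offsets, so the caller can skip points
   with fewer than two of them, which the Run-Time System does not use. */
int encode_dependence(const long *d, long *codes, int rank)
{
    int nonzero = 0;

    for (int i = 0; i < rank; ++i) {
        if (d[i] == 0) {
            codes[i] = 0;
        } else if (d[i] > 0) {
            codes[i] = 1;
            ++nonzero;
        } else {
            codes[i] = 2;
            ++nonzero;
        }
    }
    return nonzero;
}
```

For example, for a loop of rank 3 and the point (I1-1, I2+1, I3) the offsets are {-1, 1, 0}, the resulting codes are {2, 1, 0}, and the helper returns 2, so this point would be worth reporting. A point such as (I1-1, I2, I3) has only one non-zero offset and could be skipped.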
10 Representation of the program as a set of subtasks executed in parallel
A parallel subtask is a pair (<abstract machine>, <processor subsystem>). A subtask group is an abstract machine representation in which each element corresponds to a processor subsystem. A parallel subtask is in the execution state (active) on a processor belonging to the processor subsystem of the subtask if the abstract machine of the subtask is the current one. At any moment each processor executes exactly one subtask, namely the one whose abstract machine is the current abstract machine mapped onto that processor.
A subtask can be created by the functions that map an abstract machine representation onto a processor subsystem (distr_, redis_, mdistr_, mredis_), considered in section 5. Subtasks created in this manner are activated when a parallel loop iteration is entered.
Let us now consider the creation of parallel subtasks by explicitly specifying the correspondence between <abstract machine> and <processor subsystem>, and the way a subtask is initialized.
10.1 Mapping abstract machine (subtask creation)
| long mapam_ ( | AMRef | *AMRefPtr, |
| *AMRefPtr | | reference to abstract machine to be mapped. |
| *PSRefPtr | | reference to the processor system that determines the structure of the processors assigned to the abstract machine (the execution area of the created subtask). |
To create a subtask successfully, its abstract machine and processor system must satisfy the following requirements:
- The abstract machine specified by the *AMRefPtr reference must belong to a parental abstract machine representation created in the current subtask, and must be a direct or indirect descendant of the current abstract machine;
- The parental abstract machine must be mapped.
Repeated mapping (remapping) is allowed for an abstract machine that has no descendants (that is, an abstract machine corresponding to a terminal node of the abstract machine tree). An abstract machine can't be remapped if it belongs to a parental abstract machine representation mapped by the distr_ (mdistr_) function.
The function returns zero.
10.2 Starting subtask (activation)
long runam_ (AMRef *AMRefPtr);
*AMRefPtr - reference to the abstract machine of the subtask to be started.
To start a subtask successfully, its abstract machine must belong to one of the current abstract machine representations.
After the subtask starts, its abstract machine and processor system become the current ones. When the current processor system is replaced, the internal numbers of the central processor and the input/output processor are replaced as well (each processor system has its own central and input/output processors).
The function returns the values:
| 0 | | the subtask is not started (the current processor doesn't belong to the subtask processor subsystem); |
| 1 | | the subtask is started. |
10.3 Completing (stopping) current subtask
long stopam_ (void);
After the function stopam_ is executed, the parent of the abstract machine assigned to the stopped subtask becomes the new current abstract machine (this is the abstract machine of the subtask that activated the stopped subtask). Similarly, the processor system whose subsystem is the processor system of the stopped subtask becomes the current one.
A subtask can't be stopped during parallel loop execution. The initial subtask can't be stopped either (there is no subtask that created it).
The Run-Time System allows a program to be terminated by the lexit_ function (see section 2) during the execution of any subtask without first stopping the subtask by the stopam_ function.
When the subtask is stopped, all objects (except static ones) created since its initialization are automatically deleted, i.e. the points where the subtask is started and stopped define a program block (see section 8).
The function returns zero.
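The start and stop protocol of sections 10.2 and 10.3 can be sketched with a small simulation. All names here (MockAM, MockProcess, mock_runam, mock_stopam, run_subtasks) are hypothetical stand-ins and not the library API. The sketch also assumes that a processor subsystem is a contiguous range of processor numbers, which the real Run-Time System does not require.

```c
#define MAX_DEPTH 8

/* Hypothetical abstract machine that is already mapped onto a processor
   subsystem, assumed here to be a contiguous range of processor numbers. */
typedef struct {
    int first_proc;
    int last_proc;
} MockAM;

/* State of one simulated processor: the chain of activated subtasks.
   Depth 0 corresponds to the initial subtask, which cannot be stopped. */
typedef struct {
    int proc;
    const MockAM *stack[MAX_DEPTH];
    int depth;
} MockProcess;

/* Mimics the documented return values of runam_: 1 if the subtask is
   started, 0 if this processor does not belong to its subsystem. */
static long mock_runam(MockProcess *p, const MockAM *am)
{
    if (p->proc < am->first_proc || p->proc > am->last_proc)
        return 0;
    if (p->depth == MAX_DEPTH)
        return 0;                    /* limit of this sketch only */
    p->stack[p->depth++] = am;       /* am becomes the current machine */
    return 1;
}

/* Mimics stopam_: the parental abstract machine becomes current again. */
static long mock_stopam(MockProcess *p)
{
    if (p->depth > 0)
        --p->depth;
    return 0;
}

/* Tries to start every subtask on processor proc and returns the index
   of the subtask that was executed there, or -1 if none was. */
int run_subtasks(int proc, const MockAM *ams, int count)
{
    MockProcess p = { proc, { 0 }, 0 };
    int executed = -1;

    for (int i = 0; i < count; ++i) {
        if (mock_runam(&p, &ams[i])) {
            executed = i;            /* the subtask body would go here */
            mock_stopam(&p);
        }
    }
    return executed;
}
```

Every processor runs the same sequence of calls, but the body of a subtask is executed only where the start call returns 1, and each successful start is paired with a stop.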
11 Reduction
Reduction is the computation of a specified function (called the "reduction function") whose arguments are the values of a variable (called the "reduction variable") taken from the different processors executing different branches of the parallel program. After the reduction completes, all copies of the reduction variable in the program branches become equal to the value returned by the reduction function.
For optimization purposes, the Run-Time System executes reduction over a reduction group, which is an aggregate of reduction variables and reduction functions. Each reduction variable in a reduction group is associated with its own reduction function.
11.1 Creating reduction variable
The following variables can be used as reduction variables:
- a scalar variable;
- an element of a "normal" array (that is, an array replicated among all the processors);
- a one-dimensional "normal" array.
In general, it is assumed that a reduction variable is a one-dimensional array and that the reduction function is executed on each array element.
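The element-wise semantics described above can be illustrated by a sequential simulation. This is only a sketch of the result of a reduction, not of how the Run-Time System performs it. The names RedFunc, red_sum, red_max and reduce_copies are hypothetical, and sum and maximum are used merely as examples of reduction functions. The copies of the reduction array from all processors are assumed to be stored one after another in a single buffer.

```c
/* Hypothetical type of a binary reduction function. */
typedef long (*RedFunc)(long a, long b);

long red_sum(long a, long b) { return a + b; }
long red_max(long a, long b) { return a > b ? a : b; }

/* copies[p * length + k] is element k of the reduction array on
   processor p. The reduction function is applied separately to each
   element k across all processors, and the result is then written
   back to every copy, so that all copies become equal. */
void reduce_copies(long *copies, int procs, int length, RedFunc f)
{
    for (int k = 0; k < length; ++k) {
        long acc = copies[k];                    /* copy on processor 0 */
        for (int p = 1; p < procs; ++p)
            acc = f(acc, copies[p * length + k]);
        for (int p = 0; p < procs; ++p)
            copies[p * length + k] = acc;
    }
}
```

For three processors holding the arrays {1, 5}, {2, 3} and {4, 0}, a sum reduction leaves {7, 8} in every copy. A scalar reduction variable corresponds to the case length = 1.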
Creating a reduction means declaring a reduction variable and the corresponding reduction function:
| RedRef crtred_ ( | long | *RedFuncNumbPtr, |
| *RedFuncNumbPtr | | the number of the reduction function. |
| RedArrayPtr | | pointer to the reduction array-variable. |
| *RedArrayTypePtr | | the type of the elements of the array-variable. |
| *RedArrayLengthPtr | | the number of the elements in the array-variable. |
| LocArrayPtr | | pointer to the array containing additional information about reduction function (the number of the elements in this array has to be equal to the number of the elements in the reduction array-variable). |
| *LocElmLengthPtr | | the size (in bytes) of an element of the array with additional information. |
| *StaticSignPtr | | the flag of the static reduction declaration. |
The function crtred_ creates a descriptor of the reduction. The function returns a reference to this descriptor (that is, a reference to the reduction).















