Therefore, a processor performance weight WEIGHTperf(P1, ... ,Pi, ... ,Pn) requires the computational weight of each of the processor's coordinates to be multiplied by WEIGHTperf^(1/n)(P1, ... ,Pi, ... ,Pn).
When the optimal weight of coordinate Pi is calculated, the performance weights of all the processors with coordinate Pi must be taken into account. The required weight WEIGHTopt,i(Pi) for any Pi must therefore be a solution of the following optimization task (the criterion is the best balanced processor loading):
$$\max_{\substack{0 \le P_j < PSSIZE_j \\ j \ne i}} \left| WEIGHT_{opt,i}(P_i) - WEIGHT_{perf}^{1/n}(P_1, \dots, P_i, \dots, P_n) \cdot WEIGHT_{calc,i}(P_i) \right| \to \min$$
The solution in the domain of real numbers greater than or equal to 1 is
$$WEIGHT_{opt,i}(P_i) = WEIGHT_{calc,i}(P_i) \cdot \frac{\max\limits_{\substack{0 \le P_j < PSSIZE_j \\ j \ne i}} WEIGHT_{perf}^{1/n}(P_1, \dots, P_i, \dots, P_n) + \min\limits_{\substack{0 \le P_j < PSSIZE_j \\ j \ne i}} WEIGHT_{perf}^{1/n}(P_1, \dots, P_i, \dots, P_n)}{2}$$
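The closed-form solution above can be illustrated with a short C sketch. The function name opt_coord_weight and its parameter layout are hypothetical and are not part of the Run-Time Library; perf_root is assumed to hold the values WEIGHTperf^(1/n) of all processors that share the coordinate Pi.

```c
#include <stddef.h>

/* Hypothetical helper, not a Run-Time Library function.
 * calc_weight : WEIGHTcalc,i(Pi) of the coordinate
 * perf_root   : WEIGHTperf^(1/n) of every processor whose i-th coordinate is Pi
 * count       : number of such processors
 * Returns WEIGHTcalc,i(Pi) * (max + min) / 2, or 0.0 if the input is empty. */
double opt_coord_weight(double calc_weight, const double *perf_root, size_t count)
{
    if (perf_root == NULL || count == 0)
        return 0.0;

    double lo = perf_root[0];
    double hi = perf_root[0];
    for (size_t k = 1; k < count; k++) {
        if (perf_root[k] < lo) lo = perf_root[k];
        if (perf_root[k] > hi) hi = perf_root[k];
    }
    /* The midpoint of the extreme values minimizes the largest deviation. */
    return calc_weight * (hi + lo) / 2.0;
}
```

For example, with calc_weight = 2 and perf_root = {1.0, 1.5, 3.0} the result is 4.0: the largest deviations, |4 - 2*1| and |4 - 2*3|, are both equal to 2, and no other value makes the larger of them smaller.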
The use of optimal processor coordinate weights for non-uniform distribution of the elements of an abstract machine representation (and, as a consequence, of distributed array elements and parallel loop iterations) is considered in section 5.7.
4.6. Setting coordinate weights of processor system elements according to specified computational loading.
Let A and B be processor systems of the same rank. Let the computational weights of the coordinates of all elements of processor system A be equal to 1, and let the computational loading of the coordinates of one of its dimensions be given by positive real numbers Li (i = 0, ... ,m-1; m is the size of the dimension).
Then the computational weights of the element coordinates of the same dimension of processor system B that provide balanced loading of its processors can be determined by the criterion
$$\max_{k=0,\dots,n-1} \left( \frac{\sum_{i=0}^{m-1} L_i}{n} - \sum_{i=I_{k,init}}^{I_{k,last}} L_i \right) \to \min \qquad (1),$$
where:
n - size of the dimension of the processor system B (n ≤ m);
Ik,init ≤ Ik,last;
I0,init = 0, In-1,last = m-1;
Ik+1,init = Ik,last + 1.
If the solution of task (1) is
Ik,init,opt, Ik,last,opt (k = 0, ... ,n-1),
then the number
WEIGHTcalc(k) = Ik,last,opt - Ik,init,opt + 1 (k = 0, ... ,n-1)
can be taken as the computational weight of the k-th coordinate of this dimension of processor system B.
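The following C sketch shows one way such a task could be solved for a single dimension. It is not the algorithm used by the Run-Time Library, and the function name balance_weights is hypothetical. It assumes that criterion (1) is read as the maximum over k of (mean block loading minus the loading of block k); with the mean fixed, minimizing that maximum is the same as making the smallest block sum as large as possible, which a small dynamic program over contiguous blocks does directly.

```c
#include <stdlib.h>

/* Hypothetical helper, not a Run-Time Library function.
 * Splits the positive loads load[0..m-1] into n contiguous non-empty blocks
 * so that the smallest block sum is as large as possible.
 * On success stores the block lengths (Ik,last - Ik,init + 1) in
 * weight[0..n-1] and returns 0; returns -1 on invalid arguments or
 * allocation failure. */
int balance_weights(const double *load, int m, int n, long *weight)
{
    if (load == NULL || weight == NULL || n < 1 || n > m)
        return -1;

    size_t cols = (size_t)m + 1;
    double *prefix = malloc(cols * sizeof *prefix);
    double *best = malloc((size_t)(n + 1) * cols * sizeof *best);
    int *cut = malloc((size_t)(n + 1) * cols * sizeof *cut);
    if (prefix == NULL || best == NULL || cut == NULL) {
        free(prefix); free(best); free(cut);
        return -1;
    }

    /* prefix[j] is the sum of the first j loads. */
    prefix[0] = 0.0;
    for (int j = 1; j <= m; j++)
        prefix[j] = prefix[j - 1] + load[j - 1];

    /* best[k*cols + j]: largest achievable minimum block sum when the
     * first j loads are split into k blocks; cut[] stores where the
     * last block starts. */
    for (int j = 1; j <= m; j++) {
        best[cols + (size_t)j] = prefix[j];
        cut[cols + (size_t)j] = 0;
    }
    for (int k = 2; k <= n; k++) {
        for (int j = k; j <= m; j++) {
            double top = -1.0;          /* loads are positive, so any split beats this */
            int arg = k - 1;
            for (int t = k - 1; t < j; t++) {
                double prev = best[(size_t)(k - 1) * cols + (size_t)t];
                double tail = prefix[j] - prefix[t];
                double val = prev < tail ? prev : tail;
                if (val > top) { top = val; arg = t; }
            }
            best[(size_t)k * cols + (size_t)j] = top;
            cut[(size_t)k * cols + (size_t)j] = arg;
        }
    }

    /* Walk the stored cuts back from the last block to the first. */
    int end = m;
    for (int k = n; k >= 1; k--) {
        int start = cut[(size_t)k * cols + (size_t)end];
        weight[k - 1] = (long)(end - start);
        end = start;
    }

    free(prefix); free(best); free(cut);
    return 0;
}
```

With the loads {4, 1, 1, 1, 1} and n = 2, the sketch puts the first coordinate of A into one block and the remaining four into the other, giving the weights 1 and 4.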
To solve the task above Run-Time Library provides the function
| long setw_ ( | PSRef | *PSRefPtr, |
| *PSRefPtr | - pointer to the processor system whose element coordinate weights are determined (the pointer to processor system B). |
| *AMViewRefPtr | - pointer to the representation of an abstract machine that is to be mapped with the determined coordinate weights. |
| SizeArray | - array whose i-th element is the size of the (i+1)-th dimension of the processor system for whose element coordinates the computational loading is specified (the dimension size of processor system A). |
| CoordLoadArray | - array containing the computational loading of the coordinates of the elements of processor system A. |
The function setw_ determines the computational weights of the coordinates of the elements of the processor system specified by the *PSRefPtr pointer, and then assigns them using the function setpsw_ (see section 4.5). If the size of some dimension of processor system A is less than the size of the same dimension of processor system B, the computational weights of all coordinates of this dimension of processor system B are assumed to be equal to 1.
If the pointer PSRefPtr is NULL or *PSRefPtr is zero, the coordinate weights of the elements of the current processor system are calculated and assigned.
If AMViewRefPtr = NULL or *AMViewRefPtr = 0, the calculated and assigned processor coordinate weights are used for mapping or remapping all abstract machine representations onto the specified processor system (except for the representations for which their own processor coordinate weights were set by the functions setpsw_ and setw_).
The computational loading of coordinate Pi of the i-th dimension of processor system A is specified by the value of the ((i-1)*PSSIZEi + Pi)-th element of the array CoordLoadArray (PSSIZEi is the size of the i-th dimension of processor system A).
The function returns zero.
5. Mapping abstract machine.
5.1. Mapping abstract machine representation onto processor system (resource distribution).
| long distr_( | AMViewRef | *AMViewRefPtr, |
| *AMViewRefPtr | - pointer to the representation of the parent abstract machine. |
| *PSRefPtr | - pointer to the processor system that defines the structure of the distributed resources. |
| *ParamCountPtr | - the number of parameters defined in the arrays AxisArray and DistrParamArray (see below). |
| AxisArray | - array whose j-th element is the number of the abstract machine representation dimension used in the mapping rule for the (j+1)-th dimension of the processor system. |
| DistrParamArray | - array whose j-th element is the mapping rule parameter for the (j+1)-th dimension of the processor system (DistrParamArray[j] ≥ 0). |
The processor resource (the resource) of the abstract machine is a set of processors, which forms the processor system the abstract machine is mapped onto.
The function distr_ distributes the resources of the parent abstract machine among its child abstract machines; the pointer *AMViewRefPtr defines the representation that contains these abstract machines.
The distributed resources are determined by the processor system specified by the *PSRefPtr pointer. If *PSRefPtr is zero or PSRefPtr is NULL, the Run-Time Library uses the current processor system instead. Before resources are distributed among the child abstract machines, the parent abstract machine must be mapped onto a processor system that contains all processors of the *PSRefPtr processor system.
The size of the arrays AxisArray and DistrParamArray must be equal to *ParamCountPtr, which must not exceed the rank of the processor system. If *ParamCountPtr is less than the rank, the missing rules are treated as replicating mapping rules (see rule #2 below).
The following functional correspondence defines abstract machine resource distribution:
{abstract machine representation} => {processor system},
which is performed as described below. Let F be a multifunction whose domain is the index space of the abstract machine representation and whose image lies in the index space of the processor system:
| F((I1, ... ,Ii, ... ,In)) = | F1(I1, ... ,Ii, ... ,In) x ... x Fj(I1, ... ,Ii, ... ,In) x ... x Fm(I1, ... ,Ii, ... ,In) |
where:
| x | - symbol of the Cartesian product; |
| n | - rank of the abstract machine; |
| m | - rank of the processor system; |
| Ii | - index variable of the i-th dimension of the abstract machine representation; |
| Fj | - multifunction with an image in a set of index variable values of the processor system j-th dimension. |
Let the child abstract machine be defined by index vector (I1, … ,In). Then the resource, which will be assigned to this abstract machine, is processor aggregate defined by set F((I1, ... , In)) (the values of these functions are sets consisting of vectors of the index space of the processor system). Mapping the representation of the abstract machine onto processor system is a resource distribution (through specification of function F) among components of this representation.
The functions F1, ... ,Fm are called "coordinate mapping rules". The Run-Time Library provides the following mapping rules, which allow a block distribution of the abstract machine representation onto the processor system:
1. Fj(I1, ... ,In) = {[Ik/BLSIZEk]} , where:
| k = f(j) = AxisArray[j-1] | - the dimension number of the abstract machine representation (1 ≤ k ≤ n; f(j1) ≠ f(j2) when j1 ≠ j2); |
| BLSIZEk | - positive integer (the block size of the k-th dimension of the abstract machine representation). |
This mapping rule means that for each element (I1, ... ,In) of the index space of the abstract machine representation the corresponding image set contains the single element [Ik/BLSIZEk]. This element lies within the range of values of the index variable of the j-th dimension of the processor system.
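Mapping rule 1 can be sketched in C as follows. The function name block_coord is hypothetical and not part of the Run-Time Library. The sketch assumes that the square brackets denote the integer part, which for non-negative operands coincides with C integer division, and that the valid coordinates of the processor system dimension are 0 ... pssize-1.

```c
/* Hypothetical helper, not a Run-Time Library function.
 * index  : Ik, index along the k-th dimension of the representation
 * blsize : BLSIZEk, block size of that dimension (positive)
 * pssize : size of the j-th dimension of the processor system (positive)
 * Returns the processor coordinate [Ik / BLSIZEk], or -1 if the arguments
 * are invalid or the coordinate falls outside 0 .. pssize-1. */
long block_coord(long index, long blsize, long pssize)
{
    if (index < 0 || blsize <= 0 || pssize <= 0)
        return -1;

    long coord = index / blsize;   /* integer part for non-negative operands */
    return coord < pssize ? coord : -1;
}
```

With blsize = 4 and pssize = 3, the indexes 0 to 3 map to coordinate 0, the indexes 4 to 7 to coordinate 1 and the indexes 8 to 11 to coordinate 2; index 12 is rejected because it lies beyond the last block.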
The BLSIZEk value is determined as follows. Let:
| AMSIZEk | - the size of the k-th dimension of the abstract machine representation; |
| PSSIZEj | - the size of the j-th dimension of the processor system. |
Then,
| BLSIZEk = { | min(DistrParamArray[j-1], AMSIZEk) |
Note that the maximum value of Ik is
min(PSSIZEj*BLSIZEk - 1, AMSIZEk - 1).
Therefore, to use the mapping rule for the whole range of the k-th dimension of the abstract machine representation, the following is required:
AMSIZEk ≤ PSSIZEj*BLSIZEk,
that is true, when