Ver 1.0+

Thread environment classes provide a simple, portable mechanism for a limited form of parallelism. However, a more important abstraction is based on a structure called a concurrent aggregate, or collection. The motivation is to describe a structure that defines a set of objects of some given type that can be distributed over the memory modules of a parallel system.

The way in which elements of a collection are distributed over processors is determined by a two-step mechanism similar to that of HPFF Fortran. In the first step, collection elements are mapped to a logical coordinate system.
This first mapping is called an alignment and it is defined by an Align class object. In the second step, a Distribution class object defines the logical coordinate system and the way in which it is mapped to Processors object threads. [1]

The declaration of a collection is specified by a collection type name followed by the type name of the element objects enclosed in angle brackets. The constructor arguments define the collection in terms of alignment and distribution objects.

    collection-name<element-name> object(constructor arguments)

For example, suppose we want to create a matrix, A, and two vectors, X and Y, of complex numbers and want to distribute them over the processors of a parallel machine.
Given a C++ class for complex numbers, Complex, we can build distributed matrix and vector collections by using the pC++ library collection classes DistributedMatrix and DistributedVector as follows:

    DistributedMatrix<Complex> A(...constructor arguments ...);
    DistributedVector<Complex> X(...constructor arguments ...);
    DistributedVector<Complex> Y(...constructor arguments ...);

4.1 Distributions, Alignments, and Processors Ver 1.0+

Distributions can be viewed as abstract coordinate systems that allow us to align different collections with respect to each other. If two elements from two different collections are mapped to the same Distribution point, they will be allocated in the same processor memory. Consequently, if data is communicated between two collections, it is best to align them so that costly interprocessor communication is minimized.

Unlike Templates in HPFF Fortran, Distributions in pC++ are first class objects. [2] A Distribution is characterized by its number of dimensions, the size in each dimension, and the function by which the Distribution is mapped to processors.

[1] Note that the pC++ concept of Distribution is, by design, completely identical to the HPF Template. Unfortunately, the word template already has a meaning in C++, so we changed the name to avoid confusion.

[2] Because Distributions are first class objects, they can be created at run time or passed as a parameter to a function. This is very convenient for creating collections at run time, i.e., when creating a new collection the Distribution and Align objects can be taken from another collection with which the new collection is to be aligned.

Current distribution functions allowed in pC++ include BLOCK, CYCLIC, and WHOLE. To understand these, consider the mapping of a one dimensional Distribution to a set of processor objects.
The constructor for a one dimensional distribution takes the form

    Distribution distribution-object-name(size, &(a processors object), map-function);

For example, to make a one dimensional Distribution Q of size n over a processors object P with a BLOCK mapping, one would write

    Distribution Q(n, &P, BLOCK);

The mapping function BLOCK means that the n positions in the distribution are mapped to processor objects by associating contiguous blocks of positions with each processor object so that the load is even. For example, if n is 12 and P has 3 threads, then positions 0 through 3 will go to thread 0, positions 4 through 7 to thread 1, and positions 8 through 11 to thread 2.
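The BLOCK example above, and the CYCLIC rule used later in this section (position i goes to thread i mod the number of threads), can be sketched in plain C++. This is only an illustration and not pC++ library code: the helper names block_owner and cyclic_owner are hypothetical, and the block size ceil(n/p) is an assumption made here for sizes that do not divide evenly.

```cpp
// Hypothetical helpers for illustration; they are not part of pC++.

// BLOCK: contiguous blocks of positions are assigned to each thread.
// Assumption: the block size is ceil(n / p).
int block_owner(int i, int n, int p) {
    int block = (n + p - 1) / p;  // ceil(n / p) in integer arithmetic
    return i / block;
}

// CYCLIC: position i is assigned to thread i mod p.
int cyclic_owner(int i, int p) {
    return i % p;
}
```

With n = 12 and p = 3, block_owner places positions 0 through 3 on thread 0, 4 through 7 on thread 1, and 8 through 11 on thread 2, matching the example in the text, while cyclic_owner deals positions out round-robin.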
In the case that the number of threads does not divide the size of the distribution, we use the rules given by the HPF Forum.

Collections are sets of objects that are distributed over processors object threads by a two-stage mapping process. The first stage is defined by specifying an alignment object; the second stage is defined by a distribution. An alignment object defines the size and shape of a collection and the way in which it is mapped to a distribution.

The syntax for an Align constructor is given by

    Align object-name( dimensions-of-collection, mapping-string);

For example, a one dimensional collection of size 100 mapped to a one dimensional Distribution of the same size would be defined by

    Align OneD( 100, "[ALIGN( dummy[i], myDistribution[i])]");

The mapping string is designed to be similar to the HPF ALIGN directive.
It is merely a notation to describe a linear affine embedding of one lattice into another. In other words, we use array notation to say how one array is to align with another. In the example above it is the identity mapping. The mapping

    ALIGN( dummy[i], myDistribution[2*i])

would map the source array to the even positions of the target. The mapping

    ALIGN( dummy[i], myDistribution[3][2*i+1])

would map the source array to the odd positions of row 3 of the target. However, the mapping

    ALIGN( dummy[2*i], myDistribution[i]) //<<< error

is in error because it does not define the odd members of the source. But the mapping

    ALIGN( dummy[i], myDistribution[i/2])

defines a two-to-one mapping of the source to the target and is acceptable.

In version 1.0 of pC++, one can only define array shaped collections and Align is always dimensioned like an array.
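The index arithmetic behind the alignment strings above can be modeled in plain C++. This is a hedged sketch, not the pC++ implementation: the struct AffineAlign and the functions align_all and halved_target are hypothetical names introduced here, and the model covers only one dimensional mappings of the form a*i + b plus the i/2 case.

```cpp
#include <vector>

// Hypothetical model of ALIGN( dummy[i], myDistribution[a*i + b] ).
struct AffineAlign {
    int a;  // stride applied to the source index
    int b;  // constant offset in the target
    int target(int i) const { return a * i + b; }
};

// Target position for every index of a source array of the given size.
std::vector<int> align_all(const AffineAlign& m, int size) {
    std::vector<int> out;
    out.reserve(size);
    for (int i = 0; i < size; ++i) {
        out.push_back(m.target(i));
    }
    return out;
}

// Model of ALIGN( dummy[i], myDistribution[i/2] ): integer division
// sends source indices 2k and 2k+1 to the same target position k.
int halved_target(int i) {
    return i / 2;
}
```

Here AffineAlign{1, 0} is the identity mapping, AffineAlign{2, 0} sends the source to the even target positions, and AffineAlign{2, 1} sends it to the odd positions. Every source index receives a target in each case, which is exactly what the erroneous dummy[2*i] form fails to guarantee.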
In future versions, more general Align constructors will be made available.

The mapping function is described in terms of a text string which corresponds to the HPFF Fortran alignment directive. It defines a mapping from the domain to a Distribution using dummy domain and dummy index names.

To map a two-dimensional matrix A to a set of processors, one can define a two-dimensional Distribution, align the matrix with the Distribution, and then map the Distribution to the processors. Suppose we have a 7 by 7 Distribution and a matrix of size 5 by 5, and suppose the Distribution will be mapped over the processor object threads so that an entire row of the Distribution is mapped to an individual processor and the i-th row is mapped to processor i mod P.numProcs().
This mapping scheme corresponds to a CYCLIC map in the Distribution row dimension and a WHOLE map in the Distribution column dimension. The Processors object and the Distribution can be defined as follows.

    Processors P;
    Distribution myDistribution(7, 7, &P, CYCLIC, WHOLE);

The alignment of the matrix to the Distribution is constructed by the declaration

    Align myAlign(5, 5, "[ALIGN( dummy[i][j], myDistribution[i][j])]" );
    DistributedMatrix<Complex> A(&myDistribution, &myAlign);

Notice that the alignment object myAlign defines a two-dimensional domain of size 5 by 5 and a mapping function.

We may now align the vectors to the same Distribution.
The choice of the alignment is best determined by the way the collections are used. For example, suppose we wish to invoke the library function for matrix vector multiply as follows.

    Y = A*X;

While the meaning and computational behavior of this expression are independent of alignment and distribution, we would achieve the best performance if we aligned X with the first row of the matrix A and Y with the first column. (This is because the matrix vector algorithm, which is described in more detail later, broadcasts the operand vector along the columns of the array and then performs a reduction along rows.) The declarations take the form

    Align XAlign(5, "[ALIGN( X[i], myDistribution[0][i])]");
    Align YAlign(5, "[ALIGN( Y[i], myDistribution[i][0])]");
    DistributedVector<Complex> X(&myDistribution, &XAlign);
    DistributedVector<Complex> Y(&myDistribution, &YAlign);

[Figure 2: Alignment and Distribution. Diagram labels: Collection(N1,N2), dim 1 [0:N1], dim 2 [0:N2]; [ALIGN(Collection[i][j],Template[i][j])]; Template(T1,T2,P,Block,Block), dim 1 [0:T1], dim 2 [0:T2]; [0:P1].]

The alignment and the Distribution form a two-stage mapping, as illustrated in Figure 2. The array is mapped into the Distribution by the alignment object, and the Distribution definition defines the mapping to the processor set.

Because all of the standard matrix-vector operators are overloaded with their mathematical meanings, pC++ permits expressions like

    Y = A*X;
    Y = Y + X;

even though X and Y are not aligned together.
We emphasize that the meaning of the computation is independent of the alignment and distribution; the correct data movements will be generated so that the result will be the same.

4.2 Collection as a Template TEClass. Ver 2.0

Template TEClass objects are not supported in pC++ version 1.0. This section is a preview for version 2.0.

The most direct approach to building a collection is to define it as a template TEClass, where we supply the type of the collection element as the first template parameter. The template structure can be derived from a parent template TEClass which will provide the data structures to manage the set of elements on each processor thread according to the alignment and distribution specifications.
We call this data structure manager collection the SuperKernel. The basic template TEClass takes the following form.

    template<class ElementType> TEClass C: SuperKernel<ElementType>{
        C(Distribution *D, Align *A, n, m): SuperKernel(D, A) { ... }
        ...
        int alpha();
    };

Given an element class E defined by

    class E{
    public:
        float x, y;
        void f( int i);
        double g();
    };

and a distribution D and alignment A, one can create a collection X of type C with element type E with a declaration of the form:

    C<E> X(&D, &A);

There are a number of different parallel operators that can be applied to the collection X. The simplest is the evaluation of TEClass member functions, such as alpha() in the collection C above, on each thread associated with the distribution.
Because the collection is, at heart, a TEClass, this is nothing new and the expression

    X.alpha();

behaves exactly as described in the previous chapter.

However, with a collection there is another form of parallelism. Given a member function of the element class, f(), one can apply the function to every element of the collection with the expression

    X.f(1);

Alternatively, one can index a subset of elements in a collection with the expression

    X[i:j:k].f()

which will invoke the function f() on elements with index in the range from i to j with step size k. (For multi-dimensional collections, the array notation extends in the obvious manner, but this is not reliable in Version 1.0.)

It is also possible to cause an element member function to be invoked on a single element of a collection by using the SuperKernel overloaded operator (...).

4.3 Element Types. Ver 1.0+

There are several restrictions on the classes that are allowed as element types of a collection. We list here some of these restrictions and try to explain why they exist and which restrictions will be lifted.

- An ElementType in a collection must not contain pointers to heap based data.