cnrs (1158306), страница 2
Текст из файла (страница 2)
It is ourfeeling that this beyond the scope of the intent of the HPF directives. (Similar problemsexist in Fortran 90 which allows array subsections to be described by pointers which canthen be passed to subroutines.)4. Concurrent Aggregates: Collection ClassesA more challenging exercise than specifying a C extension to the semantics of HPFannotations is to see how the concept of data distributions can be incorporated into theobject oriented features of C++. In this section we show that by extending the C++language with a special type of concurrent aggregate class [9] we can generalize the ideasembodied in HPF.Our C++ extension is called pC++ and it is based on adding one new type of specialclass called a Collection to the language.
A Collection is a structured set of objects froma given class. Collections represent the mathematical abstractions that we associate withdistributed aggregates. For example, the library collection type DistributedArray is thepC++ analog of the Fortran 90 array. Associated with DistributedArray are the standard intrinsics associated with Fortran 90 array operations including element-by-elementarithmetic operations such as addition, multiplication and basic mathematic operationsas well as reductions and other parallel operators that involve the structure of the array.For example, the Fortran 90 array operations belowreal, dimension(100,100):: x,yreal, dimension(100)::cfcf = sum(x*cos(y), dim=1)say that cf is the vector of values obtained by rst taking the pointwise product of eachelement of x with the cosine of the corresponding element y and then summing along therst dimension.
The same expression in pC++ can be written asDistributedArray<float> x(..[100,100], ...);DistributedArray<float> y(..[100,100], ...);DistributedArray<float> cf(..[100], ...);cf = (x*y.Cos()).sumReduce(0);Notice that the constructor for the arrays above resembles of a C++ template declaration.A DistributedArray is an indexed collection. the parameter between the angle bracketsspecies the type of the object that sits at each position in the array. Also note that theconstructor parameters have not been fully specied. To complete the specication of aDistributedArray we need a TemplateClass and alignment functions where by TemplateClass we mean the abstract hpf template (which is not to be confused with the C++template construct!)A TemplateClass species an abstract grid of positions, an array of virtual processorsand a distribution function that maps the grid to the processor array.
In other words, thetemplate class corresponds to the second stage of the HPF mapping. The syntax for itsconstructor takes the formTemplateClass T( <size of target-template>,< processor-array>,<distribution-function>);The size and processor-array specications are denoted by vector constant expressions ofthe form[3, 4] or [p] or [3*q, 99, 99/p], etc.The distribution function is the same as the HPF mappings:[Block, Cyclic, Whole]where Whole is used for the HPF * distribution.Collections are created with a constructor of the form shown below.CollectionClass<element-type> C(TemplateClass T,<collection-size>,<align-function> )(<element-constructor>);The format of the collection size specication depends on the nature and shape of the collection. The alignment functions specify a mapping between the elements of the collectionand the positions on the template grid.The primary generalization that collections make over HPF distributed arrays is thatcollections can dene arbitrary topologies and mathematical structures.
As we shall seein the remainder of this paper, the variety of shapes and forms of distributed concurrentaggregates is, indeed, very large.The syntax of a collection declaration is given byCollection collection_name: inherited_collection{\\ constant data and private functions.public:< Aggregate functions and operators >MethodOfElement:<Element type functions and operators >};All non-element data associated with a collection must be constant as it is duplicated oneach processor and is visible to each element. This is because the philosophy behind thecollection concept is that real variable data is all contained in the elements.There are two types of method functions associated with the collection. The rst arethe public denition global functions.
These include . A constructor for the collection and a destructor if needed. A list of method functions to access elements of the collection from the global threadof computation. Examples include the index operators like [ ]. A list of parallel operators and functions that operate on the collection as a whole.Examples include reduction, parallel prex,broadcasts and parallel matrix algorithms. A list of graphical and instrumentation hooks that simplify the task of doing programalgorithm animation and performance analysis.The second class of methods are called MethodOfElement functions.
There are two typesof these. A list of method functions and operators that are required of the element type forthe collection operators to work correctly. For example, the sum reduction operatorassumes that each element must have an associated sum operator dened. Theseelement requirements are listed here to satisfy the strong typing of C++. A list of method functions that can be called by the elements to access informationabout the global structure of the collection.
Examples of this type of functioninclude functions to access the neighbors in a grid, the global index of an elementin an array or the children or parent in a tree.The principle parallel operation associated with a collection is the concurrent evaluationof element method functions. For example, to write a simple Jacobi iteration on a twodimensional square grid, one can use a Grid collection and dene an element class as avertex as follows.class vertex ElementTypeOf Grid{public:float f, oldx, newx;float relax(float lambda){return f - lambda*(4.0*oldx + self(0,1).oldx +self(1,0).oldx + self(0.-1).oldx + self(-1,0).oldx);};The vertex class is said to be of ElementTypeOf Grid.
This implies that the vertex canuse the MethodOfElement functions from the Grid collection class. One of these is theself(i,j) function which returns a reference to the neighbor oset by (i,j) of the callingelement. A parallel Jacobi style iteration can now be written in the form shown below.Grid<vertex> G(.....);while(...){G.newx = G.relax(0.01);G.oldx = G.newx;}The concurrency comes in the invocation of relax() on all elements of the collection inparallel and in the parallel assignment. Synchronization is accomplished by an impliedbarrier at the end of each collection operation.Inheritance plays a major role in the denition of collections. The basic collectionlibrary, shown in Figure 1, contains many of global structures that are encountered inscientic and engineering applications. As the library develops further, we expect thishierarchy to grow.
An essential feature of its design is user extensibility. At each level inthe hierarchy users should be able to \subclass" new collection types as needed for specialapplications. In the following sections we will discuss the individual subclasses in greaterdetail.4.1. Matrix Vector RevisitedTo illustrate the ideas presented so far we will return to the matrix vector productexample from section 2. Because we are dealing with mathematical structures related tolinear algebra we will start with a basic distributed matrix collection which can be viewedas a distributed array with the matrix algebraic abstractions.
The key denitions for thematrix time vector operation are given below.Collection DistributedMatrix: DistributedArray{// TemplateClass T; inherited from Distributed Array// int size;inherited from Distributed Arraypublic:...DistributedVector<Vtype> operator *(DistributedVector<Vtype> x){DistributedArray<Vtype> tmp(T, [size,size], [i,j]->[i,j]);tmp.duplicate(x, 0); /* spread x over the rows of tmp */tmp = (*this)*tmp;/* elementwise parallelism*/return tmp.sumReduce(1);/* row wise reduction */KernelProcessorCSPOrderedSetDistributedArrayDistributedVectorDistributedBlkVectDistributedMatrixDistributedBlkMatrixDistributedRowMatrixDistributedColMatrixDistGridDisUnstrucMeshDistributedTreeFigure 1. Partial Collection Library Inheritance Tree.}...}The matrix-vector product operator shown here is very simple.
It works by making atemporary distributed array which holds a copy of x in each row. This is accomplishedby using library functions associated with the DistributedArray collection type. Themultiplications are then done in parallel by invoking the parallel array multiply operatorfrom the distributed array class. The products are then added together along rows toform the nal result.The Row Blocked Version of the program now takes the formTemplateClass T( [n,n], [p], [Block, Whole]);DistributedMatrix<float> M( T, [n,n], [i,j]->[i,j]);DistributedVector<float> x( T, [n], [i]->[i,0]);DistributedVector<float> y( T, [n], [i]->[0,i]);y = M*x;This is completely equivalent to the rst HPF example in section 2.The Square Blocked Version of the program can be written asTemplateClass T( [n,n], [q,q],DistributedMatrix<float> M( T,DistributedVector<float> x( T,DistributedVector<float> y( T,[Block, Block]);[n,n], [i,j]->[i,j]);[n], [i]->[i,0]);[n], [i]->[0,i]);y = M*x;This is completely equivalent to the second HPF version in section 2, but with the dierence that we did not have to change the source code.