To pursue this further, observe that if we wish to build an [M, M] array that is distributed by blocks of rows to processors, we can use declarations of the form

    Darray<element> G1([MAXPROC],[M,M],[Block,Whole]);
    Darray<element> G2([MAXPROC],[M,M],[Block,Whole]);

which means that processor representative 1 will get G1[1..(M/MAXPROC), 1..M] and G2[1..(M/MAXPROC), 1..M], representative 2 will get the next M/MAXPROC rows, and so on.

In a case like the one above, where we have two conforming instances of the same distributed collection, we can use a form of data parallel expression that allows the structures to be treated as a whole. For example,

    G1.average = G2.average + 2 * G1.average;

represents a parallel computation involving the average field of the corresponding elements of the respective collections.

4.2 Kernel Collection

The PC++ language begins with a primitive data structure called the Kernel collection. The Kernel is the root of the hierarchy of collections. The Kernel can be considered as a simple set of elements which are distributed among processor representatives. The hierarchy of collections derived from the Kernel is shown in Figure 1.

There are four arguments associated with the Kernel when we create a new instance of the structure. They are the collection variant, the array of processor representatives, the size of the whole collection, and the distribution scheme. The collection variant can be FixedArray, GrowingList, or SynSet. The variant FixedArray means the whole collection can be seen as a distributed array. It is very similar to a conventional array except that the elements of the collection are distributed among processor representatives. The variant GrowingList means the number of elements in the collection will grow dynamically from parallel phase to parallel phase. If the variant is either FixedArray or GrowingList, the elements are indexed and can be read or written through the indexes. Finally, the variant SynSet means a set of unordered elements. The number of elements in a SynSet collection can be either increased or decreased at run time. This kind of structure is particularly useful as a buffer in producer/consumer problems or as the state space of search problems.

The Kernel is the root of the family of distributed collections. The basic three-argument constructor described in the previous section is inherited from the Kernel. In addition, there are several other important operators that are found in the Kernel. For example, the method new creates an element and the method vect_new creates an array of elements of given distributions. A DOALL operator can be used to cause a particular processor representative or a particular set of processor representatives to execute a message concurrently. A DoSubset operator can be used to send a message to a set of elements in the collection. The method get_statistics summarizes the information of the current distribution among processors and the method balance balances the distribution of the objects among processors.

4.3 Relations between Collections and Elements

In this section, we describe the relations between collections and elements. Let's begin with the following definitions.

Definition 1 A collection method is primitive if

1. All the usages of ElementType in the collection method do not require virtual element methods declared in the MethodOfElement regions.
2. It invokes only primitive collection methods in the current collection.

We classify collections into two categories. One is called a complete element collection and the other is called a field element collection. Let E be a plain C++ object and Coll be a collection; we call Coll<E> a field element collection. If F is an object declared as

    class F ElementTypeOf Coll { method_list };

we call Coll<F> a complete element collection. An element in a complete element collection will automatically inherit the virtual element methods and instances described in the collection it inherits from, whereas an element of a field element collection will not inherit any virtual instances or methods from the collection. Furthermore, a field element collection can only invoke the collection methods which are primitive as described in Definition 1.

There are three ways that a field element collection can be formed. The first way is through the reference of the field of a complete element collection.
For example, in Figure 2, W is a complete element collection of type Sequence<element> and W.first_field forms a field element collection of type Sequence<int>. Second, a field element collection can be constructed by inheriting the shape from an existing collection instance. For example, W1 is constructed from W and is of type Sequence<int>. Finally, we can construct the field element collection directly if the collection has a constructor which is primitive.

[Figure 1: Collection Hierarchies of Distribution Objects. Labels appearing in the figure: Kernel, DistPriorityQueue, BisectedList, DistributedList, DistributedArray, IndexedBinaryTree, Matrix.]

In summary, we have the following relations:

- If an object class F forms a complete element collection Coll1<F>, then it cannot form another complete element collection Coll2<F>.
- Let E be a plain C++ class; then E can form different kinds of field element collections. For example, we can have Coll1<E> and Coll2<E> at the same time.
- A collection Coll can form different complete element collections as well as field element collections. We can have Coll<E1>, Coll<E2>, and Coll<E3> at the same time.

    Collection Sequence : Kernel {
        Sequence(vconstant *P, vconstant *G, vconstant *D);
        Sequence(Sequence &ExistingCollection);
        sorting();              // assume it is primitive
    MethodOfElement:
        operator +();
        operator =();
        operator >();
    };

    class element ElementTypeOf Sequence {
        int first_field, second_field;
    public:
        operator +();
        operator =();
        operator >();
    };

    Sequence<element> W([MAXPROC],[N],[Cyclic]);

    foo() {
        Sequence<int> W1(W);
        (W.first_field).sorting();
        W1 = W.first_field + W.second_field;
        W1.sorting();
    }

Figure 2: Examples of Different Element Collections

4.4 Examples

4.4.1 A Smoothing Algorithm

A smoothing algorithm implemented by using a DistributedArray is shown below. Each of the arrays is distributed by [Block, Block] among processors. This is the Kali notation for partitioning the elements so that blocks of M/MAXPROC by N/MAXPROC elements are assigned to each processor representative. The smoothing algorithm works by having each element of an array perform a local 5-point star relaxation and return an array of values defined by the method update.

    class E ElementTypeOf Darray {
        float v;
        update() {
            return (4.0*v - (self(1,0)->v + self(-1,0)->v
                           + self(0,1)->v + self(0,-1)->v));
        }
    };

    Darray<E> A([MAXPROC,MAXPROC],[M,N],[Block,Block]);
    Darray<E> W([MAXPROC,MAXPROC],[M,N],[Block,Block]);

    smoothing() {
        for (i = 1; i < MAX_ITERATION; i++) {
            W->v = A->update();
            A->v = W->v;
        }
    }

4.4.2 A Max Finding Algorithm

This example illustrates two features of the language. The first feature is the ability of a collection to inherit algebraic structure. We will define a collection called IndexedTree in terms of the DistributedList collection defined earlier. An indexed tree is a tree where each node has an index to identify it. The root is index 1, the second level has indices 2 and 3, the third level has indices 4, 5, 6, 7 and, in general, the ith level has indices $2^{i-1}$ through $2^{i}-1$. The index will be set up properly inside the constructor IndexedTree(). This is done as follows.

    Collection IndexedTree : DistributedList {
        IndexedTree();
    MethodOfElement:
        int this_index;
        lchild() { return (self(this_index)); }
        rchild() { return (self(this_index+1)); }
    };

We leave it to the reader to verify that this definition of the left and right child will give an ordering consistent with the numbering scheme described above. Our Max-Find will work by distributing the n elements to be searched in an indexed tree. Starting with the bottom level of the tree, each element will ask for the maximum found by its children.

    class element ElementTypeOf IndexedTree {
        float v;
    public:
        float local_max() {
            if (v < lchild()->v) v = lchild()->v;
            if (v < rchild()->v) v = rchild()->v;
            return v;
        };
    };

    IndexedTree<element> X([MAXPROC], [N], [Block]);

    float max_find(N) {
        for (i = log2(N)-1; i > 0; i--)
            X.DoSubset([2**i : 2**(i+1)-1 : 1], X.local_max);
        return (X[1].local_max());
    }

The DoSubset operator is the second feature illustrated in this example.

5 Applications

5.1 Matrix Multiplication

In this section, we will show a simple matrix multiplication algorithm written in the PC++ language. We will begin with the creation of a matrix collection as follows:

    Collection matrix : DistributedArray {
        matrix();
        matmul(matrix *B1, matrix *C1);
    MethodOfElement:
        dotproduct(matrix *B, matrix *C);
    };

The matrix collection is derived from DistributedArray and has a constructor matrix() with three arguments: the processor array, the size of the global array, and the distribution scheme. The operator matmul multiplies two given matrices, B and C, and stores the result in the matrix which invokes the computation. This can be done by computing the dot product of the ith row of B and the jth column of C for every element whose index is (i,j) in the current matrix. This is shown below:

    matrix::matmul(matrix *B, matrix *C) {
        this->dotproduct(B, C);
    }

One thing worthy of notice is that the dotproduct() described in the collection is a virtual function of the element and will be overloaded when the real element is declared. The invocation of the virtual element method dotproduct() by the collection means the method is applied to all elements of the matrix collection. We now show a basic element with a field named x below:

    class elem ElementTypeOf matrix {
        float x;
        // overload product
        friend elem operator *(elem *, elem *);
        elem operator +=(elem &);       // overload +=
        dotproduct(matrix<elem> *B, matrix<elem> *C) {
            int i, j, k, m;
            i = thisindex[0]; j = thisindex[1];
            m = B->GetSizeInDim(1);
            for (k = 0; k < m; k++)
                *this += B(i,k) * C(k,j);
        }
    };
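The element-wise scheme used by matmul and dotproduct in the matrix multiplication example of Section 5.1 can be sketched in standard C++ as follows. This is only an illustration under stated assumptions, not the PC++ implementation: the Matrix struct and its at() accessor are hypothetical stand-ins for the matrix collection, nothing is distributed over processor representatives, and the implicit parallel application of dotproduct to every element is replaced by an ordinary double loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix collection: a dense row-major array.
// No distribution over processor representatives is modelled here.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Element-level step: the element at (i, j) accumulates the dot product of
// row i of B and column j of C into its own value, mirroring dotproduct().
void dotproduct(Matrix& self, std::size_t i, std::size_t j,
                const Matrix& B, const Matrix& C) {
    const std::size_t m = B.cols;  // plays the role of B->GetSizeInDim(1)
    for (std::size_t k = 0; k < m; ++k)
        self.at(i, j) += B.at(i, k) * C.at(k, j);
}

// Collection-level step: apply dotproduct to every element. In PC++ this
// application is implicit and parallel; here it is a sequential double loop.
void matmul(Matrix& A, const Matrix& B, const Matrix& C) {
    assert(B.cols == C.rows && A.rows == B.rows && A.cols == C.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < A.cols; ++j)
            dotproduct(A, i, j, B, C);
}
```

As in the element method above, each entry accumulates with += into its existing value, so the result matrix is assumed to start at zero. Because every (i, j) entry reads only B and C and writes only itself, the iterations of the double loop are independent, which is what allows the collection to apply the method to all elements concurrently.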