If an element i in a collection x is not local to the thread that executes x(i), then x(i) points to a copy of the element. This copy should be viewed as a "cached" copy in a one-element cache. Consequently, if a thread tries to access two non-local elements, then only one may reside in the cache at any time. In other words, if x(j) is a reference to another element after the reference to x(i), then the second reference will eliminate the first. So, for example, in the code below, the second use of the pointer obtained from x(i) is an error.

    ElementType *p;
    p = x(i);
    printf(" %d", x(j)->Ident);   // x(j) overwrites *p
    printf(" %d", p->Ident);      // error: p now points to x(j)

There are ways to avoid this problem by using the function Get_CopyElem(), which is described in section 4.7.

In version 2.0 of pC++ we will change some of these rules.
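Until then, the version 1.0 caching rule can be illustrated outside pC++. The following is a minimal sketch in plain C++, not pC++ code. The names OneSlotCollection, Element, identViaStalePointer and identViaValueCopy are hypothetical and exist only for this illustration; the class merely imitates a collection whose operator() hands back a pointer into a single cache slot. It makes no claim about how the pC++ runtime or Get_CopyElem() is actually implemented.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type; only the Ident field is taken from the text.
struct Element {
    int Ident;
};

// Toy stand-in for a collection whose elements are all non-local.
// operator() copies the requested element into a single cache slot and
// returns a pointer to that slot, so each call overwrites the previous copy.
class OneSlotCollection {
public:
    explicit OneSlotCollection(int n) {
        for (int i = 0; i < n; ++i) remote_.push_back(Element{i});
    }

    Element *operator()(int i) {
        assert(i >= 0 && i < static_cast<int>(remote_.size()));
        cache_ = remote_[i];   // "fetch" the remote element into the cache
        return &cache_;
    }

private:
    std::vector<Element> remote_;  // elements owned by other threads
    Element cache_{-1};            // the one-element cache
};

// Unsafe: keeps the pointer across a second non-local reference.
int identViaStalePointer(OneSlotCollection &x, int i, int j) {
    Element *p = x(i);
    (void)x(j)->Ident;   // this reference replaces the cached copy
    return p->Ident;     // p now sees element j, not element i
}

// Safe: copy the value out of the cache before touching another element.
int identViaValueCopy(OneSlotCollection &x, int i, int j) {
    Element saved = *x(i);
    (void)x(j)->Ident;
    return saved.Ident;
}
```

With a collection of eight elements, identViaStalePointer(x, 2, 5) yields 5, because the pointer taken for element 2 ends up looking at the cached copy of element 5, while identViaValueCopy(x, 2, 5) yields 2.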
Among these changes, a new data type modifier, global, will be added. This idea comes from the CC++ language. Any data object that is of global type may be referenced by pointers of the form

    global ElementType *p;

In this case expressions of the form p->y = 3.14 or p->f() will cause a message to be sent to the processor object thread that contains the element, asking it to perform the given action on the element. Additional information on element communication can be found in section 4.7.

4.6.1 Vector Subsets of a Collection.
Ver. 1.0+

Collection parallelism is based on the concurrent application of a class member function to every object in a large set. However, it is not always the case that we want every object to carry out this action. Often we only want to apply the function to a subset of the objects. pC++ provides a mechanism to select a vector subset of elements in a parallel action. We will illustrate the use of this operator below.

To implement a parallel tree reduction, the collection will define the addition operator as virtual so that it may use it in the MethodOfElement section.

    Collection C: SuperKernel {
    public:
        C(Distribution *D, Align *A);
        ElementType reduce1();
    MethodOfElement:
        virtual ElementType &operator +=(ElementType &);
        void sumFrom(int toRight);
    };

The element function sumFrom() accesses an element a given distance to the right of itself and adds that element to its own value.
To access an element at a given offset from the current (i.e. this) element, one must have access to the collection structure. This is accomplished through the pointer ThisCollection, which is added to each element by the SuperKernel. There is a bug in version 1.0 of pC++: ThisCollection cannot be used in an inlined MethodOfElement function defined in the body of the collection, so we must define sumFrom outside the collection C as follows.

    void C::sumFrom(int toRight) {
        ElementType *ptr_neighbor;
        ptr_neighbor = (*ThisCollection)(Ident + toRight);
        (*this) += *ptr_neighbor;
    }

There is another way to access an element that can be found by an offset from the current position.
The SuperKernel function Self can be used to achieve the same result by replacing the statement

    ptr_neighbor = (*ThisCollection)(Ident + toRight);

with the equivalent expression

    ptr_neighbor = Self(toRight);

The reduction function will run in each processor object thread and invoke parallel action on the elements. In the first step it will ask all the even indexed elements to fetch the values from the odd elements and add these to their own values.
Next it will have all the elements whose indices are multiples of 4 add the sums from the remaining even elements, and so on. This leaves the total in element 0.

    ElementType C::reduce1() {
        int i;
        int n = dim1size;
        for (i = 1; i < n; i = 2*i)
            this[0:n-1:2*i].sumFrom(i);
        return *(*this)(0);   // the total is now in element 0
    }

Note that we have used a parallel subset operation to refer to the elements of the collection whose indices are multiples of 2*i.
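The stride pattern used by reduce1() can be traced serially. The sketch below is plain C++ rather than pC++, and it is only an illustration under stated assumptions: treeReduce is a hypothetical name, the elements are modelled as integers in a std::vector, and the inner loop visits one after another the elements that pC++ would update in a single parallel action. The bounds guard is an addition of this sketch so that sizes that are not a power of two also work; the original sumFrom() has no such check.

```cpp
#include <cassert>
#include <vector>

// Serial trace of the reduce1() pattern. values[k] stands in for element k.
// For each step i = 1, 2, 4, ... the elements k = 0, 2i, 4i, ... (the subset
// [0 : n-1 : 2*i]) add the value found i places to their right.
int treeReduce(std::vector<int> values) {
    const int n = static_cast<int>(values.size());
    assert(n > 0);
    for (int i = 1; i < n; i *= 2) {
        // base 0, limit n-1, stride 2*i
        for (int k = 0; k <= n - 1; k += 2 * i) {
            // Guard added for this sketch: skip a neighbour that would fall
            // outside the array when n is not a power of two.
            if (k + i < n) values[k] += values[k + i];
        }
    }
    return values[0];   // the total ends up in element 0
}
```

For the eight values 1 through 8 the first step forms the pair sums 3, 7, 11 and 15 in elements 0, 2, 4 and 6, the second step forms 10 and 26 in elements 0 and 4, and the third step leaves 36 in element 0. Within one step the elements being read (odd multiples of i) are never the ones being written (multiples of 2*i), which is why the updates of a step can safely run concurrently.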
The subset operator uses a Fortran 90 vector syntax

    [base : limit : stride]

to define the range of elements in the subset. Note also that this operator is applied to a POINTER to the collection.

4.6.2 More on ThisCollection. Ver. 2.0

A standard member of the MethodOfElement fields that are added to each element is the pointer ThisCollection, which allows an element to refer to the collection (TEClass) object for which it is a local collection member. In this section we will illustrate how it can be used to access local collection data members.

The third way to write the summation program is to have each element add its value into a local collection variable. In this case we create an ElementType object local_total (which we assume is initialized to a "zero" value) as a member field in the Collection TEClass object.

    Collection D: SuperKernel {
    public:
        ElementType local_total;
        D(Distribution *d, Align *A);
        ElementType reduce2();
    MethodOfElement:
        virtual ElementType &operator +=(ElementType &);
        void accumulate();
    };

The accumulation function adds each element value directly to local_total.

    void D::accumulate() {
        D<ElementType> *ptr;
        ptr = (D<ElementType> *) ThisCollection;
        ptr->local_total += *this;
    }

Note that because ThisCollection is inherited from the SuperKernel, we must cast it to the type of our collection before we can access the local_total field.

The reduction function, which runs in each TEClass processor object thread, will invoke the accumulate function on each element.
It is important to understand that, within a processor object thread, this action is completely serial, so there is no "race condition" involved with the update to local_total. We use a simple TEClass reduction function to compute the total between threads as shown below.

    ElementType D::reduce2() {
        this->accumulate();
        return pCxx_sum(&local_total);
    }

NOTE: for Version 1.0, the pCxx_sum() function is not yet part of the standard library. However, everything else described in this subsection will work in version 1.0.

4.6.3 Working with the Local Collection.
Ver. 1.0+

Each Processors object thread owns one TEClass representative object from the collection definition. The non-MethodOfElement member functions are executed by the thread in pure MIMD style. Each such function has complete control of the "local collection" of elements that are mapped to that thread of computation. It is often the case that the easiest way to carry out some type of processing on the collection is to have each thread carry out the task sequentially on each element of its local collection.

The fourth way to program the summation of the elements in a collection is to have each processor object thread compute the total of the elements in its own local collection and then copy the result to a second collection, defined as a distributed array of elements with only one element per thread.

    Processors P;
    Distribution d(NumProcs(), &P, BLOCK);
    Align a(NumProcs(), "[ALIGN(V[i],T[i])]");
    DistributedArray< C > onePerThread(&d, &a);

One of the functions of a distributed array is to provide basic array-oriented reduction functions.
In this case we will use the function

    void ReduceDim1();

which reduces along the first dimension so that the sums of rows of an array are reduced to column 0. In our case, because the array is one dimensional, the sum will be left in element 0.

We begin with a rewrite of reduce2() to take a pointer to the new distributed array as an argument. We will have each thread add every element of the local collection to the total.
The easiest way to say this is to use the function Is_Local(), which is true if the named element is in the local collection of the thread that is executing the function. A simple loop can be used to sweep through the collection as follows.

    for (int i = 0; i < dim1size; i++)
        if (Is_Local(i)) local_total += *(*this)(i);

The problem with this approach is that one must go through the entire collection to identify only the local subset. In some cases the overhead for this search is not serious. However, there is another way to do this. One may set a special buffer to the head of the local collection with a call to ResetLocal(), which takes a pointer to the collection as an argument.
Then the function FirstLocal() will extract the index of the first element in the local collection, and the function Local() can be used to advance the index to the next local element index. When Local() returns a negative value, we have exhausted the set of local elements. Using these functions our code can be written as

    ElementType D::reduce3(DistributedArray<ElementType> *onePerThread) {
        int i;
        ResetLocal(this);
        for (i = FirstLocal(); i >= 0; i = Local(i))
            local_total += *(*this)(i);
        // Now copy the total to the collection onePerThread and reduce
        // that one.
        ResetLocal(onePerThread);
        if ((i = FirstLocal()) >= 0)
            *(*onePerThread)(i) = local_total;
        Barrier();   // make sure all the copies are complete
        onePerThread->ReduceDim1();
        return *(*onePerThread)(0);
    }

We will include one more example of using the local collection.
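Before that example, the control flow of reduce3() can be traced in plain C++. This is a sketch under stated assumptions, not pC++ code and not the library's implementation. BlockLayout, firstLocal, nextLocal and twoPhaseSum are hypothetical names chosen to mirror FirstLocal(), Local() and the two phases of reduce3(). The contiguous block ownership rule is assumed here for illustration only, the threads are simulated by an ordinary loop, and std::accumulate stands in for the barrier followed by ReduceDim1().

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical BLOCK layout for this sketch: thread t owns the contiguous
// indices [t*block, min((t+1)*block, n)), with block = ceil(n / numThreads).
struct BlockLayout {
    int n;
    int numThreads;

    int block() const { return (n + numThreads - 1) / numThreads; }

    // Index of the first element owned by thread t, or -1 if it owns none.
    int firstLocal(int t) const {
        int first = t * block();
        return first < n ? first : -1;
    }

    // Next index owned by thread t after i, or -1 when the set is exhausted.
    int nextLocal(int t, int i) const {
        int next = i + 1;
        int end = (t + 1) * block();
        return (next < end && next < n) ? next : -1;
    }
};

// Phase 1: every thread sums its own elements serially.
// Phase 2: the per-thread totals (one slot per thread) are reduced.
int twoPhaseSum(const std::vector<int> &elements, int numThreads) {
    assert(numThreads > 0);
    BlockLayout layout{static_cast<int>(elements.size()), numThreads};
    std::vector<int> onePerThread(numThreads, 0);

    // In pC++ the threads run concurrently; here they are visited in turn.
    for (int t = 0; t < numThreads; ++t) {
        int localTotal = 0;
        for (int i = layout.firstLocal(t); i >= 0; i = layout.nextLocal(t, i))
            localTotal += elements[i];
        onePerThread[t] = localTotal;
    }
    // Stands in for the barrier followed by ReduceDim1().
    return std::accumulate(onePerThread.begin(), onePerThread.end(), 0);
}
```

Each simulated thread touches only its own indices, so the work per thread is proportional to the size of its local collection rather than to the size of the whole collection, which is the advantage the FirstLocal()/Local() loop has over the Is_Local() sweep. A thread that owns no elements gets -1 from firstLocal() and contributes zero.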