The kernel also provides a global name space based on an integer enumeration of the elements of each collection. In other words, the kernel numbers each element of the collection, and this number can be used by any element to identify and access any other element. The important element access functions provided by the kernel will be described in greater detail later.

A collection has private, protected and public data and member function fields exactly as any other class. However, unlike a standard class, when a collection object is allocated, a copy of the basic collection structure is allocated in the local memory of each processor. Consequently, the data fields are not shared in the same way that element object indices are shared.
The method functions in a collection are invoked in SPMD mode, i.e. they execute in an independent thread on each processor. This is explained in more detail later. Because the type of the collection element is not specified when the collection is defined, the special keyword ElementType is used whenever it is necessary to refer to the type of the element.

A collection also has a set of data fields and member functions that are copied into each element as protected fields. These are labeled in the collection as MethodOfElement: fields. The purpose of MethodOfElement fields is to allow each element to "inherit" properties of the collection and to define operators that act on the elements with knowledge of the global structure of the collection. Fields or method functions within the MethodOfElement section that are defined as virtual are assumed to be overridden by a definition within the element class.

2.3 Control Flow and Parallelism

A collection is a set of elements which are objects from some C++ class.
The primary form of parallelism in pC++ is the application of a collection MethodOfElement function or an element class method to each element of the collection. In other words, let C be a collection class and E a standard C++ class. If c is declared to be of class C<E>, that is, c is a collection that has elements of class E, and f() is a MethodOfElement function of C or a method function of E, then the object parallel expression

    c.f();

means

    for all e in c do
        e.f();

The alignment and template distribution functions partition the collection and map a subset of elements to each processor. This subset is called the local collection.
If sufficient processor resources exist, all of these element function invocations in c.f() can happen in parallel. If there are fewer processors than elements, then each processor will sequentially apply the method function to the subset of elements in its local collection. Note that if f() returns a value of type T when applied to an element, the expression c.f() will return a value of type C<T>. Likewise, if x is a public field of E of type S, the expression c.x is of type C<S>.

Because all collections are derived from the Kernel class, they all inherit an indexing function and a subset operator. The expression c(i) returns a pointer to the ith element if it is in the local collection of the processor evaluating the expression, or it returns a pointer to a buffer which contains a copy of the ith element if it is not local. The expression

    c[a:b:s].f();

means concurrently apply f() to $e_i \in c$ for i in the range [a, b] with step size s.

As with C++, all pC++ programs begin execution with a single control thread at main().
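The object parallel expressions described above can be modeled in standard C++ as a purely sequential sketch. This is not pC++ and says nothing about how the pC++ runtime is implemented. The names Cell, apply_all and apply_range are hypothetical, and the range [a, b] is assumed to be inclusive at both ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element class, used only for illustration.
struct Cell {
    int value = 0;
    int twice() const { return 2 * value; }  // plays the role of f()
    void bump() { ++value; }
};

// Models c.f(): apply f to every element and collect the results,
// mirroring "c.f() has type C<T> when f() returns a value of type T".
template <typename E, typename F>
auto apply_all(std::vector<E>& c, F f) -> std::vector<decltype(f(c[0]))> {
    std::vector<decltype(f(c[0]))> out;
    out.reserve(c.size());
    for (E& e : c) out.push_back(f(e));
    return out;
}

// Models c[a:b:s].f(): apply f to e_i for i = a, a+s, a+2s, ... while i <= b.
template <typename E, typename F>
void apply_range(std::vector<E>& c, std::size_t a, std::size_t b,
                 std::size_t s, F f) {
    assert(s > 0 && b < c.size());
    for (std::size_t i = a; i <= b; i += s) f(c[i]);
}
```

Because the element invocations are independent, the loop bodies could run in any order or concurrently. The sequential loops here correspond to the case where one processor holds the whole collection.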
When a collection function is encountered, the control thread forks a separate thread onto each processor. These processor threads must be synchronized before returning from the collection function, where the main control thread continues. For element class functions and MethodOfElement functions this synchronization is automatic. However, for public, private and protected functions of the collection, the processor threads are said to be executing in asynchronous Single Program, Multiple Data (SPMD) mode. This means all variables visible within the scope of the function are private to the processor thread, and the programmer is responsible for the synchronization between processor threads at the end of the function.

In the next subsection we first give a simple example to illustrate all of the points described above.
The sections that follow explore different aspects of pC++ in more detail.

2.4 Hello World

To demonstrate the basic extensions to C++ provided by pC++ we consider a simple example consisting of a set of elements that each print a simple "Hello World" message to the standard output stream. We will build a simple linear set collection with one constructor method and a private field size. Our linear set will add a field to each element called myindex and add a special function to the element class, sayHello(), which will print the value of myindex and call a work routine doWork() from the element.
The definition of the collection is given below.

    Collection LinearSet: public Kernel {
        int size;                       // private; one copy per processor
    public:
        LinearSet(Template *T, Align *A) : Kernel(T, A, sizeof(ElementType)) {
            int i;
            size = T->dim1size;
            // each processor sets myindex only for the elements it owns
            for (i = 0; i < size; i++)
                if (this->Is_Local(i)) (*this)(i)->myindex = i;
            Barrier();
        };
    MethodOfElement:
        int myindex;                    // copied into every element
        virtual void doWork(void);      // must be supplied by the element class
        void sayHello() {
            printf("Hello World from %d\n", myindex);
            doWork();
        };
    };

We may use any C++ class as the element type of the collection so long as doWork(), which is declared as virtual in the collection definition, is present.
For example,

    class MyElement {
    public:
        float x, y;
        MyElement(float xinit) { x = xinit; };
        void doWork() { y = x*x; };
        MyElement & operator +(MyElement & Rhs);
        MyElement & bar(int i);
    };

A main program which will create a collection of this type and do a parallel invocation of the sayHello function is given below.

    #include "kernel.h"
    #include "distarray.h"
    #define SETSIZE 5

    int main() {
        Processors P;
        Template myTemplate(SETSIZE, &P, BLOCK);
        Align myAlign(SETSIZE, "[ALIGN( domain[i], myTemplate[i])]");
        LinearSet<MyElement> G(&myTemplate, &myAlign, 4.0);
        G.sayHello();
    }

The result of this computation will be five "Hello World from .." messages, and each element object will set its y variable to 16.0, since doWork() computes x*x with x initialized to 4.0.

To see how this works, we describe the behavior line by line.
main() starts a single, logical thread of control, initializing the template and alignment objects. The constructor for the LinearSet is then called. The first task of the constructor is to call the kernel constructor to initialize the collection. Notice that the constructor was invoked in the main program with the additional argument, 4.0. Additional constructor arguments are passed to the element constructor, which in this case has one parameter, a float. The constructor initializes the size field on each processor's local copy of the collection structure with the value extracted from the template. It then initializes the myindex field in each element of the collection.

    LinearSet(Template *T, Align *A) : Kernel(T, A, sizeof(ElementType)) {
        int i;
        size = T->dim1size;
        for (i = 0; i < size; i++)
            if (this->Is_Local(i)) (*this)(i)->myindex = i;
        Barrier();
    };

To do this, two special Kernel functions are needed: Is_Local(index) and Barrier().
As soon as the main control thread enters a public function from a collection, the execution model changes: each processor now operates on its own local portion of the collection in SPMD mode. The user may view this as if the original control thread had split into a number of independent threads, one per processor. These processor threads must always be synchronized at the end of the collection function before the return to the main control thread. The Is_Local(index) predicate returns true if the named element is in the local memory associated with this thread of the computation.

Another idea borrowed from HPFF Fortran is the "owner computes" rule. pC++ requires that only the "owner," i.e. the local processor thread, of an element may modify fields or invoke functions that modify the element's state.

The function (*this)(i) returns a pointer to the local element with global name i. If the element accessed by (*this)(i) is not local, the function returns a pointer to a buffer containing a copy of the ith element.
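The interaction of Is_Local(index) with the "owner computes" rule can be illustrated with a small sequential simulation in standard C++. This is a sketch under stated assumptions, not the pC++ kernel: the functions block_owner, is_local and init_indices are hypothetical, and the BLOCK distribution is assumed to assign contiguous chunks of ceil(n/p) elements to each of p processors, which the text does not specify.

```cpp
#include <cassert>
#include <vector>

// Assumed BLOCK distribution: element i of n lives on processor
// i / ceil(n / p). The real pC++ mapping may differ.
int block_owner(int i, int n, int p) {
    assert(n > 0 && p > 0 && i >= 0 && i < n);
    int block = (n + p - 1) / p;  // elements per processor, rounded up
    return i / block;
}

// Stand-in for the kernel predicate Is_Local(index) as seen by processor me.
bool is_local(int i, int me, int n, int p) {
    return block_owner(i, n, p) == me;
}

// Runs the constructor loop once per simulated processor. Each processor
// scans every global index but writes only the elements it owns.
// Returns the number of writes performed by each processor.
std::vector<int> init_indices(std::vector<int>& myindex, int p) {
    int n = static_cast<int>(myindex.size());
    std::vector<int> writes(p, 0);
    for (int me = 0; me < p; ++me) {
        for (int i = 0; i < n; ++i) {
            if (is_local(i, me, n, p)) {
                myindex[i] = i;  // owner computes: only the owner modifies
                ++writes[me];
            }
        }
    }
    return writes;
}
```

With five elements and two simulated processors, the first processor owns indices 0 through 2 and the second owns indices 3 and 4, so every element is written exactly once, and only by its owner.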
The Barrier() function causes each thread to wait until all others reach this point in the program. Upon return from the collection constructor operation we return to the single main thread of execution.

The final line of the program is

    G.sayHello();

This invokes sayHello() in object parallel mode on all elements of G. The function is applied to each element in an unspecified order. Operationally, each processor invokes the function sequentially on that portion of the collection that is local.

In addition to printing the "hello from ..." message, each element function also calls doWork(), which modifies the field y in each element. Each parallel operation is implicitly barrier synchronized, i.e. all processors wait until sayHello() has been invoked for all elements of G.

Exploring this example collection a bit further, we note that because the member values x and y in the element class MyElement, of the collection G, are public, one can write

    G.x = G.x + G.y

This is a parallel addition for each element of the collection.
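The field-level expression G.x = G.x + G.y can be read as three element-wise steps: gather a field, add two field collections, and assign the result back. The standard C++ sketch below models that reading sequentially. The helper names field, add and assign_x are hypothetical, and std::vector stands in for a collection. It is not how pC++ evaluates the expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the public fields of MyElement from the text.
struct MyElement {
    float x;
    float y;
};

// Models reading G.x or G.y: a collection holding one field value per element.
std::vector<float> field(const std::vector<MyElement>& g, float MyElement::*f) {
    std::vector<float> out;
    out.reserve(g.size());
    for (const MyElement& e : g) out.push_back(e.*f);
    return out;
}

// Models the element-wise sum of two field collections of equal size.
std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// Models the assignment G.x = rhs, one independent store per element.
void assign_x(std::vector<MyElement>& g, const std::vector<float>& rhs) {
    assert(rhs.size() == g.size());
    for (std::size_t i = 0; i < g.size(); ++i) g[i].x = rhs[i];
}
```

A call such as assign_x(g, add(field(g, &MyElement::x), field(g, &MyElement::y))) then corresponds to the single pC++ statement, with each element updated independently of the others.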
Furthermore, we have also overloaded the plus (+) operator within the element, so the expression G + G defines a new collection of the same type. Also, the function bar returns a reference to an object of class MyElement. Consequently, the expression

    (G.bar() + G).sayHello()

is valid. In these cases of implicit collections, the distribution and alignment are inherited from the leftmost collection in each subexpression.

2.5 Building Abstract Collections

Library collections are designed to be applied to many different types of elements. For example, matrix multiplication can be written without referring explicitly to the type of the element.
This is accomplished in pC++ by using the ElementType keyword as a placeholder for a class that can be supplied later. However, it is often the case that a MethodOfElement function defined within a collection must refer to some property of the element in order to complete its task. The keyword virtual is used to indicate methods of the element that the collection requires. Consider the pC++ library collection class "Distributed Block Vector". This collection is used for blocked vector operations and is usually used with an element that is a vector type object.