However, there are now several ways in which one can write portable message passing software systems. PVM is one approach, and others include Express, P4 and MPI, the new message passing standard. We view MPI as still being a bit limited in that there is no provision for communication between threads on the same processor, but we also expect it to evolve.

In the short term, we are providing a simple mechanism to support a limited class of communication operations in pC++ between threads of a processor object. In the longer term we will integrate MPI and PVM support when we are sure we can find implementations on all target systems. For now, there are three simple communication routines.
The first is a way to broadcast data from any one thread of a TEClass object to each of the other threads from within a member function. The function, pCxx_BroadcastBytes(), takes three arguments:

    void pCxx_BroadcastBytes(int source, int length, void *buffer)

The argument buffer is a pointer to a buffer that exists in the private address space of each TEClass representative thread. The argument source is a flag which, if set to zero, identifies the calling thread as the source whose buffer should be copied to each of the other buffers. Only one thread may execute the function with the argument set to zero, and all others must execute the function with a non-zero value.
All representative threads in a Processors object must execute this function if one of them executes it. For example,

    TEClass C {
      float x;
      void sendPi(){
        if(MyProc() == 0) x = 3.1415;
        // now send the value of x to each of the other threads
        pCxx_BroadcastBytes(MyProc(), sizeof(float), &(this->x));
      }
    };

The idea behind pCxx_BroadcastBytes() is to provide a mechanism to share among all threads a value that may have been computed by only one thread.
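Because the length argument is simply a byte count, the same routine can also share an entire array. The following is only an illustrative sketch in the same pattern as the example above; the class Table and its member are hypothetical and not part of the pC++ library.

    TEClass Table {
      float coef[16];
      void shareCoefficients(){
        if(MyProc() == 0){
          // only thread 0 computes the table
          for(int i = 0; i < 16; i++) coef[i] = 1.0/(i+1);
        }
        // every representative thread must make this call; the thread
        // passing 0 as the first argument is the source of the data
        pCxx_BroadcastBytes(MyProc(), 16*sizeof(float), coef);
      }
    };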
More direct communication between threads is achieved by a blocking send called pCxx_send() and a blocking receive pCxx_receive(). They are defined as follows.

1. void pCxx_send(int dest, int length, void *buffer) sends a message of the specified length in bytes, contained in the specified buffer, to thread dest. The function returns as soon as the buffer is free to be used again.

2. int pCxx_receive(int *from, void *buffer), when called, will wait until a message has been received by the executing thread. The value returned from the function is the length, in bytes, of the message stored in buffer. The index of the sending thread is stored in from.

As we do not yet recommend the use of these functions, we will not provide examples. They will be part of the next release, but we expect that PVM or MPI will be the preferred mode of communication.

3.2.3 Constructors for TEClass Objects. Ver. 2.0

The examples from the previous sections used only one type of constructor for TEClass objects: one that had a single Processors object parameter. This constructor is automatically generated by the compiler.
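As a reminder of that default form, a minimal sketch might look as follows; the class Accum and its members are hypothetical and used here only for illustration.

    TEClass Accum {
      float total;        // each representative thread holds its own copy
      void clear(){ total = 0.0; }
    };

    Processor_Main(){
      Processors P;       // the set of processor object threads
      Accum sums(P);      // compiler-generated constructor: a single Processors argument
    }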
User defined constructors can be of any standard C++ form. THE COMPILER WILL MODIFY THESE SO THAT THE PROCESSORS PARAMETER IS ADDED AS THE LAST ARGUMENT. For example,

    TEClass A {
      int n;
      float data[100];
      A(int size){
        n = size;
        for(int i = 0; i < size; i++) data[i] = 0.0;
      }
    };

    main(){
      Processors P;
      A vectors(64, P);
      A *x = new A(32, P);     // another valid constructor.
      A *local = new A(32);    // generates a local object. (not supported in v.1.0)
    }

In this case the system binds the value 64 to the size argument in the user's constructor and uses the value P to map the objects to processor threads. If a TEClass object is built without the processor parameter, as in the third declaration above, a single representative object is created that is identical to a single instance of the underlying class.
(Not supported in version 1.0.)

3.2.4 Encapsulating SPMD Libraries. Ver. 2.0

One of the reasons for including Thread Environment Classes in pC++ is to provide a mechanism to encapsulate code that is designed for execution in a message-passing SPMD environment.
This includes many of the libraries that have been designed at the national laboratories, such as Lapack++, AMR++, and many more.

To understand how this works, consider an example of a matrix class Matrix defined as follows.

    TEClass Matrix {
      double **data;
     public:
      int rows, cols;
      Matrix(n, m, p);
      void matMul(Matrix &A, Matrix &B);
      double &operator ()(int, int);
    };

In an SPMD style execution, the matrix object would be created on each processor participating in the computation, and the constructor, given global dimensions m by n, would automatically partition the data over the p processors. The interesting part of these libraries is the way processor communication is managed. In a typical application every processor must participate in each matrix operation done in parallel.
All communication is hidden within the class operators and the resulting "user code" looks exactly like sequential code. (The version 1.0 pC++ compiler for distributed memory machines works exactly in this manner.) Take, for example, the way the library designer would implement the operator matMul(). Let us assume that the library is designed so that the rows are partitioned over the processors. That is, rows (0, n/p-1) are on processor 0, rows (n/p, 2n/p-1) on processor 1, etc. The SPMD code for matMul() would look something like the following. Each processor has part of three matrices: A, B and *this.
The code below first broadcasts a column of B to each thread, which then assembles the pieces of the column and computes the appropriate dot product of that column with its share of the rows of A.

    void Matrix::matMul(Matrix &A, Matrix &B){
      int i, j, k, s, n, m, r, p, from;
      p = NumProc();         // NumProc() gives the number of processor threads
      k = A.rows; m = cols; n = rows/p; r = k/p;
      double *rowbuf = new double[k];
      double *buffer = new double[r];
      for(i = 0; i < m; i++){
        // broadcast a column block of B to each processor (length is in bytes)
        for(j = 0; j < p; j++){
          for(s = 0; s < r; s++) buffer[s] = B.data[s][i];
          pCxx_send(j, r*sizeof(double), buffer);
        }
        // assemble the column blocks into a full column of B
        for(j = 0; j < p; j++){
          pCxx_receive(&from, buffer);
          for(s = 0; s < r; s++) rowbuf[from*r+s] = buffer[s];
        }
        // dot product of the assembled column with the local rows of A
        for(j = 0; j < n; j++)
          for(s = 0; s < k; s++)
            data[i][j] += A.data[i][s]*rowbuf[s];
      }
      delete[] rowbuf;
      delete[] buffer;
    }

This version of the program is not optimal (a blocked version should be used) but it is easy to understand and it is typical of the style of SPMD libraries. This function can now be called from a pC++ main program as follows.

    Processor_Main(){
      Processors P;
      Matrix C(n,m), A(n,k), B(k,m);
      ....
      C.matMul(A, B);
    };

A more interesting problem is that of the element reference operator.
The job of the (int, int) operator is to make sure that any read or update from the main thread is propagated to the correct position in this distributed array. For example, if the main thread invokes

    x = M(i,j);

then the thread that contains the (i,j)th element must return the correct value to the main thread. On the other hand, if we call

    M(i,j) = x;

then it is the job of the (...) operator to make sure the (i,j)th element on the correct processor is updated.
This problem is complicated because we are not sure which thread may be invoking this operator. If it is a thread for which the requested data reference lies in the same address space, there is no problem. However, if they are different, such as when the main thread invokes this operation on each worker, we have a problem. To see this difficulty, consider the following possible implementation.

    double dummy_buffer;
    double &Matrix::operator()(int i, int j){
      double *z;
      int not_local = 1;
      int p = NumProc();
      if((MyProc()*(rows/p) <= i) && (i < (MyProc()+1)*(rows/p))){
        // I have the desired row!
        z = &(data[i % (rows/p)][j]);
        not_local = 0;
      }
      else z = &dummy_buffer;
      pCxx_BroadcastBytes(not_local, sizeof(double), z);
      return *z;
    }

If each of the worker threads associated with the TEClass executes this operation, then the reference evaluation will be correct when called by the main thread only if the main thread shares its address space with one of the worker threads.
(This is the case in the current ver. 1.0 pC++.) However, in future versions this may not hold. There are two solutions to this problem. One solution is to introduce the CC++ global data type qualifier, so that special pointers and references can be created that can be passed between address spaces. We are strongly considering this for ver. 2.0. The other solution is to introduce more explicit member functions for read and write operations.

    TEClass Matrix {
      double **data;
      double &operator ()(int, int);
     public:
      int rows, cols;
      Matrix(n, m, p);
      void matMul(Matrix &A, Matrix &B);
      double read(int i, int j){ return (*this)(i,j); }
      void write(int i, int j, double value){ (*this)(i,j) = value; }
    };

This solution restricts the (...) operator so that it can only be used within the TEClass thread environment, and it allows only data values (instead of references or pointers) to be passed between address spaces.

4 Collections.