pcxx_ug (1158314), страница 12
Текст из файла (страница 12)
In the back solve, the values of theelements that are s units away are copied and are used to compute the local value.The nal poisson solver uses two collections F and U which represent the initial data andthe nal solution. Both are of type distributed tridiagonal of vector which is easily mappedto a one dimensional template. The main part of the program is shown below.54void FastPoissonSolve( DistributedTridiagonal<Vector> &U,DistributedTridiagonal<Vector> &F){Vector diag, offdiag;....F.sineTransform(U); // U <- sineTform(F);U.cyclicReduction(diag, offdiag);U.sineTransform(F); // F <- sineTform(U);U = F;}6.3 BM-3: The NAS Embarrassingly Parallel BechmarkThe NAS benchmark suite was designed to test the suitability of massively parallel systemsfor the needs of NASA Ames Research.
Of the nine codes in the suite, four have beentranslated to pC++ and we report on two of them here. (A more complete report on thissuite will be published when all have been translated and tested.)The easiest of these is the \Embarrassingly Parallel" program. This code generates 224complex pairs of uniform (0, 1) random numbers and gathers a small number of statistics.In the benchmark the random numbers are created by two special portable functions calledvranlc() and randlc(), and a main function compute pairs(int k) is used to compute thenumbers based on a dierent value of the seed parameter k.
The main part of the computationis a loop of the formfor(i = 1; i < nn; i++) compute_pairs(i);Our approach is to divide the work into a set of computational engines with one engine perprocessor. This \set" of Engines will be a collection of elements of type Set<GaussianEngine>.1Each GaussianEngine object will compute NODESof the total where NODES is the numberof processors.The Engine element class is shown below.class GaussianEngine{public:double q[nq]; // final result statistics.double randlc(double *x, double * a);voidvranlc(int n, double *x, double a,double *y);voidcompute_pairs(int kk);};To make a simple set collection we subclass DistributedArray.
The main function, computeGauss(), calls an element function do work() which does all the real computation. Thenal result statistics are kept in a local array q in each processor. A function sum results isused to do the reduction necessary to sum these values. This involves a simple tree reduction.55Collection Set: DistributedArray{public:void computeGauss(void);MethodOfElement:virtual double q[nq];voiddo_work(void);virtual voidcompute_pairs(int kk);void sum_results();};void Set::do_work(void){int k;for(k = index1+1; k <= nn; k = k+NODES){this->compute_pairs(k);}}The main routine computeGauss() consists only of computing the random numbers withdo_work() and then summing the resulting statistics with sum_results().6.4 BM-4: the NAS Sparse CG BenchmarkA far more interesting benchmark in the NAS suite is the random sparse conjugate gradientcomputation.
This benchmark requires the repeated solution to A*X = F, where A is arandom sparse matrix. There are two ways to approach the parallelization of this task.The best, and most dicult, is the matrix as the connectivity graph for the elements in thevector. By a preprocessing step one can partition the vector into segments of nearly equalsize that minimizes the communication between subsets.
The resulting algorithm can havethe same low communication to computation ratio as our simple grid CG example above.The second approach is not as ecient as the scheme that optimizes locality, but it amore direct implementation of the benchmark. We shall represent the matrix as a distributedrandom sparse matrix. More specically, we shall represent the matrix as a DistributedMatrix of Sparse Matrices.The DistributedMatrix collection class in the pC++ library allows any class that has thealgebraic structure of a ring to be the element of the matrix. Furthermore, a p by p matrixwhose elements are k by k matrices has all the same mathematical properties of a kp bykp matrix.
Indeed, this is the key property used in the blocking algorithms used in mostBLAS level 3 libraries. Consequently, we can select the element class to be a sparse matrixas shown below.class SparseMatrix{int rows, columns;56Figure 3: Alignment and Distributiondata_ptr * r; // a vector of pointers to row listsdata_ptr * c; // a vector of pointers to column listspublic:...void amux(Vector& Vout, Vector& Vin);...};is a simple double linked sparse matrix with a special function amux(Vout,that can be used to compute the sparse matrix multiply Vout = A*Vin. The collectionobject that is the core of the computation is thenSparseMatrixVin)DistributedMatrix< SparseMatrix >A(....
)which is illustrated in Figure 6.4.This distributed matrix of sparse matrices can then be used like an ordinary matrix.In particular, an ecient matrix multiply can be easily implemented that will multiply thedistributed matrix by a distributed vector.The benchmark kernel is then nearly identical to the Grid conjugate gradient algorithmfrom the rst benchmark.The primary communication consists of a broadcast of the Vin vector to each row of sparseblocks and then a reduction sum to compute the Vin results. The dot product reductionson the distributed vectors make up the remaining communication.The standard NAS version of this benchmark comes in a test size of n = 1400 and the fulltest of n = 14000. We have run the full size problem only on the CM-5.
All other statisticsreport the results for the small version.577 Known Bugs in Version 1.0The initial version of the pC++ parser still has a number of bugs. It is our hope that mostof these will be xed by the end of the rst quarter of 1994. We will list here the bugs thatwe know about. There probably are many more.
The operators { , .* and ->* } cannot be overloaded. Overloaded functions must have the same return type. an external declaration followed by a real one as inextern int c;int c;will result in a double declaration of the variable. Member functions that are declared as inline outside a class declaration will not beinlined unless they are also declared as inline within the class declaration. A member eld in a class may not be used before it is dened.
For example, thecompiler will complain about the following code.class A{int usei(){ return i;} .int i;};A related problem is with templates that references a member function or eld of aclass parameter. Static member initialization does not work. For example,struct ZZ{static int aa;};int ZZ::aa = 1; Using a symbolic parameter name more than once in a prototype will cause a parseerror.int doublename(int a, float a); A basic block which occurs as the rst thing in a function will not work.void foo() {{ int x,y;}} The C++ alternate constructor form for base types has been temporarily disabled.58int i = int(33);// use(int) 33; instead.fix coming soon.In addition to these errors, templates and collections cannot be used together in version1.0. Exception Handling is not supported on most platforms.598 The pC++ Kernel ClassThe pC++ Kernel class provides a number of functions. Most of these functions are low-levelfunctions which usually will not be used by pC++ users.
The denition for the class is givenin the following with functions and variables private to the class omitted:class Kernel {Kernel(Distribution *T, Align *A, int element_size);~Kernel();int Is_Local(int index);int Is_Local(int index1, int index2);int *Get_Element(int index);int *Get_Element(int index1, int index2 );int *Get_ElementPart(int index,int offset, int size);int *Get_ElementPart(int index1, int index2, int offset, int size);int Kernel::Get_ElementPart_Int(int index, int offset);float Kernel::Get_ElementPart_Float(int index, int offset);double Kernel::Get_ElementPart_Double(int index, int offset);int Kernel::Get_ElementPart_Int(int index1, int index2, int offset);float Kernel::Get_ElementPart_Float(int index1, int index2,int offset);double Kernel::Get_ElementPart_Double(int index1, int index2,int offset);int *Get_CopyElem(int index);int *Get_CopyElem(int index1, int index2 );};This describes the alpha version of the library.
Note that there are instances of one and twodimensional construtors and functions, but nothing beyond that. This will be xed in futurereleases. It is easy.Kernel(Distribution *T, Align *A, int element_size) is the constructor of theclass. Distribution and Align are two class objects that the constructor needs as arguments.\element size" is the size in bytes of an element class.