This is because the forwardElimination() function modifies the values of a and b. Because the collection does not know which constructors are available for the ElementType, it must assume the existence of one.

The cyclic reduction function, shown below, divides the process into log(n) parallel stages of forward elimination followed by log(n) parallel stages of the backsolve procedure.

void DistributedTridiagonal::cyclicReduction(ElementType *diagonal,
                                             ElementType *offdiagonal)
{
   int s;
   int n = this->dim1size - 1;
   this->setCoeff(diagonal, offdiagonal);
   for (s = 1; s < n/2; s = s*2) {
        (*this)[2*s:n-1:2*s].forwardElimination(n, s);
   }
   for (s = n/2; s >= 1; s = s/2) {
        (*this)[s:n-1:2*s].backSolve(s);
   }
}
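To make the stage structure concrete, the following plain C++ fragment (an illustration only, not pC++ code) prints which element indices each stage touches, assuming the slice triplet [lo:hi:stride] selects indices lo, lo+stride, ... up to hi, and taking a hypothetical n = 8.

#include <cstdio>

// Illustration only: enumerate the element indices selected by the slices
// [2*s : n-1 : 2*s] and [s : n-1 : 2*s] at each stage of cyclic reduction,
// here for n = 8.
int main()
{
   int n = 8;
   for (int s = 1; s < n/2; s = s*2) {
      std::printf("forward elimination, s = %d:", s);
      for (int i = 2*s; i <= n-1; i += 2*s) std::printf(" %d", i);
      std::printf("\n");
   }
   for (int s = n/2; s >= 1; s = s/2) {
      std::printf("backsolve, s = %d:", s);
      for (int i = s; i <= n-1; i += 2*s) std::printf(" %d", i);
      std::printf("\n");
   }
   return 0;
}

For n = 8 this selects indices 2, 4, 6 and then 4 in the elimination phase, and 4, then 2 and 6, then 1, 3, 5, 7 in the backsolve phase.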
The forward elimination is an element function that accesses its neighbor elements at distance s in both directions. Mathematically, it is equivalent to eliminating X_{i-s} and X_{i+s} from the middle equation in

   b X_{i-2s} + a X_{i-s}  + b X_i      = this_{i-s}
   b X_{i-s}  + a X_i      + b X_{i+s}  = this_i
   b X_i      + a X_{i+s}  + b X_{i+2s} = this_{i+s}

Translating this to pC++, we have

void DistributedTridiagonal::forwardElimination(int n, int s)
{
   ElementType *v1, *v2;
   ElementType c;
   if (s == 0) {
      if (index == 0 || index == n) return;
   }
   *this = (ElementType) 0;
   v1 = (ElementType *) ThisCollection->Get_CopyElem(index1 - s);
   v2 = (ElementType *) ThisCollection->Get_CopyElem(index1 + s);
   c = (*b)/(*a);
   *a = *a - c*2*(*b);
   *b = -c*(*b);
   *this += -c*(*v1 + *v2);
   if (!ThisCollection->Is_Local(index1 - s)) delete v1;
   if (!ThisCollection->Is_Local(index1 + s)) delete v2;
}

The backsolve step is equivalent to solving for X_i in

   b X_{i-s} + a X_i + b X_{i+s} = this_i
   X_{i-s} = this_{i-s}
   X_{i+s} = this_{i+s}

and assigning the result to this_i.

void DistributedTridiagonal::backSolve(int s)
{
   ElementType *v1, *v2;
   v1 = (ElementType *) ThisCollection->Get_CopyElem(index1 - s);
   v2 = (ElementType *) ThisCollection->Get_CopyElem(index1 + s);
   *this = (*this - (*b)*(*v1 + *v2))/(*a);
   if (!ThisCollection->Is_Local(index1 - s)) delete v1;
   if (!ThisCollection->Is_Local(index1 + s)) delete v2;
}

It should be noted that, in general, this is not a very efficient algorithm because of the large cost of copying data from one processor to another. A more interesting tridiagonal solver would be a blocked scheme where each element represents a set of rows of the system rather than a single row. Numerous techniques exist for this case, but we do not explore them here.
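For reference, the same computation can be written as an ordinary sequential routine. The sketch below is not taken from the pC++ library; it is a standard formulation of constant-coefficient cyclic reduction, under the assumption that x[0] and x[n] hold the boundary values and n is a power of two.

#include <vector>

// Sketch only: sequential constant-coefficient cyclic reduction solving
//    b*x[i-1] + a*x[i] + b*x[i+1] = f[i],   i = 1 .. n-1,
// with x[0] and x[n] given and n a power of two.
void sequentialCyclicReduction(int n, double a, double b,
                               std::vector<double> &f, std::vector<double> &x)
{
   // per-level coefficients: level m corresponds to stride 2^m
   std::vector<double> A(1, a), B(1, b);

   // forward elimination: fold neighbors at distance s into each equation
   for (int s = 1; s < n/2; s *= 2) {
      double c = B.back() / A.back();
      for (int i = 2*s; i <= n-1; i += 2*s)
         f[i] -= c * (f[i-s] + f[i+s]);
      A.push_back(A.back() - 2.0*c*B.back());
      B.push_back(-c*B.back());
   }

   // backsolve: equations solved at stride s were eliminated log2(s) times
   for (int s = n/2; s >= 1; s /= 2) {
      int level = 0;
      while ((1 << (level+1)) <= s) level++;
      for (int i = s; i <= n-1; i += 2*s)
         x[i] = (f[i] - B[level]*(x[i-s] + x[i+s])) / A[level];
   }
}

Each per-level pair (A[level], B[level]) here corresponds to the a and b values that every element of the distributed collection updates privately during its own forwardElimination() calls.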
6.2 A Fast Poisson Solver

Consider the problem of solving the Poisson equation on a square domain in two dimensions based on a "5-point", uniform grid, finite difference scheme. In other words, we seek the solution to the linear system

   4 U_{i,j} - U_{i-1,j} - U_{i+1,j} - U_{i,j-1} - U_{i,j+1} = F_{i,j}

with i, j in [1...n-1], given n = 2^k, with boundary conditions

   U_{0,i} = U_{i,0} = U_{i,n} = U_{n,i}

for i in [0...n].

The algorithm we will use, first described in terms of parallel computation by Sameh and Kuck [15], consists of factoring the 5-point stencil, finite difference operator by first applying a sine transform to each column of the data.
These transforms can all be applied in parallel. The result is a decoupled system of tridiagonal equations. Each tridiagonal equation involves only the data in one row of the grid. After solving this system of n-1 equations, a final sine transform along each column completes the solution (see [4] for the mathematical details).
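The origin of the tridiagonal coefficients used below can be sketched with a standard argument (the symbols V and G are introduced only for this sketch). The sine modes are eigenvectors of the one-dimensional second difference, so applying the sine transform along each column turns the within-column terms 2 U_{i,j} - U_{i-1,j} - U_{i+1,j} into (2 - 2 cos(kπ/n)) V_{k,j}, where V_{k,j} is the kth transformed component of column j and G_{k,j} the transformed right-hand side. For each k the transformed system is therefore

   -V_{k,j-1} + (4 - 2 cos(kπ/n)) V_{k,j} - V_{k,j+1} = G_{k,j},    j = 1, ..., n-1,

a tridiagonal system along one row of the transformed grid, with constant coefficients a_k = 4 - 2 cos(kπ/n) and b_k = -1.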
The easiest way to do this is to view the array U as a distributed vector of column vectors,

   DistributedVector< Vector > U(.....);

The first and last steps of the parallel algorithm require a sine transform operation on each column in parallel. Our Vector class has such a function and it can be applied as follows

   U.sinTransform(wr, wi, p, &temp)

where wr and wi are special arrays of size log(n) by n that must be initialized to hold the primitive nth roots of unity, p is a "bit reversal" permutation, and temp is a temporary vector that is passed by the user so that the sine transform does not need to spend time allocating and deallocating the needed storage.

The middle stage of the solution process requires the solution to n systems of tridiagonal equations where the right-hand-side data is given by the rows of the distributed array.
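As an illustration of what is meant by a "bit reversal" permutation, one common construction is shown below. This is a sketch only; the table layout actually expected by initw() and sinTransform() is defined by the pC++ Vector library and may differ.

// Sketch only: build a bit-reversal permutation for a length-n transform,
// n a power of two.  p[i] is i with its log2(n) low-order bits reversed.
void bitReversalPermutation(int n, int *p)
{
   int log2n = 0;
   while ((1 << log2n) < n) log2n++;
   for (int i = 0; i < n; i++) {
      int r = 0;
      for (int b = 0; b < log2n; b++)
         if (i & (1 << b)) r |= 1 << (log2n - 1 - b);
      p[i] = r;
   }
}

For n = 8 this maps (0,1,2,3,4,5,6,7) to (0,4,2,6,1,5,3,7).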
In the example above, we illustrated the solution of a single distributed system of tridiagonal equations using a collection of the form

   DistributedTridiagonal<Element> T(...);

where Element was a class that represented one component of the solution. All we required of Element was that the standard numerical operators (+, -, *, /, =) were well defined and that there was a zero assignment ( = (Element) 0 ). Consequently, the class Element could also be Vector; in this case the function cyclicReduction solves a whole vector of tridiagonal equations in parallel. (A minimal sketch of this element interface follows the main program below.)

The main program for the fast Poisson solver now takes the form shown below.

float *wr[LOGMAXVECSIZE], *wi[LOGMAXVECSIZE];
int p[MAXVECSIZE];

main()
{
   Template Temp(NBPROC, &P, BLOCK);
   Align A(GRIDSIZE, "[ALIGN(F[i],T[i])]");
   DistributedTridiagonal<Vector> U(&Temp, &A, MAXVECSIZE);
   initialize(&U);
   n = MAXVECSIZE;
   // initialize coeff arrays wr, wi and p for FFTs
   log2n = mylog2(n);
   initw(n/2, log2n-1, wr, wi, p);
   poisson_solve(n, wr, wi, p, &U);
}
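The element interface assumed above can be summarized by a minimal sketch like the following. The class name Scalar and its members are hypothetical and introduced only for illustration; the library's Vector class supplies this interface (and much more).

// Hypothetical sketch of the operations cyclicReduction(), forwardElimination()
// and backSolve() assume of their element type: the standard arithmetic
// operators, assignment, and assignment from a zero cast.
class Scalar {
public:
   Scalar() : v(0.0) {}
   Scalar(double x) : v(x) {}                        // allows  *this = (Scalar) 0;
   Scalar operator+(const Scalar &o) const { return Scalar(v + o.v); }
   Scalar operator-(const Scalar &o) const { return Scalar(v - o.v); }
   Scalar operator-() const { return Scalar(-v); }   // unary minus, used in -c*(...)
   Scalar operator*(const Scalar &o) const { return Scalar(v * o.v); }
   Scalar operator/(const Scalar &o) const { return Scalar(v / o.v); }
   Scalar &operator+=(const Scalar &o) { v += o.v; return *this; }
private:
   double v;
};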
The poisson_solve function, shown below, need only initialize the coefficient arrays for the tridiagonal system. These coefficients correspond to the eigenvalues of the tridiagonal system [-1, 4, -1], which is one slice of the finite difference operator. Hence, it is easy to compute that the coefficients for the kth equation are

   a_k = 4 - 2 cos(kπ/n)
   b_k = -1

void
poisson_solve(int n, float **wr, float **wi, int *p,
              DistributedTridiagonal<Vector> *U)
{
   int k;
   Vector a(n);
   Vector b(n);
   Vector temp(n);
   for (k = 1; k < n; k++) {
      a[k] = 4.0 - 2.0 * (float) cos((double) pi*k/n);
      b[k] = -1.0;
   }
   U->sinTransform(wr, wi, p, &temp);
   U->cyclicReduction(&a, &b);
   U->sinTransform(wr, wi, p, &temp);
}

7 Conclusions

This paper presents the basics of an "object-parallel" programming language, pC++. The focus has been on a complete discussion of the language extensions and the associated semantics, as well as on providing an introduction to the standard library of linear algebra collection classes.

We have not cited any performance results here. These are described in other papers [7], [9], [11], [13], and new results for various benchmarks will be available soon.
The compiler currently runs on any standard scalar Unix system as well as the Thinking Machines CM-5. A version for the Intel Paragon will be available by the middle of 1993. A group led by Jenq Kuen Lee at NTHU in Taiwan has ported a version of pC++ to the N-Cube. Other ports are planned, and public releases of the software are planned starting in July of 1993.

The compiler takes the form of a C++ restructurer. This means the input language is pC++ and the output is standard C++ in the form needed to run on multicomputer systems. The Kernel class contains most of the machine specific details, so building new ports of the system is very easy.
The compiler is written using the Sage++ compiler toolkit [12], which will also be distributed.

We expect that pC++ will evolve. The current version does not allow collections of collections, and there is no support for heterogeneous, networked parallel systems. Because we feel that both are essential for future applications, we will move in that direction. This may take the form of making pC++ a superset of the CC++ dialect proposed by Chandy and Kesselman at Cal Tech.

References

[1] T. von Eiken, D. Culler, S. Goldstein, K. Schauser, Active Messages: a Mechanism for Integrated Communication and Computation, Proc. 19th Int'l Symposium on Computer Architecture, Australia, May 1992.

[2] D. Culler, T. von Eiken, CMAM - Introduction to CM-5 Active Message communication layer, man page, CMAM distribution.

[3] A. Chien and W. Dally, Concurrent Aggregates (CA), Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, Seattle, Washington, March 1990.

[4] R. Hockney and C. Jesshope, Parallel Computers, Adam Hilger Ltd., Bristol, 1981.

[5] High Performance Fortran Forum, Draft High Performance Fortran Language Specification, Version 0.4, Nov. 1992. Available from titan.cs.rice.edu by anonymous ftp.

[6] J. K. Lee, Object Oriented Parallel Programming Paradigms and Environments For Supercomputers, Ph.D. Thesis, Indiana University, Bloomington, Indiana, June 1992.

[7] J. K. Lee and D. Gannon, Object Oriented Parallel Programming: Experiments and Results, Proceedings of Supercomputing 91 (Albuquerque, Nov.), IEEE Computer Society and ACM SIGARCH, 1991, pp. 273-282.

[8] K. Li, Shared Virtual Memory on Loosely Coupled Multiprocessors, Ph.D. Thesis, Yale University, Sept. 1986.

[9] D. Gannon and J. K. Lee, Object Oriented Parallelism: pC++ Ideas and Experiments, Proceedings of 1991 Japan Society for Parallel Processing, pp. 13-23.

[10] D. Gannon and J. K. Lee, On Using Object Oriented Parallel Programming to Build Distributed Algebraic Abstractions, Proceedings Conpar-Vap, Lyon, Sept. 1992.

[11] D. Gannon, Libraries and Tools for Object Parallel Programming, Proceedings, CNRS-NSF Workshop on Environments and Tools For Parallel Scientific Computing, Sept. 1992, Saint Hilaire du Touvet, France. To appear: Springer-Verlag, lectures on computer science.

[12] F. Bodin, P. Beckman, D. Gannon, S. Narayana, S. Srinivas, Sage++: A Class Library for Building Fortran 90 and C++ Restructuring Tools, Technical Report, Indiana University.

[13] F. Bodin and D. Gannon, pC++ for Distributed Memory: CM-5 Communication Performance Experiments, Technical Report, Indiana University.

[14] Z. Lahjomri and T. Priol, KOAN: a Shared Virtual Memory for the iPSC/2 hypercube, CONPAR/VAP, Sept. 1992.

[15] A. H. Sameh, S. C. Chen, and D. J. Kuck, Parallel Poisson and Biharmonic Solvers, Computing 17 (1976), 219-230.