mardigras (1158311), страница 3
Текст из файла (страница 3)
The function pushParticle() uses the gravitationalforce to update the positions and velocities of the particles. The argument passed topushParticle() is a collection designed for the mesh data structure. The functionupdateGridMass() is used to add the mass of a particle to the total mass of the meshpoint to which it is closest. The importance of this function is that it rst loops throughthe particles local to a segment and accumulates a local total mass for each mesh point.It then adds the local total mass to the mesh point's total mass by a remote updateoperation on the appropriate mesh point. Because remote updates are expensive, thisoperation should be infrequent.
This is why the particles are sorted to form a list inthe beginning.During the sorting, the particles within each segment are sorted rst using the quicksort library sorting function. Then a global sorting is performed using the bitonic sortalgorithm [6].4.2 The Mesh CollectionThe mesh is logically a three dimensional array of mesh points each containing thevalue of density and position. Because FFT is used to solve the gravitational potentialequation, the data structure is designed as a two-dimensional collection with eachelement contains an array of mesh points that have the same x and y coordinates butdierent z coordinates:class MeshElement {public:double mass[nz], position[nz];};The collection Mesh is dened as follows:Collection Mesh : SuperKernel {public:Mesh(double lx, double ly, double lz, Distribution *T, Align *A);void computePotential();...};10where function computePotential() computes the gravitational potential using thetotal mass at each mesh point and lx, ly, lz are the numbers of mesh points in thethree x, y, and z dimensions, respectively.To calculate the gravitational potential (or forces), discrete Fourier transform isused twice: rst the density is transformed into the wavenumber domain and then thepotential (or force component) is transformed back into spatial domain.FFT transform in the x-, y-, and z-directions is performed by the Mesh collection.The FFT in the z-direction poses no problems, as for each MeshElement with indices(i,j) the whole array of mesh points with indices (i,j,*) is located in that singleMeshElement.
Each MeshElement performs the z -FFT on its data without inter-elementdata communication, and load balancing is also ensured.To perform FFT in the x-direction, the values of density at mesh points with indices(*,j,k) are collected by the collection element, Mesh(i,j). It then performs sequentialFFT and sends back the resulting potential values to the mesh points from which itcollected the density values.To ensure good load balancing, we want all collection elements to participate thex-FFT. If it is known that the number of mesh points in the x-direction is equal tothe number of mesh points in the z-direction, we can let all Mesh(*,j) send their kthdensity value to Mesh(k,j).
If the number of mesh points in the x-direction is notequal to the number of mesh points in the z-direction, we can let Mesh(*,j) send theirkth density value to Mesh(l,j) where l = (j + k) % lx. The y-direction FFT canbe performed in a similar fashion.4.3 The Main Simulation LoopIn its simplest form the simulation is given by the code below:main(){Processors P;Distribution DP(segment_count, &P, BLOCK);Align AP(segment_count, "[ALIGN(particlelist[i], DP[i])]");ParticleList<Segment> particlelist(&DP, AP);Distribution DM(x_count, y_count, &P, BLOCK, WHOLE);Align AM(x_count, y_count, "[ALIGN(mesh[i], DM[i])]");Mesh<MeshElement> mesh(&DP, AP);// initialize particle list...// main loopfor(int i = 0; i < number_of_steps; i++){mesh.computePotential();particlelist.pushParticle(mesh);if(needed) particlelist.sortParticles();particlelist.updateGridMass(mesh);}...}11The main loop involves parallel computation on both Mesh and ParticleList.
Firstthe potential is computed in parallel on the grid. Second, the particle velocities andpositions are updated. If particles have moved to new grid points, the appropriate datastructure updates must then be made. If the particles have moved a lot since the lasttime they were sorted, the particles are sorted again. Third, the particle masses areaccumulated in their corresponding points for the next iteration step.5 ADAPTIVE GRID AND MULTI-THREADEDEXECUTIONpC++ is designed to support parallel programming styles that are not well supportedby Fortran90 and HPF. Problems with complex, adaptive data structures are naturallysuited for an object-oriented language like C++.
Though it is true that Fortran90supports many advanced data-structure features, they are not as exible as those inC++ and have not been integrated into HPF.In this section, we describe the basic design of pC++ collection that will be usedin adaptive grid computations. The idea is to allow a mesh to be rened in regionswhere more resolution is needed, for example, near a congestion of galaxies or stars.We modify the Mesh collection element class so that a grid cell can represent a newner grid if the cell is in a region where extra resolution is required. The modiedMeshElement now called MeshCell is give below:class MeshCell{double gravPotential;double massDensity;int isRefined; // 1 if the cell is refinedMesh<MeshCell> *refinedRegion;};The resulting data structure, as illustrated for a two dimensional problem in Figure 1below, is a dynamical structure. Multi-threaded execution plays an important role inthis type of parallel programming.
Because new collections are created within elementsof a collection, we must be able to invoke parallel operations within parallel sections.This type of nested parallelism requires a multi-threaded model of execution.While it is still too early to evaluate this approach to parallel adaptive structures,we feel that the potential for success is high. The reader should note that there areseveral other approaches to adaptive parallel grid computation [17, 1, 7].6 CONCLUSIONIn this paper we have show how pC++ is used to design and implement object-orientedsoftware for N-body and adaptive mesh renement simulations. As a part of the NSFGC3 eort, pC++ research will focus on problems related to adaptive computationson parallel machines.
pC++ 1.0 is now available by anonymous ftp from IndianaUniversity (ftp.cica.indiana.edu in pub/sage). Currently, pC++ runs on the Thinking12Figure 1: Adaptive Mesh Renement. If more resolution is required, the grid cell isrened to form a new mesh structure.Machines CM-5, the Intel Paragon, the BBN TC2000, the Kendall Square ResearchKSR-1, the IBM SP-1, Parallel Virtual Machines (PVM) for workstation clusters, andthe Sequent Symmetry. A version for Cray T3D will also soon be available.pC++2.0 is currently under construction. The key new features in pC++2.0 aremulti-threads and global pointers. Multi-threaded execution is important for parallel adaptive structures. Global pointers are an essential requirement for the N-bodysimulations describe in this paper.References[1] S.
Bhatt, M. Chen, C-Y Lin, and P. Liu. Object-oriented support for adaptivemethods on parallel machines. In Object Oriented Numerics (OON-SKI), pages266{267, 1993.[2] F. Bodin, P. Beckman, D. Gannon, S. Yang, S. Kesavan, A. Malony, and B. Mohr.Implementing a parallel C++ runtime system for scalable parallel systems. InSupercomputing 93. IEEE Computer Society, 1993.[3] R. Chandra, A. Gupta, and J. Hennessy.
COOL: a Language for Parallel Programming, pages 126{148. The MIT Press, 1990.[4] K. M. Chandy and C. F. Kesselman. Compositional C++: Compositional parallel programming. In Fourth Workshop on Parallel Computing and Compilers.Springer-Verlag, 1992.[5] K. M. Chandy and C. F. Kesselman.
CC++: A declarative concurrent objectoriented programming notation. In Research Directions in Object Oriented Programming. MIT Press, 1993.13[6] T. H. Cormen, C. E. Leiserson, and R.L.Rivest. Introduction to Algorithms. MITPress, 1989.[7] W. Crutcheld and M. Welcome. Object oriented implementation of adaptive meshrenement algorithms. In Object Oriented Numerics (OON-SKI), pages 108{122,1993.[8] J. Dongarra, R. Pozzo, and D. Walker. An object oriented design for high performance linear algebra on distributed memory architectures.
In Object OrientedNumerics (OON-SKI), pages 257{264, 1993.[9] R. Ferrell and E. Bertschinger. Particle-mesh methods on the connection machine.International Journal of Modern Physics C, 1993.[10] D. Gannon, F. Bodin, P. Beckman, S. Yang, and S. Narayana. Distributed pC++:Basic ideas for an object parallel language. In Object Oriented Numerics (OONSKI), pages 1{24, 1993.[11] D. Gannon and J. K. Lee. On using object oriented parallel programming to builddistributed algebraic abstractions.
In Conpar-Vap, pages 769{774. Springer Verlag,1992.[12] D. Gannon, S. X. Yang, and P. Beckman. User Guide for a Portable Parallel C++Programming System, pC++. Computer Science Department, Indiana University,available from moose.cs.indiana.edu:/pub/sage by ftp, 1994.[13] A.