The key, we believe, is to keep the number of runtime system requirements small and to concentrate on efficient implementations of the required runtime system functions. The three main pC++ runtime system tasks are collection class allocation, collection element access, and barrier synchronization. The implementation approach for these tasks differs between distributed memory and shared memory architectures.
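To fix ideas, the sketch below shows one possible shape for these three tasks as a C++ interface. It is a hypothetical illustration only; the type and function names (CollectionRuntime, allocate_collection, element_address, barrier) are invented here and are not the actual pC++ runtime API.

    // Hypothetical sketch of the three runtime tasks; none of these
    // names are taken from the actual pC++ runtime system.
    #include <cstddef>

    struct ElementRef {
        int   owner;    // processor that holds the element
        void* address;  // valid locally only when owner == my_proc
    };

    class CollectionRuntime {
    public:
        CollectionRuntime(int my_proc, int num_procs)
            : my_proc_(my_proc), num_procs_(num_procs) {}

        // Task 1: collection class allocation. Each processor
        // allocates storage only for the elements it owns.
        void* allocate_collection(std::size_t num_elements,
                                  std::size_t element_size);

        // Task 2: collection element access. Map a global element
        // index to an owner; a non-local owner implies communication.
        ElementRef element_address(std::size_t global_index) const;

        // Task 3: barrier synchronization across all processors.
        void barrier();

    private:
        int my_proc_;
        int num_procs_;
    };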
In the case of the distributed memory machines, the critical factor for performance is the availability of low-latency, high-bandwidth communication primitives. (Note that we have not made use of the CM-5 vector units or of highly optimized i860 code in the benchmarks.) While we expect the performance of these communication layers to improve dramatically over the next few months, we also expect to make changes in our compiler and runtime system. One important optimization will be to use barriers as infrequently as possible. In addition, it will be important to overlap more communication with computation.
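As an illustration of this overlap, the sketch below posts a boundary exchange with nonblocking MPI primitives and computes on local elements while the messages are in flight. MPI is assumed here purely for illustration; the benchmarks in this paper used machine-specific communication layers, not MPI.

    // Sketch of overlapping communication with computation. MPI's
    // nonblocking primitives stand in for whatever low-level layer
    // the runtime system actually provides.
    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    void update_with_overlap(std::vector<double>& local, int left, int right)
    {
        double recv_halo = 0.0;
        double send_halo = local.back();  // copy, so the loop below can
                                          // safely modify local elements
        MPI_Request reqs[2];

        // Start the boundary exchange first...
        MPI_Irecv(&recv_halo, 1, MPI_DOUBLE, left, 0,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&send_halo, 1, MPI_DOUBLE, right, 0,
                  MPI_COMM_WORLD, &reqs[1]);

        // ...then compute on purely local elements while the messages
        // are in flight, instead of stalling at a global barrier.
        for (std::size_t i = 1; i < local.size(); ++i)
            local[i] += 0.5 * local[i - 1];

        // Synchronize only with the two neighbors involved, not with
        // all processors.
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        local[0] += 0.5 * recv_halo;
    }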
In the case of shared memory machines, the performance focus shifts to the memory system. Although the BBN TC2000 architecture was classified as a shared memory architecture for this study, the non-uniform times for accessing collection elements on this machine give the runtime system performance characteristics similar to those of the distributed memory systems. The more classic shared memory architecture of the Sequent Symmetry will require a closer study of memory locality trade-offs. Clearly, the choice of where to allocate collections in the shared memory can have important performance implications.

In a hierarchical shared memory system, such as the KSR1, the goal should be to allocate collection elements in a way that maximizes the chance of using the faster memory closer to the processors and minimizes the contention and overhead of accessing remote memory. The problem for the runtime system becomes which memory allocation attributes to choose. The default choice is not guaranteed to always be optimal. Future versions of shared memory runtime systems may use properties of the collection classes to determine the appropriate element layout.
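One placement heuristic such a runtime might adopt, sketched below under the assumption of a first-touch page-placement policy, is to let each processor initialize the block of elements it will later own, so that the corresponding pages land in the memory closest to it. The thread-based code is a modern stand-in for illustration, not the KSR1-era mechanism.

    // Hypothetical first-touch placement: each worker initializes the
    // block of elements it will own, so that under a first-touch page
    // policy those pages are placed near that processor.
    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    void first_touch_init(double* elements, std::size_t n, unsigned nworkers)
    {
        std::vector<std::thread> workers;
        std::size_t block = (n + nworkers - 1) / nworkers;
        for (unsigned t = 0; t < nworkers; ++t) {
            workers.emplace_back([=] {
                std::size_t lo = t * block;
                std::size_t hi = std::min(n, lo + block);
                // The first write to each page determines its placement.
                for (std::size_t i = lo; i < hi; ++i)
                    elements[i] = 0.0;
            });
        }
        for (auto& w : workers) w.join();
    }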