To appear in: Proceedings of the 8th International Parallel Processing Symposium (IPPS), Cancun, Mexico, April 1994.

Performance Analysis of pC++: A Portable Data-Parallel Programming System for Scalable Parallel Computers¹

A. Malony, B. Mohr
Dept. of Comp. and Info. Sci., University of Oregon, Eugene, Oregon 97403
{malony,mohr}@cs.uoregon.edu

P. Beckman, D. Gannon, S. Yang
Dept. of Comp. Sci., Indiana University, Bloomington, Indiana 47405
{beckman,gannon,yang}@cs.indiana.edu

F. Bodin
Irisa, University of Rennes, Rennes, France

Abstract

pC++ is a language extension to C++ designed to allow programmers to compose distributed data structures with parallel execution semantics. These data structures are organized as "concurrent aggregate" collection classes which can be aligned and distributed over the memory hierarchy of a parallel machine in a manner consistent with the High Performance Fortran Forum (HPF) directives for Fortran 90. pC++ allows the user to write portable and efficient code which will run on a wide range of scalable parallel computers. In this paper, we discuss the performance analysis of the pC++ programming system. We describe the performance tools developed and include scalability measurements for four benchmark programs: a "nearest neighbor" grid computation, a fast Poisson solver, and the "Embar" and "Sparse" codes from the NAS suite.
In addition to speedup numbers, we present a detailed analysis highlighting performance issues at the language, runtime system, and target system levels.

1 Introduction

The introduction of a new parallel programming system should include, in addition to a description of the language principles and operational paradigm, an evaluation of the performance one would expect using the system, as well as a detailed accounting of the performance issues that have evolved from the system's design and implementation. However, as is often the case, the important concerns of portability, usability, and, recently, scalability of a parallel programming system tend to outweigh the equally important performance concerns when the system is released, leaving the mysteries of performance evaluation for users to discover.
Certainly, the reasons for this situation are not hard to understand. The challenges of designing a language that supports a powerful parallel programming abstraction, developing a runtime system platform that is truly portable across diverse target hardware and software architectures, and implementing non-trivial applications with the system create a large and complex software environment. Although the performance of the language, runtime system, and target system implementations is clearly always of concern during design and development, the time and effort needed to explore the performance ramifications of the initial versions of a parallel programming system may be difficult to justify if it delays system introduction.

¹ This research is supported by DARPA under Rome Labs contract AF 30602-92-C-0135.
Author e-mail (F. Bodin): Francois.Bodin@irisa.fr

However, the performance evaluation of a parallel programming system can be facilitated by integrating performance analysis support early in the system's design and development.
This might occur in several ways, including:

- identifying performance events of interest at the language and runtime system levels;
- providing "hooks" for static and dynamic instrumentation; and
- defining execution abstractions that will be helpful when characterizing performance behavior.

The notion of designing for performance analysis is well-founded [22, 23], but until now has been rarely applied in the parallel language system domain.

The performance evaluation issues associated with the pC++ system are interesting because they address several performance levels (language, runtime system, target architecture) and require a system-integrated performance toolset to fully investigate.
Hence, in concert with the pC++ system development, a performance analysis strategy has been formulated and is being implemented. As a result, the first version of the compiler is being introduced with integrated performance analysis capabilities and an extensive set of performance measurements already completed. The compiler is a preprocessor which generates Single Program Multiple Data (SPMD) C++ code that runs on the Thinking Machines CM-5, the Intel Paragon, the IBM SP-1, the BBN TC2000, the KSR KSR-1, the Sequent Symmetry, and on a homogeneous cluster of UNIX workstations running PVM. These results are presented here.

The pC++ language and runtime system are very briefly described in Section 2.¹ The performance measurement environment that is integrated in the pC++ system is described in Section 3.
This environment is being used to perform a more detailed analysis of performance factors at the language, runtime system, and application levels. In Section 4, we describe four benchmark programs that we use to illustrate the performance issues associated with the pC++ language and runtime system implementation.
Total execution time and speedup results are presented in Section 5. In Section 6, we present some of the detailed performance analysis results we have generated.

2 A Very Brief Introduction to pC++

The basic concept behind pC++ is the notion of a distributed collection, which is a type of concurrent aggregate "container class" [6, 8]. More specifically, a collection is a structured set of objects which are distributed across the processing elements of the computer in a manner designed to be completely consistent with HPF Fortran. To accomplish this, pC++ provides a very simple mechanism to build "collections of objects" from some base element class.
Member functions from this element class can be applied to the entire collection (or a subset) in parallel. This mechanism provides the user with a clean interface to data-parallel style operations by simply calling member functions of the base class. In addition, there is a mechanism for encapsulating SPMD-style computation in a thread-based computing model that is both efficient and completely portable.

To help the programmer build collections, the pC++ language includes a library of standard collection classes that may be used (or subclassed). This includes classes such as DistributedArray, DistributedMatrix, DistributedVector, and DistributedGrid.

In its current form, pC++ is a very simple preprocessor that generates C++ code and machine-independent calls to a portable runtime system.
This is accomplished by using the Sage++ restructuring tools [3]. Sage++ is an object-oriented compiler preprocessor toolkit. It provides the functions necessary to read and restructure an internal representation of the pC++ program. After restructuring, the program is then "unparsed" back into C++ code, which can be compiled on the target architecture and linked with a runtime system specifically designed for that machine.

¹ A companion paper, "Implementing a Parallel C++ Runtime System for Scalable Parallel Systems", discusses issues of pC++ runtime system design and appeared in the Proceedings of the Supercomputing '93 conference [17].

pC++ and its runtime system have been ported to several shared memory and distributed memory parallel systems, validating the system's goal of portability. The shared memory ports include the Sequent Symmetry [5], the BBN TC2000 [1], and the Kendall Square Research KSR-1 [2].
The distributed memory ports include the Intel Paragon [20], the TMC CM-5 [19], the IBM SP-1, and homogeneous clusters of UNIX workstations with PVM [24]. Work on porting the runtime system to the Cray T3D and Meiko CS-2 is in progress. More details about the pC++ language and runtime system can be found in [9, 10, 11, 12, 17].

3 The pC++ Performance Analysis Environment

The pC++ integrated performance analysis environment is unique because it was designed and implemented in concert with the pC++ language and runtime system. As a result of this tight coupling, the definition and analysis of performance factors are based on language and runtime execution semantics. However, this capability also presents a challenge to pC++ performance measurement, since low-level performance instrumentation must be specified for capturing high-level execution abstractions, realized in performance measurements, and, finally, "translated" back to the application/language level.
Presently the measurement environment consists of a profiling tool, a portable event trace capturing library, a source code instrumentor, and instrumented runtime system libraries. Analysis and visualization tools which use event trace data are under development; some are reported here. This section describes various aspects of the pC++ performance analysis environment.

3.1 Profiling pC++ Programs

In general, a very valuable tool for program tuning is function profiling. Simply, special instrumentation code is inserted at all entry and exit points of each function.
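
To make the idea of entry and exit instrumentation concrete, the following is a minimal sketch in standard C++, not the pC++ profiler itself. It assumes one common technique: a scoped object whose constructor acts as the entry hook and whose destructor acts as the exit hook, so every return path is covered. All names (FunctionStats, profile_table, ScopedProfiler, sum_to, driver, print_profile) are hypothetical, and the sketch is single-threaded.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Hypothetical per-function record: call count and inclusive time.
struct FunctionStats {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds inclusive{0};
};

// Global table keyed by function name (illustrative, not thread-safe).
std::map<std::string, FunctionStats>& profile_table() {
    static std::map<std::string, FunctionStats> table;
    return table;
}

// Scoped probe: construction marks function entry, destruction marks exit.
class ScopedProfiler {
public:
    explicit ScopedProfiler(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedProfiler() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        FunctionStats& stats = profile_table()[name_];
        ++stats.calls;
        stats.inclusive +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }
    ScopedProfiler(const ScopedProfiler&) = delete;
    ScopedProfiler& operator=(const ScopedProfiler&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Instrumented example function: the probe is the first statement.
long sum_to(long n) {
    ScopedProfiler probe("sum_to");
    long total = 0;
    for (long i = 1; i <= n; ++i) total += i;
    return total;  // the probe's destructor runs on this exit path
}

// Instrumented caller; its inclusive time contains the time of sum_to.
long driver() {
    ScopedProfiler probe("driver");
    long result = 0;
    for (int k = 0; k < 3; ++k) result += sum_to(1000);
    return result;
}

// Writes one line per profiled function.
void print_profile(std::ostream& out) {
    for (const auto& entry : profile_table()) {
        const double usec =
            std::chrono::duration<double, std::micro>(entry.second.inclusive)
                .count();
        out << entry.first << ": calls=" << entry.second.calls
            << " inclusive_usec=" << usec << '\n';
    }
}
```

Calling driver() and then print_profile(std::cout) from a main function would list both functions, with sum_to counted three times and its time contained in the inclusive time of driver. The measured times depend on the machine, so none are shown here.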