Consequently, Fortran compilers can perform aggressive data-dependence-based optimizations which are not possible in C++. In addition, code generation technology and optimized intrinsic library functions present in Fortran are much better than their C++ counterparts. With the present generation of Fortran90 compilers, vector and array operations will be easy to compile into efficient code.
We believe that it is essential to support a portable programming environment that mixes Fortran with C++ across the entire range of supercomputing platforms.³

³ This point of view is also supported in the research community. The soon-to-be-published conclusions of the 1993 NSF/ARPA/DOE Grand Challenge Meeting include several strong recommendations for more investigation of mixing object-oriented style with the more traditional programming styles.

There is no fundamental problem in linking Fortran to C++ programs. C++ has a standard facility for representing interfaces to other languages. However, with the founding of the High Performance Fortran Forum and the publication of the High Performance Fortran (HPF) extensions to Fortran90, we have a new problem. In the following sections, we explore and propose a mechanism for linking HPF programs to parallel C++ programs.

5.4 Interfacing HPF and pC++

Let us consider the ways in which this interface might operate:

• HPF calling C++/pC++:
    a. A direct call to a C++ function from the main thread of a Fortran program. This presents no real problems that did not exist before, unless an HPF distributed array is passed as an argument.
    b. A C++ function is called as an HPF extrinsic function in a parallelized loop. Again, this is only interesting if an HPF distributed array is passed as an argument.
  The HPF extrinsic procedure interface, which allows calls to non-HPF subprograms as extrinsic procedures, may well suffice for the above two situations provided the C++ subprogram is designed to be executed on a single processor. In both cases we assume that the C++ function will not attempt to return a value of any type other than a base type like int or double. It may be possible to view a returned C++ structure as a Fortran90 instance of a user-defined type, but returning a C pointer would be a big problem.

• pC++ calling HPF:
    a. A call to an HPF parallel subroutine from a pC++ sequential main thread, i.e., a pC++ main program.
    b. A call to a Fortran "node program", i.e., a sequential or vector subroutine that is called from a parallel section of a pC++ program.

Of these we will focus on designing an interface for the case of pC++ calling Fortran. The data structure manipulation and encapsulation properties of C++ and the fast computing speed of Fortran make it natural for Fortran subroutines to be called as library routines and for C++ to act as the main controller.

Let us first consider case a. A pC++ element class which contains three arrays, two of them of the special type FArrayDouble and one a conventional C++ array, is given below.

    class MyElement {
    public:
        FArrayDouble x, y;
        double z[100];
        MyElement();
    };

The special type FArrayDouble is used to contain double precision Fortran90 arrays and their descriptor structures.
To define the constructor for the class we assume that x is a one-dimensional array of size 100 and y is a two-dimensional array of size 200 × 50.

    MyElement::MyElement() : x(100), y(200,50) {...}

To use this class in a pC++ program that calls an HPF subroutine, we need to define the Processors object and to specify the size and shape of the collection. In the following example, a one-dimensional collection of size 64 with elements of type MyElement is constructed and an HPF subroutine, FFUN, is called.

    extern "HPF" void FFUN(FArrayDouble&, FArrayDouble&, double*);

    main() {
        Processors P(64);
        Distribution D(64,&P,BLOCK);
        Align A(64,"[ALIGN(T[i],D[i])]");
        HPFCollection<MyElement> C(&D,&A);
        FFUN(C.x,C.y,C.z);
        ...
    }

The important point to consider is what the passed arguments look like in terms of HPF distributed arrays.
In this case, they can be defined by the following HPF directives.

    !HPF$ PROCESSORS P(64)
          double precision x(100*64),
         &    y(200*64,50), z(100*64)
    !HPF$ DISTRIBUTE x(BLOCK), y(BLOCK,*) ONTO P
    !HPF$ DISTRIBUTE z(BLOCK) ONTO P

It is relatively easy to see how the blocked decomposition corresponds to our simple collection of array blocks.
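As a rough illustration of this correspondence, the following C++ sketch maps an index into the global HPF array onto a collection element and an offset inside that element's local array. It is not part of pC++ or HPF: the names BlockLocation, locate, and to_global are hypothetical, indices are taken to be zero-based (a Fortran index g corresponds to g - 1), and the global extent is assumed to be an exact multiple of the block size, as in x(100*64) distributed BLOCK over P(64).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: where one entry of a BLOCK-distributed array lives.
struct BlockLocation {
    std::size_t element;  // index of the collection element that owns the entry
    std::size_t offset;   // index inside that element's local array
};

// Map a zero-based global index to its owner, assuming the global extent is
// exactly block_size * num_elements (e.g. 100 * 64 for the array x).
BlockLocation locate(std::size_t global_index, std::size_t block_size,
                     std::size_t num_elements) {
    assert(block_size > 0);
    assert(global_index < block_size * num_elements);
    return BlockLocation{global_index / block_size, global_index % block_size};
}

// Inverse mapping: rebuild the zero-based global index from an owner and offset.
std::size_t to_global(BlockLocation loc, std::size_t block_size) {
    return loc.element * block_size + loc.offset;
}
```

Under these assumptions, locate(250, 100, 64) places the entry in collection element 2 at local offset 50. For y, which is distributed (BLOCK,*), the same mapping with a block size of 200 would apply to the first index only, while the second index stays local to the element.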
When we have a two-dimensional array of virtual processors, we would have a pC++ program as below:

    main() {
        Processors P(64,32);
        Distribution D(64,32,&P,BLOCK,BLOCK);
        Align A(64,32,"[ALIGN(T[i][j],D[i][j])]");
        HPFCollection<MyElement> C(&D,&A);
        FFUN(C.x,C.y,C.z);
        ...
    }

In this case, the HPF function would see these structures as defined by the following sequence of directives.

    !HPF$ PROCESSORS P(64,32)
          double precision x(100*64,32),
         &    y(200*64, 50*32), z(100*64,32)
    !HPF$ DISTRIBUTE x(BLOCK,BLOCK) ONTO P
    !HPF$ DISTRIBUTE y(BLOCK,BLOCK) ONTO P
    !HPF$ DISTRIBUTE z(BLOCK,BLOCK) ONTO P

The reader will note that we have not described how to create arrays that are distributed with anything other than a BLOCK distribution, and that array sizes are restricted to be multiples of the processor array size.
There are ways around these restrictions, but they will not be discussed here.

5.5 Working with Connection Machines node Fortran on the CM-5

In this section we consider case b, in which a Fortran node program is called from a parallel section of pC++. One of the problems in experimenting with HPF compilers is that only a few are now beginning to emerge. At the time of this writing, we do not have access to any of these systems, so we have conducted our experiments with Thinking Machines Co. CM Fortran (CMF) on the CM-5. CMF is a reasonable approximation to the spirit of HPF.

There are three major issues that we want to address: Fortran array memory allocation, Fortran array element access, and Fortran array interprocessor communication. As will be seen later in this section, the first two are addressed through new C++ classes, and the third through a new pC++ collection.

A CM-5 processing node consists of a Sparc microprocessor and four vector processor accelerators (vector units).
To make use of these vector units, arrays have to be allocated in the memories of the vector units. The Connection Machine Run Time System (CMRTS) provides functions for allocating space in the vector unit memories. However, the allocation is not trivial, since one must also consider how array elements will be partitioned among the four vector units and ensure that the layout is compatible with CMF.
We use the CMRTS function "CMRT_intern_detailed_geometry" to create a geometry and "CMRT_allocate_heap_array" to allocate memory. Once the arrays are allocated, the array descriptors returned by "CMRT_allocate_heap_array" are passed to Fortran subroutines as arguments.
Computations are done in Fortran subroutines. These Fortran arrays can also be directly manipulated by pC++ control programs through overloaded C++ operators.

We design a special C++ array class for each Fortran data type. The following class defines an array class for Fortran double precision arrays:

    class FArrayDouble {
        // private part of class
    public:
        double *d;
        FArrayDouble() { d = 0; }   // null pointer: no storage allocated
        FArrayDouble(int i);
        FArrayDouble(int i,int j);
        FArrayDouble(int i,int j,int k);
        FArrayDouble(int i,char layout1);
        FArrayDouble(int i,int j,char layout1,char layout2);
        FArrayDouble(int i,int j,int k,char layout1,char layout2,char layout3);
        ~FArrayDouble();
        double& operator()(int l);
        double& operator()(int l,int n);
        double& operator()(int l,int n,int m);
    };

The private part of the class is used to store the memory addresses of an array on the four vector units. This information is used for accessing individual array elements in pC++ and, in general, should not be of concern to users.
The class constructors that require arguments allocate memory on the vector units when invoked. The arguments i, j, and k specify the dimension sizes of the array, and layout1, layout2, and layout3 correspond to the CMF compiler directive for array layouts. A layout can be either "SERIAL" or "NEWS." A "NEWS" dimension means that this dimension will be distributed across the four vector units.
A "SERIAL" dimension means this dimension will be packed into one vector unit's memory. An array with only "SERIAL" dimensions signals that it should be allocated in the Sparc chip's memory and thus is not allowed in the current implementation. Such an array can be allocated just like a normal C++ array, which is always allocated in the Sparc chip's memory. The constructors that do not require the layout information use the default "NEWS" layout. The overloaded operators are used for accessing array elements.

The method we use to access array elements on the vector units from the Sparc chip differs from that of CMF. In the CMF assembly code, array element access involves computing "send addresses." In our approach, we use "subgrid dimension," "offchip position," "subgrid axis increment," and "subgrid size" provided by CMRTS to access each individual array element.
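To make the role of the overloaded operators more concrete, the following C++ sketch shows how an operator() can hide a partitioned layout from the caller. It is only an illustration under stated assumptions and does not reproduce the FArrayDouble implementation: the class MockFArray2D and its member unit_of are hypothetical, four std::vector buffers stand in for the four vector-unit memories, the first dimension is treated as "NEWS" and split into equal blocks, the second as "SERIAL", and indices are zero-based. The real class would derive addresses from the CMRTS geometry data rather than from this simple block split.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a two-dimensional FArrayDouble. Each std::vector
// plays the role of one vector unit's memory.
class MockFArray2D {
public:
    static constexpr std::size_t kUnits = 4;  // four vector units per node

    // Dimension 1 is treated as NEWS (split across units), dimension 2 as
    // SERIAL (kept whole inside each unit). All entries start at 0.0.
    MockFArray2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols),
          rows_per_unit_((rows + kUnits - 1) / kUnits) {
        for (auto& unit : units_) unit.assign(rows_per_unit_ * cols_, 0.0);
    }

    // Which simulated unit holds row i (assumed block split, not CMRTS).
    std::size_t unit_of(std::size_t i) const { return i / rows_per_unit_; }

    // Element access: the caller sees one array, the class picks the unit
    // and the position inside that unit's buffer.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        const std::size_t local_row = i % rows_per_unit_;
        return units_[unit_of(i)][local_row * cols_ + j];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::size_t rows_per_unit_;
    std::array<std::vector<double>, kUnits> units_;
};
```

With an 8 by 3 array, for example, each simulated unit holds two rows, so an assignment such as a(5, 2) = 1.5 lands in the third unit without the caller needing to know where the element is stored.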