It takes an argument stride to know what element it will be combined with. For example, combine(j) will result in the combining of the current ProfitVector and self(j). Notice that self() is treated like a pointer (in the C sense) into our distributed array and that we can add offsets to access the field variables of a sibling ProfitVector in the collection.

The combining process can be represented as a tree. After each combining step, the number of active ProfitVector elements is halved. The new profit vector is always stored in the ProfitVector with the lower index number in the distributed array collection S.
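The tree-shaped combining described above can be sketched sequentially in plain C++. This is only an illustration under stated assumptions: the ProfitVector struct, the single_item() helper, and the combine_all() driver are hypothetical stand-ins rather than the PC++ collection code, and the merge rule (for each capacity, take the best split between the two vectors) is assumed to be the usual one for combining knapsack profit vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ProfitVector element: profit[c] is the best
// profit reachable with capacity c using the objects this element covers.
struct ProfitVector {
    std::vector<int> profit;

    // Merge `other` into this vector: for every capacity c, try every split
    // of c between the two object subsets and keep the best total profit.
    void combine(const ProfitVector& other) {
        assert(profit.size() == other.profit.size());
        std::vector<int> merged(profit.size(), 0);
        for (std::size_t c = 0; c < profit.size(); ++c)
            for (std::size_t k = 0; k <= c; ++k)
                merged[c] = std::max(merged[c], profit[k] + other.profit[c - k]);
        profit = merged;
    }
};

// Hypothetical helper: profit vector for a single object.
ProfitVector single_item(std::size_t weight, int value, std::size_t capacity) {
    ProfitVector v;
    v.profit.assign(capacity + 1, 0);
    for (std::size_t c = weight; c <= capacity; ++c) v.profit[c] = value;
    return v;
}

// Tree-shaped reduction: in each round S[i] absorbs S[i + stride], so the
// result always lands in the lower index and the active count is halved.
void combine_all(std::vector<ProfitVector>& S) {
    for (std::size_t stride = 1; stride < S.size(); stride *= 2)
        for (std::size_t i = 0; i + stride < S.size(); i += 2 * stride)
            S[i].combine(S[i + stride]);  // pairs within a round are independent
}
```

After combine_all(S) returns, S[0] holds the final profit vector. The pairs combined within one round touch disjoint elements, which is what allows each round to run in parallel in the collection model.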
The procedure is repeated until the final profit vector is constructed. Finally, we show

M=16k, Capacity=400                  p=1     p=2     p=4     p=8     p=16
Alliant FX/8     time (seconds)     99.23   50.09   26.23
                 speed-up            1       1.98    3.78
Alliant FX/2800  time (seconds)     12.58    6.39    3.30    1.79
                 speed-up            1       1.97    3.81    7.02
BBN GP1000       time (seconds)    125.84   63.85   33.36   18.58   10.35
                 speed-up            1       1.97    3.77    6.77   12.15

Table 8: 0/1 Knapsack Problem

of the collection will be a state in the state space tree. The state element is shown below:

    class state ElementTypeOf DistPriorityQueue {
        int lower_bound, rank, reduce_matrix[N][N];
    public:
        int cost() { return lower_bound; }
        int priority() { return cost(); }
        void expand(DistPriorityQueue *Space);
    };

One thing to notice is that a future change of the queue implementation will not change the user's program, since the collection abstraction hides the low-level details.

6 Conclusion

In this paper, we have presented an object-oriented, parallel programming paradigm, called the distributed collection model, and an experimental language, PC++, based on the model.
In the distributed collection model, programmers can describe the data distribution of elements among processors to utilize memory locality, and a collection construct is employed to build distributed structures. The model also supports the expression of massive parallelism and a new mechanism for building hierarchies of abstractions. We have also described our experiences with application programs and the performance results in the PC++ programming environment.

There are still many challenges remaining in compiler optimization and runtime support for the distributed collection model and the PC++ language. The problems include optimizing the cost of the access functions for distributed collections, automatically replacing the basic element in the collection by a matrix block of elements, and the choice between locality and randomization. We are investigating these issues.
In addition, we are building a rich set of abstractions of distributed data structures as libraries. This will be an important step toward a better parallel programming environment for users, as well as for us to understand the characteristics and necessary primitive functions of the model.

The priority() function above overloads the virtual function priority(), which is described as an abstraction in DistPriorityQueue, to decide the priority of elements in the collection. The expand() method will select a splitter and expand the current state into two states, each with smaller subsets of tours.
If any of the new states is a complete solution, the expand function will compare it with the current best solution and save it if it is better. If the new states are not complete solutions, they will be stored in the Space collection.

We create two collections, Space and Working Queue. In each iteration we move a certain number of the lowest-cost nodes from the Space queue to the Working Queue. Then we expand all the state nodes of the Working Queue in parallel by invoking the expand() operator of the state.
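The interplay of the Space and Working Queue collections can be sketched with a sequential C++ loop. This is a simplified illustration, not the PC++ program: State, Queue, expand(), and solve() are hypothetical names, the lower bound is just the cost of the partial tour rather than the LMSK reduced-matrix bound, and each expansion produces one child per unvisited city instead of the two-way split on a splitter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical state: a partial tour that starts at city 0.
struct State {
    std::vector<std::size_t> tour;  // cities visited so far, in order
    int lower_bound;                // cost of the partial tour (a weak bound)
    int cost() const { return lower_bound; }
    int priority() const { return cost(); }
};

// Comparator that makes the state with the smallest priority() come out first.
struct HigherCost {
    bool operator()(const State& a, const State& b) const {
        return a.priority() > b.priority();
    }
};
using Queue = std::priority_queue<State, std::vector<State>, HigherCost>;

// Expand one state: complete tours update `best`, partial tours go to Space.
void expand(const State& s, const Matrix& d, Queue& space, int& best) {
    const std::size_t n = d.size();
    for (std::size_t next = 1; next < n; ++next) {
        if (std::find(s.tour.begin(), s.tour.end(), next) != s.tour.end()) continue;
        State child = s;
        child.lower_bound += d[s.tour.back()][next];
        child.tour.push_back(next);
        if (child.tour.size() == n) {
            best = std::min(best, child.lower_bound + d[next][0]);  // close the tour
        } else if (child.lower_bound < best) {
            space.push(child);  // still promising, keep it in Space
        }
    }
}

// Move up to `batch` lowest-cost states from Space to the working set, expand
// them, and repeat until Space is empty. Assumes non-negative distances.
int solve(const Matrix& d, std::size_t batch) {
    if (batch == 0) batch = 1;
    int best = std::numeric_limits<int>::max();
    Queue space;
    space.push(State{{0}, 0});
    while (!space.empty()) {
        std::vector<State> working;
        while (!space.empty() && working.size() < batch) {
            if (space.top().cost() < best) working.push_back(space.top());
            space.pop();
        }
        // Sequential here; in the collection model these expansions run in parallel.
        for (const State& s : working) expand(s, d, space, best);
    }
    return best;
}
```

In a parallel version the updates to the best solution and the insertions into Space would need synchronization, which is the queue access cost the performance discussion refers to.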
The program will stop when it eventually identifies an optimal tour.

The experimental results on the Alliant FX/8, Alliant FX/2800, and BBN GP1000 are shown in Table 9. Let T_expand be the average time needed to compute the LMSK heuristic of the generated nodes in each iteration, and let T_access be the average time spent in accessing the DistPriorityQueue per node expansion (storing new nodes). Then the speed-up is fairly linear for a small number of processors, but saturates at T_expand/T_access. Our current implementation of DistributedPriorityQueue is still very preliminary, so the overhead of accessing the queue is relatively large.
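The saturation bound can be illustrated with a first-order model. This is an assumption made here for illustration, not a fit to the measurements: expansions are taken to be perfectly parallel, queue accesses are taken to be fully serialized, and the one-processor time per node is approximated by T_expand. The function name modeled_speedup is hypothetical.

```cpp
#include <algorithm>

// First-order model (an assumption, not the paper's measurement): p processors
// can deliver at most p-fold speed-up, while a serialized queue admits at most
// one node per t_access, which caps the speed-up at t_expand / t_access.
double modeled_speedup(int p, double t_expand, double t_access) {
    return std::min(static_cast<double>(p), t_expand / t_access);
}
```

For example, with t_expand = 8 and t_access = 2 (arbitrary units), the model gives a speed-up of 2 on two processors and stays at 4 for any processor count of four or more.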
The saturation appears when the number of processors is more than 8. In the future, we plan to improve it based on high-performance concurrent queues [9].

Number of cities = 25                p=1     p=2     p=4     p=8     p=12
Alliant FX/8     time (seconds)     60.04   32.61   18.55
                 speed-up            1       1.84    3.23
Alliant FX/2800  time (seconds)      3.29    1.90    1.16    0.96    1.08
                 speed-up            1       1.73    2.83    3.40    3.02
BBN GP1000       time (seconds)     81.26   45.72   28.60   20.75   21.68
                 speed-up            1       1.77    2.84    3.92    3.74

Table 9: Traveling Salesman Problem with 25 Cities

References

[1] William L. Bain. Indexed, Global Objects for Distributed Memory Parallel Architectures, Proceedings of the ACM SIGPLAN Workshop on Object-Based Concurrent Programming, pp. 95-98, SIGPLAN Notices, Volume 24, Number 4, April 1989.
[2] Andrew A. Chien and William J. Dally. Concurrent Aggregates (CA), Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, Seattle, Washington, March 1990.
[3] Nicholas Carriero and David Gelernter. Linda in Context, Communications of the ACM, Vol. 32, No. 4, April 1989.
[4] William Dally. A VLSI Architecture for Concurrent Data Structures, Kluwer Academic Publishers, Massachusetts, 1987.
[5] Charles Koelbel, Piyush Mehrotra, and John Van Rosendale. Supporting Shared Data Structures on Distributed Memory Architectures, Technical Report No. 90-7, Institute for Computer Applications in Science and Engineering, January 1990.
[6] V. Kumar, K. Ramesh, and V. N. Rao. Parallel Best First Search of State Space Graphs: A Summary of Results, Proceedings of the 1988 National Conference on Artificial Intelligence (AAAI-88), 1988.
Jong Lee, Eugene Shragowitz, and Sartaj Sahni. A Hypercube Algorithm for the 0/1 Knapsack Problem, Proceedings of the 1987 International Conference on Parallel Processing, pp. 699-706, 1987.
Marc Shapiro, Yvon Gourhant, Sabine Habert, Laurence Mosseri, Michel Ruffin, and Celine Valot. SOS: An Object-Oriented Operating System - Assessment and Perspectives, Computing Systems, Vol. 2, No. 4, pp. 287-337, Fall 1989.
J. T. Kuehn and Burton Smith. The Horizon Supercomputer System: Architecture and Software, Proceedings of Supercomputing '88, Orlando, Florida, November 1988.
Bjarne Stroustrup. The C++ Programming Language, Addison Wesley, Reading, MA, 1986.
J. D. C. Little, K. Murty, D. Sweeney, and C. Karel. An Algorithm for the Traveling Salesman Problem, Operations Research, No. 6, 1963.
V. N. Rao and V. Kumar. Concurrent Access of Priority Queues, IEEE Transactions on Computers, Vol. 37, No. 12, Dec. 1988.