1994. Compiler Transformations for High-Performance Computing, page 20
From the compiler writer's perspective, the key issue is the time it takes to make a memory reference. Table A.4 summarizes the latency of each kind of memory reference in sMX.

Typically, shared-memory multiprocessors implement a number of synchronization operations. FORK(n) starts identical copies of the current program running on n processors; because it must copy the stack of the forking processor to a private memory for each of the n - 1 other processors, it is a very expensive operation. JOIN() desynchronizes with the previous FORK and makes the processor available for other FORK operations (except for the original forking processor, which proceeds serially). BARRIER() performs a barrier synchronization, which is a synchronization point in the program where each processor waits until all processors have arrived at that point.

A.3.2 Distributed-Memory DLX Multiprocessor

dMX is our hypothetical distributed-memory multiprocessor. The machine consists of 64 S-DLX processors (numbered 0 through 63) connected in an 8 × 8 mesh.
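For illustration, the mesh layout can be expressed as a couple of small helpers. This is a sketch only: the row-major numbering and the function names are our assumptions, not something the text specifies.

```c
/* Position of processor p (0..63) in dMX's 8 x 8 mesh, assuming
 * row-major numbering (our assumption): row p/8, column p%8. */
typedef struct { int row, col; } MeshPos;

MeshPos dmx_position(int p) {
    MeshPos pos = { p / 8, p % 8 };
    return pos;
}

/* Minimum number of mesh links a message traverses between
 * processors a and b: the Manhattan distance of their positions. */
int dmx_hops(int a, int b) {
    MeshPos pa = dmx_position(a), pb = dmx_position(b);
    int dr = pa.row - pb.row, dc = pa.col - pb.col;
    if (dr < 0) dr = -dr;
    if (dc < 0) dc = -dc;
    return dr + dc;
}
```

Under this mapping, processors 0 and 63 sit at opposite corners, 14 links apart.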
The network bandwidth of each link in the mesh is 10 MB per second. Each processor has an associated network processor that manages communication; the network processor has its own pool of memory and can communicate without involving the CPU. Having a separate processor manage the network allows applications to send a message asynchronously and continue executing while the message is sent. Messages that pass through a processor en route to some other destination in the mesh are handled by the network processor without interrupting the CPU. The latency of a message transfer is ten microseconds plus 1 microsecond per 10 bytes of message (we have simplified the machine by declaring that all remote processors have the same latency).

Communication is supported by a message library that provides the following calls:

● SEND(buffer, nbytes, target)
Send nbytes from buffer to the processor target.
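The latency model just described can be made concrete with a small helper (the function name is ours, not part of dMX's library):

```c
/* Latency model from the text: a message transfer costs 10 microseconds
 * of fixed overhead plus 1 microsecond per 10 bytes of payload,
 * regardless of which remote processor is the destination. */
double dmx_send_latency_us(int nbytes) {
    return 10.0 + nbytes / 10.0;
}
```

For example, a 1000-byte message costs 10 + 100 = 110 microseconds, so for small messages the fixed overhead dominates.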
If the message fits in the network processor's memory, SEND returns after the copy completes (1 microsecond per 10 bytes); if not, the call blocks until the message is sent. Note that this raises the potential for deadlock, but we will not concern ourselves with that in this simplified model.

● BROADCAST(buffer, nbytes)
Send the message to every other processor in the machine. The communication library uses a fast broadcasting algorithm, so the maximum latency is roughly twice that of sending a message to the furthest edge of the mesh.

● RECEIVE(buffer, nbytes)
Wait for a message to arrive; when it does, up to nbytes of it will be put in the buffer. The call returns the total number of bytes in the incoming message; if that value is greater than nbytes, RECEIVE guarantees that subsequent calls will return the rest of that message first.

ACKNOWLEDGMENTS

We thank Michael Burke, Manish Gupta, Monica Lam, and Vivek Sarkar for their assistance with various parts of this survey. We thank John Boyland, James Demmel, Alex Dupuy, John Hauser, James Larus, Dick Muntz, Ken Stanley, the members of the IBM Advanced Development Technology Institute, and the anonymous referees for their valuable comments on earlier drafts.

ACM Computing Surveys, Vol. 26, No. 4, December 1994