1994. Compiler Transformations for High-Perforamce Computing
Описание файла
PDF-файл из архива "1994. Compiler Transformations for High-Perforamce Computing", который расположен в категории "". Всё это находится в предмете "конструирование компиляторов" из 7 семестр, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст из PDF
CompilerTransformationsDAVID F. BACON,ComputerSUSANSc~ence Diviszon,In thelastprogramsthreehaveanalysisof scalarmemoryMostsurveywithProgrammerswishinga referencefor techniquesoptimizingcompilercomputertheof eachandProgramming;Subjectthebasedon theoptimizationformaximizeon trackingof theandvariouspurposetheparallelismpropertiesofimportanttypesof eachwritersoptimizationsdevelopedoptimization.Readersand basicprogramDescriptors:D.1.3programof parallelarchitecturestraneformation,explainarehowtoof its application.manually.Compilerhigh-levelsuch as C and Fortran.the performanceto be applied1.2.2 [Artificialrelylanguagesan exampleD.3.4 [Programmingoptimization;In contrast,processorsof theirof the optimizationtechnology.architectureCategoriessequentialunderstandingof the importantfor the detailsoverviewto enhancetheirparallelreducetransformationsanalysis.and giveto improvefor optimizingfor uniprocessorsusingthat94720transformationstechniques.andtransformationsWe describeif it is legal,of compilerprogramfor imperativefor bothin depth.Californiaoptimizationdata-flowvector,is a comprehensiveTransformationsmostanddependencetechniquesdetermineby thesuperscalar,looprestructuringcoverednumberlocalityusingThisa largequantitieshigh-performanceBerkeley,implemented.executedComputingAND OLIVER J.
SHARPof California,decadesbeenof instructionsarraysL. GRAHAM,Unwersbtynumberandfor High-PerformancethatStudentsto date,compilationsurveyor asan overviewofas a referenceforand as a bibliographicto be familiarreferencewithmoderntechniques.[ProgrammingLanguages]:surveycan perform,can obtaincan use thisare expectedIntelligence]:code can use thiscompilersTechniques]:ConcurrentProcessors—compilers;AutomaticProgramming-programtransformationGeneralAdditionalTerms:Languages,Key Wordsmultiprocessors,Performanceand Phrases:optimization,Compilation,parallelism,INTRODUCTIONOptimizingcompilershave become an essential componentof modern high-performance computersystems.In additiontotranslatingthe inputprograminto machine language,they analyzeit and apply varioustransformationsto reduce itsrunningtime or its size.This researchunder contracthas been sponsoredDABT63-92-C-O026,dependencesuperscalaranalysis,processors,locality,vectorizationAs optimizingcompilersbecome moreeffective,programmerscan become lessconcernedabout the details of the underlying machinearchitectureand can employhigher-level,moresuccinct,andmore intuitiveprogrammingconstructsand programorganizations.Simultaneously,hardwaredesignersare able toin part by the Defense AdvancedResearchProjectsAgencyby NSF grant CDA-8722788,and by an IBM ResidentStudy(DARPA)ProgramFellowship to David Bacon,Permissionto copy withoutfee all or part of this materialis grantedprovidedthat the copies are not madeor distributedfor direct commercialadvantage,the ACM copyrightnotice and the title of the publicationand its date appear,and notice is given that copyingis by permissionof the Associationfor ComputingMachinery.To copy otherwise,or to republish,requiresa fee and]orspecific permission.@ 019940360-0300/94/1200-0345$03.50ACMComputingSurveys,Vol.26,No.4, December1994346*DavidF.
Baconet al.CONTENTSINTRODUCTION1.SOURCE2TRANSFORMATIONLANGUAGEISSUES2 1 Correctness223ScopeTARGETARCHITECTURES31Characterizing32Model4COMPILER5DEPENDENCE6.7.PerformanceArchitecturesORGANIZATIONANALYSIS51Typesof Dependences52Representing5,3Loop5.4SubscriptDependencesDependenceAnalysisAnalysisTRANSFORMATIONS61Data-Flow-Based6.2LoopReorderingLoop6.3LoopRestructuring6.4LoopReplacement6.5Memory6.6Partial6.7Redundancy68ProcedureTransformationsTransformationsAccessTransformationsEvaluationEhmmatlonCallTransformationsTRANSFORMATIONSFORPARALLELMACHINES89.71Data72ExposingLayout73Computation74Communication75SIMD76VLIWTransformatlonsCoarse-GrainedParallelismPartltIOnmgOptimizationTransformationsTRANSFORMATION8.1Unified8.2SearchingFRAMEWORKSTransformationtheCOMPILERTransformationSpaceEVALUATION9.1Benchmarks9.2Code9.3Compiler94Instruction-LevelCharacterlstlcsEffectivenessParallehsmCONCLUSIONAPPENDIX:A.1A 2 VectorA3MACHINESuperscalarMODELSDLXDLXMultiprocessorsACKNOWLEDGMENTSREFERENCESemploydesignsthatyieldgreatlyimprovedperformancebecausethey needonly concernthemselveswiththe suitabilityof the designasa compilertarget,not withits suitabilityas a directprogrammerinterface.In this survey we describe transformations that optimizeprogramswritteninACMComputingSurveys,Vol.26,No.4, December1994imperativelanguagessuch as FortranandC for high-performancearchitecincludingsuperscalar,vector,tures,and variousclassesof multiprocessormachines.Mostof the transformationswe describe can be appliedto anyoftheAlgolfamilylanguages;many are applicabletofunctional,logic, distributed,and objectorientedlanguagesas well.
These otherlanguagesraise additionaloptimizationissuesthatspace does not permitusto cover in this survey.The referencesincludesome startingpoints for investigationof optimizationsfor LISPandfunctionallanguages[Appel1992; ClarkandPeyton-Jones1985;Kranzet al.1986], object-orientedlanguages[Chambers and Ungar1989], and the set-basedlanguageSETL[Freudenbergeret al.1983].We have also restrictedthe discussionto higher-leveltransformationsthat requiresome programanalysis.Thus weexclude peepholeoptimizationsand mostmachine-leveloptimizations.We use theterm optimizationas shorthandfor optimizingtransformation.Finally,because of the richnessof thetopic, we have not given a detaileddescriptionof intermediateprogramrepresentationsand analysistechniques.We make use of a numberof differentmachinemodels, all based on a hypothetical superscalarprocessorcalled S-DLX.The Appendixdetailsthe machinemodels, presentinga simplifiedarchitectureand instructionset that we use when weneed to discuss machinecode.
Whileallassemblylanguageexamplesare commented,the reader will need to refer tothe Armendixto understandsome detailsof the’ ~xamples(such as cycle counts).We assumea basic familiaritywith~ro~amcompilationissues. Readers un~am~liarwithprogramflow analysisorother basic compilationtechniquesmayconsult Aho et al. [1986].1. SOURCEAll ofsurveyLANGUAGEthe high-levelexamplesare writtenin a languagein thissimilarCompilerto Fortran90, with minor variationsandextensions;most examplesuse only features found in Fortran77. We have chosen to use Fortranbecause it is the defacto standardof the high-performanceengineeringandscientificcomputingcommunity.Fortranhas also been theinput languagefor a numberof researchprojects studyingparallelization[Allen etal.
1988a;Balasundaramet al. 1989;Polychronopouloset al. 1989]. It was chosen by these projectsnot only because ofits ubiquityamong the user community,but also because its lack of pointersandits staticmemorymodelmake it moreamenableto analysis.It is not yet clearwhateffectthe recentintroductionofpointersintoFortran90 willhave onoptimization.The optimizationswe have presentedare not specific to Fortran—infact, manycommercialcompilersuse the same intermediatelanguageand optimizerforboth C and Fortran.The presence of unrestrictedpointersin C can reduce opportunitiesfor optimizationbecauseit isimpossibleto determinewhich variablesmay be referencedby a pointer.The process of determiningwhich referencesmaypointto the same storagelocationsiscalled alias analysis[Banning1979; Choiet al.
1993; Cooper and Kennedy1989;Landi et al. 1993].The only changesto Fortranin ourexamplesare that arraysubscriptingisdenotedby squarebracketsto distinguish it from functioncalls; and we usedo all loops to indicatetextuallythat allthe iterationsof a loop may be executedconcurrently.To make the structureofthe computationmore explicit,we willgenerallyexpressloopsas iterationsrather than in Fortran90 array notation.We follow the Fortranconventionthatarraysare stored contiguouslyin memory in column-majorform.
If the array istraversedso as to visit consecutivelocations in the linear memory,the first (thatis, leftmost)subscriptvaries the fastest.In traversingthe two-dimensionalarraydeclaredas a[n, m], the followingarraylocationsarecontiguous:a[n – 1, 3],a[n, 31, a[l, 41, a[2, 41.Transformations347“Programmersunfamiliarwith Fortranshouldalso note that all argumentsarepassed by reference.When describingcompilationfor vectormachines,we sometimesuse array notation when the mappingto hardwareregisters is clear. For instance,the loopdoalli=l,64a[i] = a[i] + cend do allcould be implementedwith a scalar-vector add instruction,assuminga machinewith vector registersof length64.
Thiswouldbe writtenin Fortran90 arraynotationasa[l :64] = a[l :64] + cor in vectorasmachineassemblylanguageLFADDILVF2, c(R30);Ioad c into register F2R8, R30, #a ;Ioad addr. of a into R8;Ioad vector a[l :64] toVI, R8ADDSVVI,SvVI.V1F2, VIR8;add scalar to vector;store vector in a[l :64]Arraynotationwill not be used whenloop bounds are unknown,because thereis no longeran obviouscorrespondencebetweenthe source code and the fixedlength vector operationsthat performthecomputation.To use vectoroperations,the compilermust performthe transformationcalled strip-mining,which is discussed in Section 6.2.4.2. TRANSFORMATIONISSUESFor a compilerto apply an optimizationto a program,it must do three things:(1)Decide upon a part of the programtooptimizeand a particulartransformationto apply to it.(2) Verifythat the transformationeitherdoes not change the meaningof theprogramor changes it in a restrictedway that is acceptableto the user.(3)TransformIn thislast step:ACMComputmgthe program.surveywe concentrateon thetransformationof the program.Surveys,Vol.26,No4, December1994348●DavidF.