1994. Compiler Transformations for High-Perforamce Computing, страница 13
Описание файла
PDF-файл из архива "1994. Compiler Transformations for High-Perforamce Computing", который расположен в категории "". Всё это находится в предмете "конструирование компиляторов" из 7 семестр, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 13 страницы из PDF
Additionally,move instructionsfor parameterscan beeliminated.To performthis optimizationonaprocedure f, registersmust first have beenallocatedforeach ofthe proceduresthatfcalls. This can be done by performingregisterallocationon proceduresorderedby adepth-firstpostordertraversalofthecall graph [Chow 1988].Forexample,in Figure43(c)maxusesonly R8–RIO.Figure44 shows foo aftercross-call registerallocation.R31 is savedin Rllinsteadofonthe stack, and thereturnis ajumpto the saved addressesin R1l. Additionally,R12 is used insteadof R16, allowingthe save and restoreofRI 6 to be eliminated.Only d remainsinthe stack frame.MOVRl,JRR31Elee:MOVRl,JRR8; max=xR9; max=y; return; returnR31(c) max after leaf optimizationFigure 43.Leafprocedurefoo:SUBIR30,#4;adjustMOVRll,R31;save(R2);R12=cLWR12,LWR8,R9,ADDSw(R30),MOVR2,ADDI R3,JALADDRI,R30,JRRllComputing; R8=bR9R3R30,Cross-callSurveys,SPretaddrR8 ; R9=d;saved;argi=addr(b)#0 ; arg2=addr;callmaxADDIFigure 44.ACM(R3)R12,optimization.RI,R12#4(d)max; Rl=e+c;restoreSP; returnregisterallocationVol.
26,No.4,for fOO.December1994388David*F. Baconet al.Parameter Promotion6.8.4Whenaparameterismax:passedence, the address calculationthe caller, but the load ofthevalueisdonebythean instruction,lationssincecanbehandledcallee.by referis done byparameterThiswastesmostaddresscalcuwiththeot~set(Rn)formatof load instructions.More importantly,if the operandis already in a registerin the caller, it mustbe spilled to memoryand reloadedby thecallee.
If the callee modifiesthe value, itmust then be stored. Upon returnto thecaller, if the compilercannot prove thatthe callee did not modifythe operand,itmust be loaded again. Thus, as many astwo unnecessaryloads and two unnecessary stores can be introduced.Anunmodifiedreferenceparametercan be passed by value,and a modifiedreferenceparametercan be passed byvalue-result.Figure45(a) shows max after this transformationhas been applied.Figure45(b)showsthe correspondingmodifiedform of foo.Since d can now be held in a register,thereis no longera need for a stackcan be apframe,so framecollapsingplied, and the stack frame is eliminated.In general,when the compilercan statically identifyall the callers of a leaf procedure, it can expand their stack framesto includeenoughspace for both procedures.The leaf proceduresimplyusesthe caller’sstack framewithoutdoingany new allocationof its own.Parameterpromotionis particularlyimportantfor languagessuch as Fortran,in which all argumentpassingis by reference.
However,the argument-passingsemanticsof value-resultare not thepassedby refex=same as for argumentsence, in particularin the event of aliasing.Interproceduralanalysismaybenecessary to determinethe legalityof thetransformation.6.8.5ProcedureInliningProcedureinlining(also known as procedureintegration)replacesa procedurecall with a copy of the body of the calledprocedure[Allenand Cocke 1971; BallACMComputmgSurveys,Vol26, No 4, December1994SGTR1O,R2,BEQZR1O,ElseMOVRI,JRR31Else:MOVRI,JRR3> y)<= yR2; max=xR3; nrax=y; return; returnR31(a)maxafter;RIO=(X;Xparameterpromotion:xandyarepassed by value in R2 and R3.foo:(b)MOVRll,R31; saveLWR12,(R2);R12=cLWR2,(R3)ADDR3,R12,maxRI , Rl,JRRI 1fooafterFigure 45.; R2=bR2;R3=d; callJALADDretaddrR12max;Rl=e+c; returnparameterParameterpromotionon maxpromotion.1979; Scheifler1977].
Each occurrenceofa formalparameteris replacedwithaversionof the correspondingactualparameter,modifiedto reflectthe callingconventionof the language.Renamingmay also be requiredfor the local variables of the inlinedprocedureif they conflict with the callingprocedure’svariablenames or if the procedureis inlinedmorethan once in the same caller.Inliningcan almostalwaysbe performed,exceptwhenthe procedureinquestionis recursive.Even recursiveroutines may benefit from a finite numberofirdiningsteps, particularlywhen a constantargumentallowsthe compilertodeterminethe numberof recursivecallsthat will be performed.For Fortranprograms,incompatiblecommon-blockusages betweencaller and callee can makeinliningmore complexand in practiceoften preventit.When a call is inlined,all the overheadfor the invocationis eliminated.The stackframes for the caller and callee are allocated together,and the transferof controlis eliminated.Thisis particularlyimportantfor the return(J R31 ), sincea jumpthrougha registermay incurahigherpipelinepenaltythan a jump to afixed address.Compilerdoi=i,endfoo:ncallf(a,i)dosubroutinef(x,dimensionx[jlmax:j)x[*I= x[jl+ creturn(a) originalcodea[ilendi=l,= a[ildon+ cFigure 46,afterProcedureR12,R2,(R3)ADDR3,R12,R2;R3=dR3;R1O=(X(R2)SGTR1O,R2,R1O,ElseMOVRI,JRet;R12=c; R2=b;X> y)<= y; max=xR2; “return”Rl,R3ADDRI,RI,3RR31Figure 47.all(b)LuBEQZ389wLWElse:MOVRet:do allTransformationstof; max=yR12;Rl=e+c; returnmax inlinedintofoo.inlininginliningAnotherreasonforirdiningis toimprovecompileranalysisandoptimization.In manycompilers,a loopcontaininga procedurecall cannotbeparallelizedbecauseits read-writebehavioris unknown.Afterthe call is inlined, the compilermay be able to proveloop independence,therebyallowingvectorizationor parallelization.Additionally,registerusage may be improved,constantspropagatedmoreaccurately,and moreredundantoperationseliminated.An alternativeto inliningis to performinterproceduralanalysis.The advantageof interproceduralanalysisis that it canbe applieduniformly,since it does notcause code expansionthe way inliningdoes.
However,interproceduralanalysiscanbecostlyandincreasesthecomplexityof the compiler.Inliningalso affectsthe instructioncachebehavioroftheprogram[McFarling1991]. The change can be favorable,because localityis improvedbyeliminatingthe transferof control.Onthe other hand, if a loop body is mademuch larger,it may no longer fit in thecacheandthereforecauseadditionalmemoryaccesses.
Furthermore,if theloop containsmultiplecalls to the sameprocedure,multiplecopies of the procedure will be loaded into the cache.The primarydisadvantageof inliningis that it increasescode size, in the worstcase exponentially.However,in practiceit is simple to control the size increase byselectiveapplicationof inlining(for example,to smallleaf procedures,or toproceduresthatare only calledfrom afew places).
The result can be a dramaticimprovementin executionspeed. Figure46 shows a source-levelexampleof inlining; Figure47 shows the assembleroutput after the proceduremax is inlinedinto fOO.Ignoringcache effects, assume the folto executethelowing:if tp is the timeentireexecuteprocedurejusttheandbodyis the numberof timesis the totalexecutiongram, thent, = n(tPis thetimesavedt~isthetimetonit is called, and Ttimeof the pro-of theprocedure,– t~)by inlining,andI=:is the fractionof the total run time savedby inlining.Some recent studies [Cooper et al.
1991;1992] have demonstratedthat realizingthe theoreticalbenefitsof inliningmayrequiresome modificationof the compiler, because inlinedcode has differentcharacteristicsthan human-writtencode.Peculiaritiesof a languageor a compiler may reduce the effectivenessof inliningor produceunexpectedresults.Cooper et al. examinedseveral commer-ACM Computmg Surveys, Vol 26, No 4, December1994390“DavidF. Baconet al.cial Fortrancompilersand uncoveredthefollowingpotentialproblems:(1)increased demandfor registers;(2) largernumbersof local variables,sometimesexceedinga compiler’slimit for the number of analyzablevariables;(3) loss ofinformationdue to the Fortranconvention that procedureparametersmay beassumed to be unaliased.6.8.6 Procedurecallf(a,subroutinerealf(x,integern,doi=l,x[i]n= x[i]p)p**p(a) originalcodeCloningcallF.2(a,F.2(x,Surveys,Vol26, No.
4, December1994n)x [*]integerndoi=l,nx [i]end= x [i]*x [i]do(b)Figure 48.doi=l,endafterProcedurecloningcloning.ncall6.8.7 Loop PushingLoop pushing(also calledloop embedding [Hall et al. 1991]) moves a loop nestfrom the caller to a cloned versionof thecalled procedure.If a compilerdoes notperformvectorizationor parallelizationacross procedurecalls directly,pushingis a less generalway of achievingasimilareffect.For example,pushingis done by theCMAXFortranpreprocessorfortheThinkingMachinesCM-5.CMAXconverts Fortran77 programsto Fortran90,attemptingto discoverdata-paralleloperationsintheprocess[SabotandWholey1993].Pushingnot onlyallowsthe parallelizationof the loop in Figure49, it alsoeliminatesthe overheadof all but one ofthe procedurecalls.If thereare otherstatementsin theloop,distributionis a prerequisitetopushing.In this case, however,the de-n)subroutinerealf(x,n)dosubroutineComputmgn,x [*Iend doProcedurecloning[Cooper et al.
1993] isa techniquefor improvingoptimizationacross procedurecall boundaries.The callsites of the procedurebeing cloned aredividedintogroups,and a specializedversionof the procedureis createdforeach group. The specializationoften provides opportunitiesfor betteroptimization,particularlyduetoimprovedconstantpropagation.In Figure48, cloningproceduref withp replacedby the constant2 allows reductioninstrength.Thereal-valuedexponentiationis replacedby a multiplication, which is usuallyat least 10 timesfaster.ACM2)n,f (a,reala[*]a[jl= a[jlj )+ creturn(a)calloriginalloopandprocedureF.2(x)subroutinerealdoF-2 (a)a[*]alli=l,a[i]endn= a[ildo+ callreturn(b)afterFigure 49.pendenceanalysisbe interprocedural.statementstionisis alwaysProcedurea differentinLoopforIfthelooppushingpushing.distributionthereloop,thearemustnoothertransforma-legal.inliningway(see Sectionofachieving6.8.5)averyCompilersimilareffect, and does not requireproceduralanalysis.inter-6.8.8 Tail Recursion EliminationTransformationslogicalrecursive6.8.9 Function MemorizationMemorizationis an optimizationthat isappliedto side-effectfreeprocedures(thatis, proceduresthat do not changethe state of the program,also called referentiallytransparent).In such cases it isfunctionInarray(a,x,i,n)real x, a[nlinteger i, ~If(i> n)theninarrayTail recursionis a particularlycommonform of recursion.A functionis recursiveif it invokesitself,directlyor indirectly.It is tail recursiveif its last act is to callitself and returnthe value of the recursive call withoutperformingany furtherprocessing.When a functionis tail recursive,it isunnecessaryto invokea separateinstancewithits own stack frame.Therecursioncan be eliminated;the currentinvocationwill not be using its frame anylonger,so the call can be replacedby ajump to the top of the procedure.Figure50 shows an exampleof a tail-recursivefunction(a) and the resultafter the recursion is eliminated(b).
A functionthatis not tail recursiveis shown in Figure50(c): it uses the resultof the recursivecall as an operandto the addition,sothere is computationthat must be performed after the recursivecall returns.Recursiveprogramscanbe transformedautomaticallyinto tail-recursiveversionsthat can be executediteratively[Burstalland Darlington1977], but thisis not commonlyperformedby existingcompilersfor imperativelanguages.Somelanguagespreventtail recursionelimination by requiringclean-upcode to be executed after a procedureis finished.Thesemanticsof C + +, for example,demand that before a procedurereturnsitmust call a deconstructoron each stackallocatedlocal object variable.Other languages,like Scheme [Rees etal.