1994. Compiler Transformations for High-Perforamce Computing, страница 5
Описание файла
PDF-файл из архива "1994. Compiler Transformations for High-Perforamce Computing", который расположен в категории "". Всё это находится в предмете "конструирование компиляторов" из 7 семестр, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 5 страницы из PDF
1990], multidimensionalGCD[Banerjee1988a],and the powertest[Wolfe and Tseng 1992]. The omega test[Pugh1992] uses a linearprogrammingalgorithmto solve the dependenceequations. The SUIF compilerproject at Stanford has had success applyinga series ofexact tests,startingwiththe cheaperones [Maydanet al.
1991]. They use ageneralalgorithm(Fourier-Motzkinvariableelimination[DantzigandEaves1974]) as a backup.One advantageof the multiple-dimension exact tests is their abilityto handlecoupled subscriptexpressions[Goff et al.1991]. Two expressionsare coupled if thesame variableappearsin both of them.Figure6 shows a loop that has no dependencebetweeniterations.Thestatement reads the values to the right of thediagonaland updates the diagonal.A testthat only examinesone dimensionof thesubscriptexpressionsat a time, however,will not detect the independencebecausethere are many pairs of iterationswherethe expressioni in one is equal to theexpressioni + 1 in the other. The moregeneralexact tests woulddiscoverthatthe iterationsare independentdespiteACMComputingSurveys,Vol26, No4, December1994do i= 1,a[i, i]end doFigure 6.n-l= a[i,i+l]Coupled subscripts.ofcoupledsubscriptthepresenceexpressions.Section 9.2 discusses a numberof studiesthatexaminethesubscriptexpressionsin scientificapplicationsandevaluatethe effectivenessof differentdependencetests.6.
TRANSFORMATIONSThissectioncatalogsgeneral-purposeprogramtransformations;those transformationsthat are only applicableon parallel machinesare discussed in Section 7.The primaryemphasisis on loops, sincethat is generallywhere most of the executiontime is spent. For each transformation,we providean example,discussthe benefitsand shortcomings,identifyany variants,and providecitations.A majorgoal of optimizingcompilersfor high-performancearchitecturesis todiscover and exploitparallelismin loops.We will indicatewhen a loop can be executed in parallelby using a do all loopinsteadof a do loop. The iterationsof ado all loop can be executed in any order,or all at once.A standardreferenceon compilersingeneralis the “RedDragon”book, towhich we refer for some of the most common examples[Aho et al.
1986]. We drawalso on previoussummaries[AllenandCocke 1971; Kuck 1977; Padua and Wolfe1986; Rau and Fisher 1993; Wolfe 1989b].Because some transformationswere already familiarto programmerswho applied them manually,often we cite onlytheworkof researcherswhohavesystematizedand automatedthe implementationof these transformations.Additionally,we omit citationsto works thatare restrictedto basic blocks when globaloptimizationtechniquesexist.
Even so,the origin of some optimizationsis murky.For instance,Ershov’sALPHAcompilerCompiler[Ershov1966] performedinterproceduralconstantpropagation,albeit in a limitedform, in 1964!Transformationsdoi=l,na[i]= a[i]o+ c*iend do(a) original6.1Data-Flow-BasedT=cdoi=l,na[i]= a[i]T=T+c+ Tend doFigure 7.Loop-BasedloopLoop TransformationsA numberof classicalloop optimizationsare based on data-flowanalysis,whichtracksthe flow of data througha program’svariables[Muchnickand Jones1981]. These optimizationsare summarized by Aho et al. [1986].6.1.1359(b) afterstrengthreductionStrengthreductionexample,Strength ReductionReductionin strengthreplacesan expressionin a loop with one that is equivalent but uses a less expensiveoperator[Allen1969; Allenet al. 1981].
Figure7(a) shows a loop that containsa multiplication.Figure7(b) is a transformedversion of the loop where the multiplication has been replacedby an addition.Table 1 shows how strengthreductioncan be applied to a numberof operations.The operationin the first columnis assumed to appear withina loop that iterates over i from 1 to n. When the loop istransformed,the compilerinitializesatemporaryvariableT with the expressionin thesecondcolumn,Theoperationwithinthe loop is replaced by the expression in the third column,and the value ofT is updatedeach iterationwith the valuein the fourth.A variablewhose value is derived fromthe numberof iterationsthat have beenexecuted by an enclosingloop is called aninductionvariable.The loop control variable of a do statementis the most common kind of inductionvariable,but otheralsobeinductionvariablesmayvariables.The most commonuse of strengthreduction,often implementedas a specialcase, is strengthreductionof inductionvariableexpressions[Ahoet al.
1986;Allen1969; Allen et al. 1981; Cocke andSchwartz1970].Strengthreductioncan be appliedtoproductsinvolvinginductionvariablesbyconvertingthemto referencesto anequivalentrunningsum,as showninTable 1. Strength Reductions (the variable c ISloop invariant; x may vary between Iterations)ExpressionInitializationUseUpdatecxic’(-1)’z/cT=cT=cT=–1T = ~/CTTTXXTT=T+cT=TxcT=–TFigure8(a–c). This special case is mostimportanton architecturesin which integer multiplyoperationstake more cyclesthanintegeradditions.(Currentexamples includethe SPARC [Sun Microsystems 1991] and the Alpha[Sites 1992].)Strengthreductionmay also make otheroptimizationspossible,in particulartheeliminationof inductionvariables,as isshown in the next section.The termstrengthreductionis alsoapplied to operatorsubstitutionin a nonloop context,like replacingx x 2 withx + x.
These optimizationsare covered inSection 6.6.7.6.1.2inductionVariable EliminationOnce strengthreductionhas been performedon inductionvariableexpressions, the compilercan often eliminatethe originalvariableentirely.The loopexit test must then be expressed in termsof one of the strength-reducedinductionvariables;the optimizationis called linear functiontest replacement[Allen1969;Aho et al. 1986].
The replacementnotonly reduces the numberof operationsinACM Computmg Surveys, Vol. 26, No. 4, December 1994Davidl?Baconet al.do i=1, ndoi== a[i]a[ill,na[i]= a[i]end do+ c(a) originalend do(a) originalif(n > O) C = sqrt(x)F4 , C(R30);loadc IntoF4a[i]LU;loadn intoR8end doLIR8 , n(R30)R9 , #1ADDIR12, R30,#a;Ri2=address(a[l])R1O,#4;seti(R9)toRIO,R12,R1O;RiO=i*4;RiO=address(a[i+l])LFF5,-4(R1O);loadADDFF5,F5,;a[i]SF-4(R1O),SLTRil,ADDIR9,BNEZRll,F4F5R9,a[i]Figure9.intoLOOp;storenew a[il;ifi<n,compileda loop, but frees the registerused by theinductionvariable.Figure8(d) shows the resultof applying inductionvariableeliminationto thestrength-reducedcode from the previoussection.gotoLooploopF4,c(R30);loadc intoF4R8,n(R30);loadn intoR8R9,#l;setRI0,R30,#ai(R9)to6.1.3 Loop-lnvarianf1;RIO=address(a[i])F5,(R1O);loadADDFF5,F5,;a[i]:=a[i]+cSFADDI(R1O),F5R9, R9, #1;i=i+lADDISLTR1O,Ri0,#4;RIO=address(a[i+l])Rll,R9,BNEZRll,LOOpF4a[i];storeR8 ;Rll;ifintoF5new a[i]= i<n?i<n,gotoLoop(c) after strength reduction—RiOk a running suminstead of being recomputedfrom X9 and R12LFF4,c(R30);loadc intoF4L!dR8,n(R30);loadn intoR8ADDIRIO,R30,#a;RiO=address(a[i])R8,R8,F5,; R8=n*4R8, #4RIO, R8 ;R8=address(a[n+l])(R1O);loada[i]IntoADDFF5,F6,SFADDI(R1O),F5R1O, R10,#4SLTBNEZRll,Rll,MULTIADDILoop : LF(d) afterComputmgof inductionInductionF5;aCil:=a[il+c;storenew a[il;RIO=address(a[i+l]);Rll= R1O<R8?;if Rll,goto LoopR1O,R8LOOpeliminationFigure8.F4variableSumeys,Vol,code motion.F5LUADDILoop : LFLoop-invariant:=a[i]+cLFLI+ C(b) after code motionR8 ;Ril= i<n?; i=i+l#iR9,= a[i]1ADDI(b) initialACM= l,nLFR9,loopcodedo iLoop:14ULTI+ sqrt(x)variable(R9)optimlzations.26,N04, December1994Code MotionWhena computationappearsinsidea100P, but its resultdoes not change between iterations,the compilercan movethat computationoutsidetheloop [Ahoetal, 1986; Cocke and Schwartz1970].Code motioncanbeappliedat a highlevel to expressionsin the source code, orat a low level to addresscomputations.The latteris particularlyrelevantwhenindexingmultidimensionalarraysordereferencingpointers,as when the inner loop of a C programcontainsan expressionlike a.b + c.d[i].Figure 9(a) shows an example inwhichan expensivetranscendentalfunctioncallis moved outsideof the innerloop.
Thetest in the transformedcode in Figure9(b) ensuresthatif the loop is neverexecuted,themovedcodeis not executedeither, lest it raise an exception.Theprecomputedvalue is generallyassignedto a register.If registersarescarce, and the expressionmovedisinexpensive to compute,code motionmay actuallydeoptimizethe code, since registerspills willbeintroducedin the loop.Althoughcode motionissometimesreferredto as code hoisting,hoistingisaCompilerdo i=2,na[i]if= a[i](x+ c< 7)b[i]then* c [i]= a[i]elseb[i]* b[i-1]= a[i-1]end ifend do(a) originalif(nif> 1)(xthen< 7)do alllooptheni=2,na[i]= a[i]+ cb[i]= a[i]* c[i]end do allelsedo i=2,a[ilb[i]n= a[i]= a[i-1]+ c* b[i-1]end doend ifend if(b) after unswitchingFigure IO.Loop unswitching.6.1.4Loop UnswitchingLoop unswitchingis appliedwhen a loopcontainsa conditionalwitha loopinvarianttest condition.The loop is thenreplicatedinsideeachbranchof theconditional,saving the overheadof conditionalbranchinginsidethe loop, reducing the code size of the loop body, andpossiblyenablingthe parallelizationof abranchof theconditional[AllenandCocke 1971].●361Conditionalsthatare candidatesforunswitchingcan be detectedduringtheanalysisfor code motion,which identifiesloop-invariantvalues.In Figure10(a) the variablex is loopinvariant,allowingtheloopto beunstitchedand the truebranchto beexecutedin parallel,as shown in Figure10(b).