1994. Compiler Transformations for High-Perforamce Computing (798436), страница 11
Текст из файла (страница 11)
Loop-basedare describedin6.6.1 Constant PropagationConstantpropagation[Callahanet al.1986; Kildall1973; Wegmanand Zadeck1991] is one of the most importantoptimizationthata compilercan perform,and a good optimizingcompilerwill apply it aggressively,Typicallyprogramscontainmany constants;by propagatingthem throughthe program,the compilercan do a significantamountof precomputation.More important,the propagationreveals many opportunitiesfor other optimization.In additionto obvious possibilitiessuch as dead-codeelimination,loop optimizationsare affectedbecauseconstantsoften appear in their inductionranges.Knowingthe range of the loop,the compilercan be much more accurateACMComputmgSumeys,Vol.
26, No. 4, December1994380eDauidF. Baconet al.t = i*4n=64C=3S=tdoi=l,na[i]endprint= a[i]doa[r](a) originalend+ 3printdoa[t](b)Figure 35.afterconstantConstantpropagationpropagation.Constantfoldingis a companionto constantpropagation:whenan expressioncontainsan operationwith constantvalues as operands,the compilercan replacethe expressionwith the result. For example, x = 3 * * 2 becomesx = 9. Typicallyconstantsare propagatedand folded simultaneously[Aho et al. 1986].Note that constantfoldingmay not belegal (underDefinition2.1.2 in Section2.1) if the operationsperformedat comthatpile time are not id~nticdto ‘chowwould have been performedat run time;a common source of such problemsis theroundingof floating-pointnumbersduring inputor outputbetweenphases ofthe compilationprocess[Clinger1990;Steele and White1990].6.6.3 Copy PropagationOptimizationeliminationsuch as inductionvariable(6. 1.2) and common- subex-ACMSurveys,1994codea [t]+ caftercopypropagationCopy propagationpression elimination(6.7.4) may cause thesame valueto be copied severaltimes.The compilercan propagatethe originalname of the value and eliminateredundant copies [Aho et al.
1986].Copypropagationreducesregisterpressureand eliminatesredundantregister-to-registermove instructions.An example is shown in Figure36.6.6.426, No 4, December*,= a[t]Figure 36.6.6.2 Constant FoldingVol+ c1*4(b)in applyingthe loop optimizationsthat,more than anythingelse, determineperformanceon high-speedmachines.Figare35 shows a simpleexampleofconstantpropagation.On V-DLX,the resultingloop can be convertedinto a single vector operationbecause the loop isthe same lengthas the hardwarevectorregisters.The originalloop would have tobe strip minedbefore vectorization(seeSection 6.2.4), increasingthe overheadofthe loop.Computmg= a[r](a) originalt .= a[i]a[s]codedoi=l,64a[i]*,r.t+ cForward SubstitutionForwardsubstitutionis a generalizationof copy propagation.The use of a variableis replacedby its definingexpression,which must be live at that point.
Substitutioncan change the dependencerelation betweenvariables[Wolfe1989b] orimprovethe analysisof subscriptexpressions in loops [Allenand Kennedy1987;Kuck et al. 1981].For instance,in Figure37(a) the loopcannotbe parallelizedbecausean unknownelementof a is being written.After forwardsubstitution,as showninFi~re37(b), the subscriptexpressionisin terms of the loop-boundvariable,andit is straightforwardto determinethatthe loop can be implementedas a parallel reduction(describedin Section 6.4.1).The use of variableslike npl is a common Fortranidiomthat was developedwhen compilersdid not performaggressive optimization.The idiomis recommendedas “good programmingstyle” ina numberof Fortranprogrammingtexts!CompilernplTransformationsXxo= n+ldoi=l,na[npl]endo/x= a[npl]Zxl(a) originaldocodeFigurena[n+llend=o=o=z381+ a[i]dodoalli=l,●= a[n+ll+38.Z+o=zx/1=xAlgebraicidentitiesusedin expressionsimplification.a[i]all(b)afterFigure 37.forwardForwardsubstitutionIdentltles Used In Strength Reduction(&IS the string concatenation operator)Table 2.substitution.ExpressionForwardsubstitutionis generallyperformed on array subscriptexpressionsatthesametimeas loopnormalization(Section6.3.6).Forefficientsubscriptanalysistechniquesto work,the arraysubscriptsmust be linear functionsof theinductionvariables.I ReducedExpr.II Datatypes~(a, O) + (b, O) 1 (a+len(s~ & s~)b, O)\ len(.s~)+len(sz)complexstring6.6.5 ReassociationReassociationis a techniquefor increasing the numberof commonsubexpressions in a program[Cocke and Markstein1980; Marksteinet al.
1994]. It is generallyappliedtoaddresscalculationswithinloops when performingstrengthreductionon inductionvariableexpressions (see Section 6.1.1). Address calculationsgeneratedbyarrayreferencesconsist of several multiplicationsand additions.Reassociationappliesthe assocommutative,anddistributiveciative,laws to rewritethese expressionsin acanonicalsum-of-productsform.Forwardsubstitutionis usuallyperformedwherepossiblein the addresscalculationsto increasethe numberofpotentialcommon subexpressions.6.6.6AlgebraicSimplificationThe compilercan simplifyarithmeticexpressionsby applyingalgebraicrules tothem.A particularlyusefulexampleisthe set of algebraicidentities.For instance,thestatementx = (y* 1 + O) / 1can be transformedinto x = y if x and yare integers.Figure38 illustratessomeof the commonlyappliedrules.Floating-pointarithmeticcan be problematicto simplify;for example,if x is anIEEEfloating-pointnumberwithvalueNan (not a number),then x x O = x, instead of O.6.6.7 Strength ReductionTheidentitiesin Table2 are calledstrengthreductionsbecause they replacean expensiveoperatorwith an equivalentless expensiveoperator.Section 6.1.1 discusses the applicationof strengthreduction to operationsthat appearinsidealoop.The two entriesin the table that referto multiplication,x x 2 = x + x and i x2’ = i << c, can be generalized.Multiplicationby any integerconstantcan beperformedusing only shift and add instructions[Bernstein1986].
Other transformationsare conditionalon the valuesof the affectedvariables;for example,i/2c = i >> c if and only if i is nonnegative[Steele 1977].It is also possible to convert exponentiation to multiplicationin the evaluationACMComputmgSurveys,Vol. 26, No, 4, December1994382●100DauidF. Baconwri.te(6,100)c[i]read(7,100)(d(j),formatjet al.= 1,Formatcompilationis furthercomplicated in C by the fact that printfandscanfare libraryfunctionsand may beredefinedby the programmer.100)(Al)(a) originalcode6.6.9 Superoptimizingcallputchar(c[i],callfgets(d,(b)ofpolynomialsa~xn100,afterFigure39.6)formatFormat,using+ a~.lxn–17)compilationcompilationthe identity+..
.=(a~x”-l+a~_lx’-2+alx+aO+ . ..+al)X+ao.6.6.8 110 Format CompilationMost languagesprovidefairlyelaboratefacilitiesfor formattinginput and output.The formattingspecificationsare in effect a formattingsublanguagethatisgenerally“interpreted”at run time, witha correspondinglyhigh cost for characterinput and output.Formattedwritescan be convertedalmost directlyinto calls to the run-timeroutinesthat implementthe variousformat styles. These calls are then likelycandidatesfor inlinesubstitution.Figure39 shows two 1/0statementsand theircompiledequivalents.Idiomrecognitionhas been performedto convertthe implieddoloopintoanaggregateoperation.Note that in Fortran,a formatstatement is analogousto a proceduredefinition,whichmaybe invokedby anystatements,‘Thenumberof read or writesame trade-offas with procedureinliningapplies:the formatted1/0can be expandedinlinefor higherefficiency,orencapsulatedas a procedurefor codecompactness(inliningis describedin Section 6.8.5).Formatcompilationis done by the VAXFortrancompiler[HarrisandHobbs1994] and by the Gnu C compiler[FreeSoftwareFoundation1992].ACMComputmgSurveys.Vol.
26. No 4, December1994A superoptimizer[Massalin1987] represents the extremeof optimization,seeking to replace a sequence of instructionswith the optimalalternative,It does anexhaustivesearch, beginningwith a single instruction.If all singleinstructionsequences fail, two-instructionsequencesare searched,and so on.A randomlygeneratedinstructionsequence is checked by executingit with asmallnumberof test inputsthat wererun throughthe originalsequence.If itpasses these tests, a thoroughverification procedureis applied.Superoptimizationat compiletime ispracticalonly for short sequences (on theorder of a dozen instructions).It can alsobe used by the compilerdesignerto findoptimalsequencesfor commonlyoccurring idioms,It is particularlyuseful foreliminatingconditionalbranchesin shortinstructionsequences, where the pipelinestall penaltymay be larger than the costof executingthe operationsthemselves[Granlundand Kenner1992].6.7 RedundancyEliminationThereare many optimizationsthat improve performanceby identifyingredundantcomputationsand removingthem[Moreland Renvoise1979].
We have already covered one such transformation,loop-invariantcode motion,in Section6.1.3. Therea computationwas beingperformedrepeatedlywhen it could bedone once.Redundancy-eliminatingtransformations remove two other kinds of computations:those thatare unreachableandthose that are useless. A computationisunreachableif it is never executed;removingit from the programwill have nosemanticeffect on the instructionsexecuted.Unreachablecode is createdbyprogrammers(most frequentlywith conditionaldebuggingcode), or by transfor-Compilermationsthathaveleft“orphan”codebehind.A computationis useless if none of theoutputsof the programare dependentonit.Transformations383●integerc, n, debugdebug = On=Oa .
b+j’if (debug j 1) thenC=a+b+d6.7.1.Unreachable-CodeEliminationMost compilersperformunreachable-codeelimination[Allenand Cocke 1971; Ahoet al. 1986]. In structuredprograms,thereare two primaryways for code to becomeunreachable.If a conditionalpredicateisknownto be true or false, one branchofthe conditionalis never taken,and itscode can be eliminated.The other common source of unreachablecode is a loo~that does not performany iterations.‘In an unstructuredprogramthat relieson gotostatementsto transfercontrol,unreachablecode is not obvious from theprogramstructurebut can be found bytraversingthe controlflow graph of theprogram.Both unreachableand useless code areoftencreatedby constantpropagation,described in Section 6.6.1. In Figure 40(a),debug is a constant.Whenthe variableits value is propagated,the conditionalexpressionbecomesif (O > 1).
This expressionis alwaysfalse, so the body ofthe conditionalis never executed and canbe eliminated,as shown in Figure40(b).Similarly,the body of the do loop is neverexecutedand is thereforeremoved.Unreachable-codeeliminationcan inturn allow anotheriterationof constantpropagationto discovermore constants;for this reasonsome compilersperformconstantpropagationmore than once.Unreachablecode is also knownasdead code, but that name is also appliedto useless code. so we have chosen to usethe more speci~c term.Sometimesconsideredas a separatestep, redundant-controleliminationremoves controlconstructssuch as loom.and conditionalswhen they become redundant(usuallyas a resultof constantpropagation).In Figure40(b), the loopand conditionalcontrolexm-essionsare.not used, and we can remove them fromthe program,as shown in (c).print*,end ifcall foo(a)doi=l,‘Warning--totalis‘, cna[i]end= a[i]+ cdo(a)integerc,debugn.on,originalcodedebug= Oa = b+7if(O > 1)endthenifcallfoo(a)doi=l,Oenddo(b)afterconstantpropagationcodeintegerdebugc,n,andunreachableeliminationdebug= On=Oa .b+7callfoo(a)(c) afterintegerc,redundantn,controleliminationdebuga = b+7callfoo(a)(d)afteruselesscodeeliminationa .