Effective Partial Redundancy Elimination
For an example, consider the (simplified) code below, where the two fragments show the same code before and after forward propagation:

before forward propagation:

    n ← j + k
    i ← 0

    i ← i + 1
    if i = m branch

    i ← i + n

    i ← i + 1
    if i < 100 branch

after forward propagation:

    i ← 0

    i ← i + 1
    if i = m branch

    n ← j + k
    i ← i + n

    i ← i + 1
    if i < 100 branch

In this case, the computation of n ← j + k has been pushed into the loop, potentially shortening some paths through the program. However, since we expect the loop to execute many times, the code in the second fragment is potentially much slower (of course, the actual tradeoff is undecidable, as it depends on the values of j, k, and m). Recalling from Section 2 that PRE will never lengthen a path through the code, we realize that PRE will not be able to hoist the evaluation of j + k out of the loop without lengthening the path around the use of n.

4.3 Code Expansion

The speed and space requirements of our approach are primarily dependent on the amount of code expansion introduced by forward propagation. In the worst case, this expansion can be exponential in the size of the routine. To see how bad the expansion is likely to be in practice, we measured the effect of forward propagation on the routines in our test suite. Table 2 shows the results of these tests. The entries in the before and after columns represent static counts of the number of ILOC operations in each routine. The column labeled expansion indicates the code growth factor due to forward propagation.

5 Discussion

In implementing these techniques, we encountered several issues that merit further attention.

5.1 Forward Propagation and Correctness

If an expression name is live across a basic block boundary, PRE will sometimes hoist an expression past a use of its name. Consider the example below, where the conditional branch skips the middle block:

before PRE:

    r10 ← sqrt(r9)
    if p branch

    r9 ← r1000

    r20 ← r10
    r10 ← sqrt(r9)
    r30 ← r10

after PRE:

    r10 ← sqrt(r9)
    if p branch

    r9 ← r1000
    r10 ← sqrt(r9)

    r20 ← r10
    r30 ← r10

The problem is that the first fragment violates a requirement for correct behavior of PRE; namely, an expression defined in one basic block may not be referenced in another basic block.² Forward propagation satisfies this rule by moving the computation of r10 ← sqrt(r9) directly before its use, relying on the renaming introduced by SSA to preserve the correct version of r9.
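To make the hazard concrete, here is a minimal Python rendering of the sqrt example above. The straight-line functions stand in for the three basic blocks, and the input values are chosen only for illustration; this is a sketch, not our implementation.

```python
import math

# Straight-line renderings of the three-block sqrt example.  The
# conditional branch in the figure skips the middle block, so "p true"
# means the middle block (r9 <- r1000) does not execute.

def before_pre(r9, p, r1000):
    r10 = math.sqrt(r9)          # block 1: r10 <- sqrt(r9); if p branch
    if not p:
        r9 = r1000               # block 2: r9 <- r1000
    r20 = r10                    # block 3: uses the block-1 value of r10
    r10 = math.sqrt(r9)          #          then recomputes sqrt(r9)
    r30 = r10
    return r20, r30

def after_pre(r9, p, r1000):
    # PRE inserts sqrt(r9) into block 2 and deletes the recomputation in
    # block 3 -- moving the definition of r10 above the use r20 <- r10.
    r10 = math.sqrt(r9)
    if not p:
        r9 = r1000
        r10 = math.sqrt(r9)      # hoisted computation clobbers r10
    r20 = r10                    # wrong value on the fall-through path
    r30 = r10
    return r20, r30

# Branch-taken path: the two fragments agree.
assert before_pre(4.0, True, 9.0) == after_pre(4.0, True, 9.0)
# Fall-through path: r20 changes from 2.0 to 3.0 -- the hoist broke it.
print(before_pre(4.0, False, 9.0))   # (2.0, 3.0)
print(after_pre(4.0, False, 9.0))    # (3.0, 3.0)
```

The assertion shows that the transformation is only observably wrong on the path through the middle block, which is exactly why the requirement is easy to violate without noticing.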
We note that Chow also mentions using forward propagation [8]; we conjecture that it helped him avoid the same difficulty with PRE.

An alternative approach to ensuring that no expression name is live across a basic block boundary is to insert copies to newly created variable names and rewrite later references so that they refer to the variable name rather than the expression name. While it is possible that this approach could be used to avoid some of the negative effects of forward propagation, it may detract from the effectiveness of reassociation. This remains a topic for future research.

² We have never seen this requirement stated in the literature and believe it to be a source of confusion in the community.

    routine     before      after  expansion
    bilan        2,000      2,357      1.179
    cardeb         916      1,024      1.118
    coeray         280        397      1.418
    colbur         659      1,155      1.753
    dcoera         422      1,050      2.488
    ddeflu       4,040      7,089      1.755
    debflu       3,767      4,822      1.280
    debico       2,728      2,984      1.094
    decomp         941      1,144      1.216
    deseco      11,545     13,537      1.173
    drepvi       1,750      2,262      1.293
    drigl          565        667      1.181
    efill        1,257      1,996      1.588
    fehl           552        581      1.053
    fmin           372        661      1.777
    fmtgen         588        696      1.184
    fmtset         551        600      1.089
    fpppp       20,147     27,358      1.358
    gamgen         842      1,070      1.271
    heat           925      1,817      1.964
    hmoy           153        168      1.098
    ihbtr          772        790      1.023
    inideb       1,064      1,200      1.128
    iniset       6,566      6,747      1.028
    inithx       2,378      2,539      1.068
    integr         812        967      1.191
    orgpar       1,352      1,641      1.214
    paroi        4,300      4,921      1.144
    pastem       2,567      2,794      1.088
    prophy       2,695      3,473      1.289
    repvid       1,584      1,922      1.213
    rkf45          164        228      1.390
    rkfs           881      1,180      1.339
    saturr       1,524      2,131      1.398
    saxpy           95        102      1.074
    seval          167        190      1.138
    sgemm          677        976      1.442
    sgemv          293        340      1.160
    si             178        202      1.135
    solve          319        375      1.176
    spline       1,173      1,220      1.040
    subb         1,199      1,199      1.000
    supp         1,589      2,075      1.306
    svd          2,563      3,984      1.554
    tomcatv      2,645      3,610      1.365
    twldrv      13,405     15,870      1.184
    urand          189        212      1.122
    x21y21          70         75      1.071
    yeh            929      1,628      1.752
    zeroin         276        400      1.449
    totals     107,475    136,377      1.269

Table 2: Code Expansion from Forward Propagation

5.2 Interaction with Other Optimizations

Some optimizations interact poorly with our technique. For example, many compilers replace an integer multiply with one constant argument by a series of shifts, adds, and subtracts [4].
Since shifts are not associative, this optimization should not be performed until after global reassociation. For example, if ((x × y) × 2) × z is prematurely converted into ((x × y) ≪ 1) × z, we lose the opportunity to group z with either x or y. This effect is measurable; indeed, we have accidentally measured it more than once.

We expect that strength reduction will improve the code beyond the results shown in this paper. Reassociation should let strength reduction introduce fewer distinct induction variables, particularly in code with complex subscripts like that produced by cache and register blocking [5, 27]. Of course, some particularly sophisticated approaches to strength reduction include a form of reassociation [20]; we believe that a separate pass of reassociation will significantly simplify the implementation of strength reduction.
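To see concretely why the premature shift blocks regrouping, consider a toy reassociator sketch in Python. The tuple encoding and the `flatten_mul` helper are hypothetical illustrations, not part of our implementation:

```python
# Toy expression trees: ('mul', a, b) and ('shl', a, k); leaves are names
# or constants.  A reassociator can freely regroup the operand list of a
# maximal multiply tree, but a shift node is opaque to it.

def flatten_mul(e):
    """Collect the operands of a maximal multiply tree."""
    if isinstance(e, tuple) and e[0] == 'mul':
        return flatten_mul(e[1]) + flatten_mul(e[2])
    return [e]                  # non-mul nodes (names, constants, shifts)

# ((x * y) * 2) * z -- before any strength reduction of the multiply
before = ('mul', ('mul', ('mul', 'x', 'y'), 2), 'z')
# ((x * y) << 1) * z -- after premature conversion of the *2 to a shift
after  = ('mul', ('shl', ('mul', 'x', 'y'), 1), 'z')

print(flatten_mul(before))  # ['x', 'y', 2, 'z'] -> z may pair with x or y
print(flatten_mul(after))   # the shift node hides x and y from regrouping
```

Once the shift is introduced, x and y are trapped beneath a non-associative operator, so no legal regrouping can bring z next to either of them.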
Additionally, implementing global reassociation as a separate pass provides benefits to other optimizations, even in loop-free code.

5.3 Common Subexpression Elimination

The experiments described in Section 4 show that PRE is a powerful component of an optimizing compiler. A natural question is: “How does it compare to other approaches?” To answer this, we will consider three different approaches. Assume for each that we have used the techniques described in Sections 3.1 and 3.2 to encode value equivalence into the name space.

1. Alpern, Wegman, and Zadeck suggest the following scheme: If a value x is computed at two points, p and q, and p dominates q, then the computation at q is redundant and may be deleted [2, page 2].

2. The classic approach to global common subexpression elimination is to calculate the set of expressions available at each point in a routine.
If x is availableon every path reaching p, then any computation ofx at p is redundant and may be deleted.3. Partial redundancy elimination, as described inSection 2.These methods form a hierarchy. The first method removes only a subset of the redundancies in the code.For instance, it cannot remove the redundancy shownin the first example of Section 2 where x + y occurs ineach clause of an if-then-else and again in the block thatfollows. The second method, based on available expressions, will handle this case; it removes all redundancies.PRE is stronger yet – it removes all redundancies andmany partial redundancies as well.6Related WorkWhile there have been many papers discussing partial redundancy elimination (e.g., [21, 14, 12, 18]),none mention the deficiencies discussed in Sections 2.2and 2.3.
Rosen et al. recognize the naming problem and propose a complex alternative to PRE; however, they do not consider reordering complex expressions [23].

The idea of exploiting associativity and distributivity to rearrange expressions is well known [17, 1]; however, early work concentrated on simplifying individual expressions. We know of two prior approaches to reassociation with the goal of exposing loop-invariant expressions, both discovered within IBM and published the same year.
Scarborough and Kolsky describe a front-end discipline for generating an array address expression as a sum of products and associating the sum to expose the loop-invariant parts [25]. Cocke and Markstein also mention the idea of reassociation, this time within the optimizer instead of the front end [9].

In a chapter for an upcoming book, Markstein et al. describe a sophisticated algorithm for strength reduction that includes a form of reassociation [20]. Their algorithm attacks the problem on a loop-by-loop basis, working from inner to outer loops. In each loop, they perform some forward propagation and sort subexpressions into loop-variant and loop-invariant parts, hoisting the invariant parts.
We presume their approach is a development of earlier work within IBM. Other work by O’Brien et al. and by Santhanam briefly describes what are apparently further developments of the Cocke and Markstein approach [22, 24].

It is difficult to compare our approach directly to these earlier methods. We were motivated by a desire to separate concerns. We already had solutions to hoisting loop invariants and strength reduction; therefore, we looked for a way to reassociate expressions. We also prefer our global approach to loop-by-loop alternatives, since it can make improvements in loop-free code and may admit a simpler implementation.

Recent work by Feigen et al.
and by Knoop et al. describes alternative approaches to the problem of eliminating partially-dead expressions [15, 19]. While an adequate comparison of the alternatives would require trial implementations and empirical measurements, it is clear that they solve a similar class of problems in radically different ways. In our case, the elimination of some partially-dead expressions is an unexpected benefit of forward propagation.

7 Summary

In this paper, we show how to use global reassociation and global value numbering to reshape code in a way that improves the effectiveness and applicability of partial redundancy elimination. The effect of these transformations is to expose new opportunities for optimization. In particular, more expressions are shown to be redundant or loop-invariant; partial redundancy elimination optimizes these newly exposed cases. Additionally, some partially-dead expressions are eliminated.

We showed experimental results that demonstrate the effectiveness of partial redundancy elimination.
The data also shows that applying our transformations before partial redundancy elimination can produce significant further improvements.

We introduced an algorithm for global reassociation. It efficiently reorders the operands of associative operations to expose loop-invariant expressions. Its simplicity should make it easy to add to an existing compiler.

Acknowledgements

We owe a debt of gratitude to our colleagues on the compiler project: Tim Harvey, Rob Shillingsburg, Taylor Simpson, Lisa Thomas, and Linda Torczon.
Without their support and implementation efforts, this work would have been impossible. We also appreciate the thorough reviews provided by the program committee; they significantly improved both the form and content of this paper.

References

[1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.

[2] Bowen Alpern, Mark N. Wegman, and F. Kenneth Zadeck. Detecting equality of variables in programs. In Conference Record of the Fifteenth Annual ACM Symposium on Principles of Programming Languages, pages 1–11, San Diego, California, January 1988.

[3] Marc A.