Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 97
Текст из файла (страница 97)
Inthis scheme, the caller passes a value to the callee that specifies whichregisters it must save. The callee adds the registers it must save to thevalue and calls the appropriate compiler-provided save routine. Theepilogue passes the same value to the restore routine so that it canreload the needed registers. This approach limits the overhead toone call to save registers and one to restore them. It separatesresponsibility (caller saves versus callee saves) from the cost tocall the routine.The compiler writer must pay close attention to the implications of the various options on code size and runtime speed.
The code should use the fastestoperations for saves and restores. This requires a close look at the costs ofsingle-register and multiregister operations on the target architecture. Usinglibrary routines to perform saves and restores can save space; careful implementation of those library routines may mitigate the added cost of invokingthem.396 CHAPTER 7 Code ShapeSECTION REVIEWThe code generated for procedure calls is split between the callerand the callee, and between the four pieces of the linkage sequence(prologue, epilogue, precall, and postreturn). The compiler coordinatesthe code in these multiple locations to implement the linkage convention, as discussed in Chapter 6.
Language rules and parameter bindingconventions dictate the order of evaluation and the style of evaluationfor actual parameters. System-wide conventions determine responsibilityfor saving and restoring registers.Compiler writers pay particular attention to the implementation ofprocedure calls because the opportunities are difficult for generaloptimization techniques (see Chapters 8 and 10) to discover. Themany-to-one nature of the caller-callee relationship complicates analysisand transformation, as does the distributed nature of the cooperatingcode sequences. Equally important, minor deviations from the definedlinkage convention can cause incompatibilities in code compiled withdifferent compilers.Review Questions1.
When a procedure saves registers, either callee-saves registers in itsprologue or caller-saves registers in a precall sequence, where shouldit save those registers? Are all of the registers saved for some callstored in the same AR?2. In some situations, the compiler must create a storage location to holdthe value of a call-by-reference parameter. What kinds of parametersmay not have their own storage locations? What actions might berequired in the precall and postcall sequences to handle these actualparameters correctly?7.10SUMMARY AND PERSPECTIVEOne of the more subtle tasks that confronts the compiler writer is selectinga pattern of target-machine operations to implement each source-languageconstruct.
Multiple implementation strategies are possible for almost anysource-language statement. The specific choices made at design time have astrong impact on the code that the compiler generates.In a compiler that is not intended for production use—a debugging compileror a student compiler—the compiler writer might select easy to implement translations for each strategy that produce simple, compact code. InChapter Notes 397an optimizing compiler, the compiler writer should focus on translationsthat expose as much information as possible to the later phases of thecompiler—low-level optimization, instruction scheduling, and register allocation. These two different perspectives lead to different shapes for loops,to different disciplines for naming temporary variables, and, possibly, todifferent evaluation orders for expressions.The classic example of this distinction is the case statement.
In a debugging compiler, the implementation as a cascaded series of if-then-elseconstructs is fine. In an optimizing compiler, the inefficiency of the myriadtests and branches makes a more complex implementation scheme worthwhile. The effort to improve the case statement must be made when their is generated; few, if any, optimizers will convert a cascaded series ofconditionals into a binary search or a direct jump table.nCHAPTER NOTESThe material contained in this chapter falls, roughly, into two categories:generating code for expressions and handling control-flow constructs.Expression evaluation is well explored in the literature. Discussions of howto handle control flow are rarer; much of the material on control flow in thischapter derives from folklore, experience, and careful reading of the outputof compilers.Floyd presented the first multipass algorithm for generating code fromexpression trees [150].
He points out that both redundancy elimination andalgebraic reassociation have the potential to improve the results of his algorithm. Sethi and Ullman [311] proposed a two-pass algorithm that is optimalfor a simple machine model; Proebsting and Fischer extended this work toaccount for small memory latencies [289]. Aho and Johnson [5] introduceddynamic programming to find least-cost implementations.The predominance of array calculations in scientific programs led to workon array-addressing expressions and to optimizations (like strength reduction, Section 10.7.2) that improve them.
The computations described inSection 7.5.3 follow Scarborough and Kolsky [307].Harrison used string manipulation as a motivating example for the pervasiveuse of inline substitution and specialization [182]. The example mentionedat the end of Section 7.6.4 comes from that paper.Mueller and Whalley describe the impact of different loop shapes on performance [271]. Bernstein provides a detailed discussion of the options thatarise in generating code for case statements [40]. Calling conventions arebest described in processor-specific and operating-system-specific manuals.398 CHAPTER 7 Code ShapeOptimization of range checks has a long history. The pl/.8 compiler insistedon checking every reference; optimization lowered the overhead [257].
Morerecently, Gupta and others have extended these ideas to increase the set ofchecks that can be moved to compile time [173].nSection 7.2EXERCISES1. Memory layout affects the addresses assigned to variables. Assumethat character variables have no alignment restriction, short integervariables must be aligned to halfword (2 byte) boundaries, integervariables must be aligned to word (4 byte) boundaries, and longinteger variables must be aligned to doubleword (8 byte) boundaries.Consider the following set of declarations:char a;long int b;int c;short int d;long int e;char f;Draw a memory map for these variables:a. Assuming that the compiler cannot reorder the variablesb.
Assuming the compiler can reorder the variables to save space2. As demonstrated in the previous question, the compiler needs analgorithm to lay out memory locations within a data area. Assume thatthe algorithm receives as input a list of variables, their lengths, andtheir alignment restrictions, such asha, 4, 4i, hb, 1, 3i, hc, 8, 8i, hd, 4, 4i, he, 1, 4i, hf, 8, 16i, hg, 1, 1i.The algorithm should produce, as output, a list of variables and theiroffsets in the data area. The goal of the algorithm is to minimizeunused, or wasted, space.a. Write down an algorithm to lay out a data area with minimalwasted space.b. Apply your algorithm to the example list above and two other liststhat you design to demonstrate the problems that can arise instorage layout.c. What is the complexity of your algorithm?3. For each of the following types of variable, state where in memory thecompiler might allocate the space for such a variable.
PossibleExercises 399answers include registers, activation records, static data areas (withdifferent visibilities), and the runtime heap.a. A variable local to a procedureb. A global variablec. A dynamically allocated global variabled. A formal parametere. A compiler-generated temporary variable4. Use the treewalk code-generation algorithm from Section 7.3 togenerate naive code for the following expression tree. Assume anunlimited set of registers.:=-d**bb 4*ca5.
Find the minimum number of registers required to evaluate thefollowing trees using the iloc instruction set. For each nonleaf node,indicate which of its children must be evaluated first in order toachieve this minimum number of registers.:=+-d*bb 4*a(a)-w*z*cxy(b)6. Build expression trees for the following two arithmetic expressions,using standard precedence and left-to-right evaluation. Compute theminimum number of registers required to evaluate each of them usingthe iloc instruction set.a. ((a + b) + (c + d)) + ((e + f) + (g + h))b.
a + b + c + d + e + f + g + hSection 7.3400 CHAPTER 7 Code Shape7. Generate predicated iloc for the following code sequence. (Nobranches should appear in the solution.)Section 7.4if (x < y)then z = x * 5;else z = y * 5;w = z + 10;8. As mentioned in Section 7.4, short-circuit code for the followingexpression in c avoids a potential division-by-zero error:a != 0 && b / a > 0.5If the source-language definition does not specify short-circuitevaluation for boolean-valued expressions, can the compiler generateshort-circuit code as an optimization for such expressions? Whatproblems might arise?9.
For a character array A[10...12,1...3] stored in row-major order,calculate the address of the reference A[i,j], using at most fourarithmetic operations in the generated code.Section 7.510. What is a dope vector? Give the contents of the dope vector for thecharacter array in the previous question. Why does the compiler needa dope vector?11. When implementing a c compiler, it might be advisable to have thecompiler perform range checking for array references. Assumingrange checks are used and that all array references in a c programhave successfully passed them, is it possible for the program to accessstorage outside the range of an array, for example, accessing A[-1]for an array declared with lower bound zero and upper bound N?Section 7.612. Consider the following character-copying loop from Section 7.6.2:loadIloadIloadIdo {*a++ = *b++;} while (*b!=‘\0’)L1 : cloadcstoreaddIaddIcmp NEcbrL2 : nop@b@aNULL⇒ r@b⇒ r@a⇒ r1r@br2r@b , 1r@a , 1r1 , r2r4⇒⇒⇒⇒⇒→// get pointers// terminatorr2// get next charr@a// store itr@b// bump pointersr@ar4L1 , L2// next stmtExercises 401Modify the code so that it branches to an error handler at Lsov on anyattempt to overrun the allocated length of a.