K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 88
Текст из файла (страница 88)
Operator precedence, from the expressiongrammar, ensures the correct evaluation order. Some operators require complex multioperation sequences for their implementation (e.g. exponentiationand trigonometric functions). These may be expanded inline or implementedwith a call to a library routine supplied by the compiler or the operatingsystem.7.3.5 Mixed-Type ExpressionsOne complication allowed by many programming languages is an operationwith operands of different types. (Here, we are concerned primarily withbase types in the source language, rather than programmer-defined types.)As described in Section 4.2, the compiler must recognize this situation andinsert the conversion code required by each operator’s conversion table.
Typically, this involves converting one or both operands to a more general typeand performing the operation in that more general type. The operation thatconsumes the result value may need to convert it to yet another type.Some processors provide explicit conversion operators; others expect thecompiler to generate complex, machine-dependent code. In either case, thecompiler writer may want to provide conversion operators in the ir. Such anoperator encapsulates all the details of the conversion, including any controlflow, and lets the compiler subject it to uniform optimization.
Thus, code7.3 Arithmetic Operators 349motion can pull an invariant conversion out of a loop without concern forthe loop’s internal control flow.Typically, the programming-language definition specifies a formula for eachconversion. For example, to convert integer to complex in fortran 77,the compiler first converts the integer to a real. It uses the resulting number as the real part of the complex number and sets the imaginary part to areal zero.For user-defined types, the compiler will not have conversion tables thatdefine each specific case. However, the source language still definesthe meaning of the expression. The compiler’s task is to implement thatmeaning; if a conversion is illegal, then it should be prevented. As seenin Chapter 4, many illegal conversions can be detected and prevented atcompile time.
When a compile-time check is either impossible or inconclusive, the compiler should generate a runtime check that tests for illegalcases. When the code attempts an illegal conversion, the check should raisea runtime error.7.3.6 Assignment as an OperatorMost Algol-like languages implement assignment with the following simplerules:1. Evaluate the right-hand side of the assignment to a value.2. Evaluate the left-hand side of the assignment to a location.3. Store the right-hand side value into the left-hand side location.Thus, in a statement such as a ← b, the two expressions a and b are evaluated differently. Since b appears to the right of the assignment operator, itis evaluated to produce a value; if b is an integer variable, that value is aninteger.
Since a is to the left of the assignment operator, it is evaluated toproduce a location; if a is an integer variable, that value is the location ofan integer. That location might be an address in memory, or it might be aregister. To distinguish between these modes of evaluation, we sometimesrefer to the result of evaluation on the right-hand side of an assignment as anrvalue and the result of evaluation on the left-hand side of an assignment asan lvalue.In an assignment, the type of the lvalue can differ from the type of thervalue. Depending on the language and the specific types, this situation mayrequire either a compiler-inserted conversion or an error message. The typical source-language rule for conversion has the compiler evaluate the rvalueto its natural type and then convert the result to the type of the lvalue.RvalueAn expression evaluated to a value is an rvalue.LvalueAn expression evaluated to a location is an lvalue.350 CHAPTER 7 Code ShapeSECTION REVIEWA postorder treewalk provides a natural way to structure a code generator for expression trees.
The basic framework is easily adapted to handlea variety of complications, including multiple kinds and locations ofvalues, function calls, type conversions, and new operators. To improvethe code further may require multiple passes over the code.Some optimizations are hard to fit into a treewalk framework. Inparticular, making good use of processor address modes (see Chapter11), ordering operations to hide processor-specific delays (seeChapter 12), and register allocation (see Chapter 13) do not fit wellinto the treewalk framework. If the compiler uses a treewalk to generateIR, it may be best to keep the IR simple and allow the back end to addressthese issues with specialized algorithms.Review Questions1.
Sketch the code for the two support routines, base and offset, usedby the treewalk code generator in Figure 7.5.2. How might you adapt the treewalk code generator to handle anunconditional jump operation, such as C’s goto statement?7.4 BOOLEAN AND RELATIONAL OPERATORSMost programming languages operate on a richer set of values than numbers. Usually, this includes the results of boolean and relational operators,both of which produce boolean values.
Because most programming languages have relational operators that produce boolean results, we treat theboolean and relational operators together. A common use for boolean and relational expressions is to alter the program’s control flow. Much of the powerof modern programming languages derives from the ability to compute andtest such values.The grammar uses the symbols ¬ for not, ∧for and, and ∨ for or to avoid confusion withILOC operators.The type checker must ensure that eachexpression applies operators to names, numbers,and expressions of appropriate types.Figure 7.7 shows the standard expression grammar augmented with booleanand relational operators.
The compiler writer must, in turn, decide how torepresent these values and how to compute them. For arithmetic expressions, such design decisions are largely dictated by the target architecture,which provides number formats and instructions to perform basic arithmetic.Fortunately, processor architects appear to have reached a widespread agreement about how to support arithmetic.
Similarly, most architectures providea rich set of boolean operations. However, support for relational operatorsvaries widely from one architecture to another. The compiler writer must usean evaluation strategy that matches the needs of the language to the availableinstruction set.7.4 Boolean and Relational Operators 351Expr→ Expr ∨ AndTerm|AndTermAndTerm → AndTerm ∧ RelExpr| RelExprRelExprNumExpr → NumExpr + Term| NumExpr − Term| TermTerm→ Term × Value||→ RelExpr < NumExpr||||||RelExpr ≤RelExpr =RelExpr 6=RelExpr ≥RelExpr >NumExprNumExprNumExprNumExprNumExprNumExprTerm ÷ ValueFactorValue→ ¬ FactorFactor→ (Expr )|||Factornumnamen FIGURE 7.7 Adding Booleans and Relationals to the Expression Grammar.7.4.1 RepresentationsTraditionally, two representations have been proposed for boolean values:a numerical encoding and a positional encoding. The former assigns specificvalues to true and false and manipulates them using the target machine’sarithmetic and logical operations.
The latter approach encodes the value ofthe expression as a position in the executable code. It uses comparisons andconditional branches to evaluate the expression; the different control-flowpaths represent the result of evaluation. Each approach works well for someexamples, but not for others.Numerical EncodingWhen the program stores the result of a boolean or relational operation intoa variable, the compiler must ensure that the value has a concrete representation. The compiler writer must assign numerical values to true and false thatwork with the hardware operations such as and, or, and not. Typical valuesare zero for false and either one or a word of ones, ¬false, for true.For example, if b, c, and d are all in registers, the compiler might producethe following code for the expression b ∨ c ∧ ¬ d:not rd⇒ r1and rc , r1 ⇒ r2or rb , r2 ⇒ r3For a comparison, such as a < b, the compiler must generate code thatcompares a and b and assigns the appropriate value to the result.
If thetarget machine supports a comparison operation that returns a boolean, thecode is trivial:cmp LT ra , rb ⇒ r1352 CHAPTER 7 Code ShapeILOC contains syntax to implement both styles ofcompare and branch. A normal IR would chooseone; ILOC includes both so that it can express thecode in this section.If, on the other hand, the comparison defines a condition code that mustbe read with a branch, the resulting code is longer and more involved. Thisstyle of comparison leads to a messier implementation for a < b.compcbr LTL1 : loadIjumpIL2 : loadIjumpIL3 : nopra , rb ⇒ cc1cc1→ L1 , L2true ⇒ r1→ L3false ⇒ r1→ L3Implementing a < b with condition-code operations requires more operationsthan using a comparison that returns a boolean.Positional EncodingIn the previous example, the code at L1 creates the value true and the codeat L2 creates the value false. At each of those points, the value is known.
Insome cases, the code need not produce a concrete value for the expression’sresult. Instead, the compiler can encode that value in a location in the code,such as L1 or L2 .Figure 7.8a shows the code that a treewalk code generator might emit forthe expression a < b ∨ c < d ∧ e < f. The code evaluates the three subexpressions, a < b, c < d, and e < f, using a series of comparisons and jumps.
Itthen combines the result of the three subexpression evaluations using theboolean operations at L9 . Unfortunately, this produces a sequence of operations in which every path takes 11 operations, including three branches andthree jumps. Some of the complexity of this code can be eliminated by representing the subexpression values implicitly and generating code that shortcircuits the evaluation, as in Figure 7.8b. This version of the code evaluatesa < b ∨ c < d ∧ e < f with fewer operations because it does not create valuesto represent the subexpressions.Positional encoding makes sense if an expression’s result is never stored.When the code uses the result of an expression to determine control flow,positional encoding often avoids extraneous operations. For example, in thecode fragmentif (a < b)then statement1else statement2the sole use for a < b is to determine whether statement1 or statement2executes.