If the sole use of the result is to determine control flow, as in Figure 7.9a, then the conditional branch that the compiler uses to read the condition code can often implement the source-level control-flow construct as well. If the result is used in a boolean operation, or if it is preserved in a variable, as in Figure 7.9b, the code must convert the result into a concrete representation of a boolean, as do the two loadI operations in Figure 7.9b. Either way, the code has at least one conditional branch per relational operator.
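A minimal C sketch of the two situations, with illustrative function names (Figure 7.9 itself shows the corresponding ILOC):

    /* Case 1: the relational result only steers control flow;
       the branch can test the condition code directly.          */
    int control_flow_use(int a, int b, int c, int d, int e, int f) {
        int x;
        if (a < b)
            x = c + d;
        else
            x = e + f;
        return x;
    }

    /* Case 2: the result is preserved as a value, so the code
       must materialize a concrete boolean, the role played by
       the two loadI operations in Figure 7.9b.                  */
    int value_use(int a, int b, int c, int d) {
        return (a < b) && (c < d);
    }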
The advantage of condition codes comes from another feature that processors usually implement alongside condition codes. Typically, arithmetic operations on these processors set the condition code to reflect their computed results. If the compiler can arrange to have the arithmetic operations that must be performed also set the condition code needed to control the branch, then the comparison operation can be omitted. Thus, advocates of this architectural style argue that it allows a more efficient encoding of the program: the code may execute fewer instructions than it would with a comparator that puts a boolean value in a general-purpose register.

Conditional Move

This scheme adds a conditional move instruction to the straight condition-code model.
In ILOC, a conditional move looks like:

    i2i_LT cci, rj, rk ⇒ rm

If the condition code cci matches LT, then the value of rj is copied to rm. Otherwise, the value of rk is copied to rm. The conditional move operation typically executes in a single cycle. It leads to faster code by allowing the compiler to avoid branches.

Conditional move retains the principal advantage of using condition codes: avoiding a comparison when an earlier operation has already set the condition code. As shown in Figure 7.9a, it lets the compiler encode simple conditional operations without branches.
Here, the compiler speculatively evaluates the two additions. It uses conditional move for the final assignment. This is safe as long as neither addition can raise an exception.

If the compiler has values for true and false in registers, say rT for true and rF for false, then it can use conditional move to convert the condition code into a boolean. Figure 7.9b uses this strategy. It compares a and b and places the boolean result in r1. It computes the boolean for c < d into r2. It computes the final result as the logical and of r1 and r2.
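A C sketch of the Figure 7.9a pattern; with optimization enabled, compilers for machines with conditional move often map the ternary below onto a compare plus conditional move, though the generated code varies by target:

    /* Speculative evaluation plus conditional move: both sums
       are computed, then the predicate selects one. This is
       safe only if neither addition can raise an exception.    */
    int cmov_style(int a, int b, int c, int d, int e, int f) {
        int then_val = c + d;                  /* speculative   */
        int else_val = e + f;                  /* speculative   */
        return (a < b) ? then_val : else_val;  /* i2i_LT-like   */
    }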
Boolean-Valued Comparisons

This scheme avoids condition codes entirely. The comparison operator returns a boolean value in a register. The conditional branch takes that result as an argument that determines its behavior.

Boolean-valued comparisons do not help with the code in Figure 7.9a. The code is equivalent to the straight condition-code scheme.
It requires comparisons, branches, and jumps to evaluate the if-then-else construct.

Figure 7.9b shows the strength of this scheme. The boolean compare lets the code evaluate the relational operator without a branch and without converting comparison results to boolean values. The uniform representation of boolean and relational values leads to concise, efficient code for this example.
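In C terms, the boolean-compare scheme corresponds to combining comparison results as ordinary values; a sketch (the bitwise & stands in for the register-level and, which is safe because each operand is 0 or 1):

    /* Boolean-valued comparisons: each relational operator
       yields 0 or 1 in a register, and no branch is needed to
       combine the results.                                     */
    int bool_compare(int a, int b, int c, int d) {
        int r1 = (a < b);      /* cmp_LT a, b => r1 */
        int r2 = (c < d);      /* cmp_LT c, d => r2 */
        return r1 & r2;        /* and r1, r2  => x  */
    }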
A weakness of this model is that it requires explicit comparisons. Whereas the condition-code models can sometimes avoid the comparison by arranging to set the appropriate condition code with an earlier arithmetic operation, the boolean-valued comparison model always needs an explicit comparison.

Predicated Execution

Architectures that support predicated execution let the compiler avoid some conditional branches. In ILOC, we write a predicated instruction by including a predicate expression before the instruction.
(Predicated execution: an architectural feature in which some operations take a boolean-valued operand that determines whether or not the operation takes effect.)

To remind the reader of the predicate's purpose, we enclose it in parentheses and follow it with a question mark. For example,

    (r17)? add ra, rb ⇒ rc

indicates an add operation (ra + rb) that executes if and only if r17 contains true.

The example in Figure 7.9a shows the strength of predicated execution. The code is simple and concise.
It generates two predicates, r1 and r2. It uses them to control the code in the then and else parts of the source construct. In Figure 7.9b, predication leads to the same code as the boolean-comparison scheme.

The processor can use predication to avoid executing the operation, or it can execute the operation and use the predicate to avoid assigning the result. As long as the idled operation does not raise an exception, the differences between these two approaches are irrelevant to our discussion. Our examples show the operations required to produce both the predicate and its complement. To avoid the extra computation, a processor could provide comparisons that return two values, both the boolean value and its complement.
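True predication requires hardware support, but a C sketch can mimic its effect with masks built from the predicate and its complement; this illustrates the idea rather than the machine's actual mechanism:

    #include <stdint.h>

    /* Both arms execute unconditionally; the predicate decides
       which result is kept. -(a < b) is all ones when the
       predicate holds and all zeros otherwise.                 */
    int32_t predicated_style(int32_t a, int32_t b,
                             int32_t c, int32_t d,
                             int32_t e, int32_t f) {
        int32_t pred     = -(int32_t)(a < b);  /* predicate and   */
        int32_t notpred  = ~pred;              /* its complement  */
        int32_t then_val = c + d;              /* runs regardless */
        int32_t else_val = e + f;              /* runs regardless */
        return (pred & then_val) | (notpred & else_val);
    }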
SECTION REVIEW

The implementation of boolean and relational operators involves more choices than the implementation of arithmetic operators. The compiler writer must choose between a numerical encoding and a positional encoding.
The compiler must map those decisions onto the set of operations provided by the target processor's ISA. In practice, compilers choose between numerical and positional encoding based on context. If the code instantiates the value, numerical encoding is necessary. If the value's only use is to determine control flow, positional encoding often produces better results.

Review Questions

1. If the compiler assigns the value zero to false, what are the relative merits of each of the following values for true? One? Any non-zero number? A word composed entirely of ones?

2. How might the treewalk code generation scheme be adapted to generate positional code for boolean and relational expressions? Can you work short-circuit evaluation into your approach?

7.5 STORING AND ACCESSING ARRAYS

So far, we have assumed that variables stored in memory contain scalar values.
Many programs need arrays or similar structures. The code required to locate and reference an element of an array is surprisingly complex. This section shows several schemes for laying out arrays in memory and describes the code that each scheme produces for an array reference.

7.5.1 Referencing a Vector Element

The simplest form of an array has a single dimension; we call it a vector. Vectors are typically stored in contiguous memory, so that the ith element immediately precedes the (i+1)st element. Thus, a vector V[3...10] generates the following memory layout, where the number below a cell indicates its index in the vector:

    @V
     |
     v
    |   |   |   |   |   |   |   |   |      V[3...10]
      3   4   5   6   7   8   9   10

When the compiler encounters a reference, like V[6], it must use the index into the vector, along with facts available from the declaration of V, to generate an offset for V[6].
The actual address is then computed as the sum of the offset and a pointer to the start of V, which we write as @V.

As an example, assume that V has been declared as V[low...high], where low and high are the vector's lower and upper bounds. To translate the reference V[i], the compiler needs both a pointer to the start of storage for V and the offset of element i within V. The offset is simply (i - low) × w, where w is the length of a single element of V. Thus, if low is 3, i is 6, and w is 4, the offset is (6 - 3) × 4 = 12. Assuming that ri holds the value of i, the following code fragment computes the address of V[i] into r3 and loads its value into rV:

    loadI @V       ⇒ r@V    // get V's address
    subI  ri, 3    ⇒ r1     // (offset - lower bound)
    multI r1, 4    ⇒ r2     // x element length (4)
    add   r@V, r2  ⇒ r3     // address of V[i]
    load  r3       ⇒ rV     // value of V[i]

Notice that the simple reference V[i] introduces three arithmetic operations.
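The same computation expressed as a C sketch with illustrative names (not from the book); the compiler performs this arithmetic internally on addresses:

    /* Address of V[i] for a vector declared V[low..high] with
       element width w: @V + (i - low) * w. With low = 3, i = 6,
       and w = 4, the offset is (6 - 3) * 4 = 12.                */
    char *vector_element_address(char *V, int low, int i, int w) {
        int offset = (i - low) * w;   /* subI, multI      */
        return V + offset;            /* add: @V + offset */
    }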
The compiler can improve this sequence. If w is a power of two, the multiply can be replaced with an arithmetic shift; many base types in real programming languages have this property. Adding the address and offset seems unavoidable; perhaps this explains why most processors include an addressing mode that takes a base address and an offset and accesses the location at base address + offset.
In ILOC, we write this as loadAO.

    loadI   @V       ⇒ r@V   // get V's address
    subI    ri, 3    ⇒ r1    // (offset - lower bound)
    lshiftI r1, 2    ⇒ r2    // x element length (4)
    loadAO  r@V, r2  ⇒ rV    // value of V[i]

Using a lower bound of zero eliminates the subtraction. If the compiler knows the lower bound of V, it can fold the subtraction into @V. Rather than using @V as the base address for V, it can use @V0 = @V - low × w. We call @V0 the false zero of V.

(False zero: the false zero of a vector V is the address where V[0] would be. In multiple dimensions, it is the location of a zero in each dimension.)

    @V0             @V
     |               |
     v               v
    |   |   |   |   |   |   |   |   |   |   |   |      V[3...10]
      0   1   2   3   4   5   6   7   8   9   10

Using @V0 and assuming that i is in ri, the code for accessing V[i] becomes

    loadI   @V0       ⇒ r@V0   // adjusted address for V
    lshiftI ri, 2     ⇒ r1     // x element length (4)
    loadAO  r@V0, r1  ⇒ rV     // value of V[i]

This code is shorter and, presumably, faster.
A good assembly-language programmer might write this code. In a compiler, the longer sequence may produce better results by exposing details such as the multiply and add to optimization. Low-level improvements, such as converting the multiply into a shift and converting the add-load sequence into loadAO, can be done late in compilation.

If the compiler does not know an array's bounds, it might calculate the array's false zero at runtime and reuse that value in each reference to the array. It might compute the false zero on entry to a procedure that references elements of the array multiple times.
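A C sketch of the false-zero idea with illustrative names; the compiler performs this arithmetic on raw addresses, and forming a pointer before the start of an array is not strictly conforming C, so treat this purely as an illustration:

    /* False zero for an int vector with bounds [low..high]:
       compute V0 = @V - low * w once, so that each reference
       V0[i] needs only a shift and an add, with no per-access
       subtraction of the lower bound.                          */
    int vector_sum(int *V, int low, int high) {
        int *V0 = V - low;     /* false zero: where V[0] would be */
        int sum = 0;
        for (int i = low; i <= high; i++)
            sum += V0[i];      /* loadAO-style access             */
        return sum;
    }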
An alternative strategy, employed in languages like C, forces the use of zero as a lower bound, which ensures that @V0 = @V and simplifies all array-address calculations. However, attention to detail in the compiler can achieve the same results without restricting the programmer's choice of a lower bound.

7.5.2 Array Storage Layout

Accessing an element of a multidimensional array requires more work. Before discussing the code sequences that the compiler must generate, we must consider how the compiler will map array indices to memory locations. Most implementations use one of three schemes: row-major order, column-major order, or indirection vectors. The source-language definition usually specifies one of these mappings.

The code required to access an array element depends on the way that the array is mapped to memory.
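As a preview of the first two schemes, a C sketch with illustrative names; these are the standard address polynomials for a two-dimensional array with zero-based indices, and the text develops the general case:

    /* An n1 x n2 array stored in a flat buffer. Row-major order
       places element (i, j) at offset i * n2 + j; column-major
       order places it at offset j * n1 + i.                    */
    int row_major_get(const int *A, int n2, int i, int j) {
        return A[i * n2 + j];
    }

    int col_major_get(const int *A, int n1, int i, int j) {
        return A[j * n1 + i];
    }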