Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 94
Текст из файла (страница 94)
Given the source textif exprthen statement1else statement2statement3the compiler must generate code that evaluates expr and branches tostatement1 or statement2 , based on the value of expr. The iloc code thatimplements the two statements must end with a jump to statement3 . Aswe saw in Section 7.4, the compiler has many options for implementingif-then-else constructs.The discussion in Section 7.4 focused on evaluating the controlling expression. It showed how the underlying instruction set influenced the strategies for handling both the controlling expression and, in some cases, thecontrolled statements.382 CHAPTER 7 Code ShapeProgrammers can place arbitrarily large code fragments inside the then andelse parts.
The size of these code fragments has an impact on the com-piler’s strategy for implementing the if-then-else construct. With trivialthen and else parts, as shown in Figure 7.9, the primary consideration forthe compiler is matching the expression evaluation to the underlying hardware. As the then and else parts grow, the importance of efficient executioninside the then and else parts begins to outweigh the cost of executing thecontrolling expression.For example, on a machine that supports predicated execution, using predicates for large blocks in the then and else parts can waste execution cycles.Since the processor must issue each predicated instruction to one of its functional units, each operation with a false predicate has an opportunity cost—itties up an issue slot.
With large blocks of code under both the then andelse parts, the cost of unexecuted instructions may outweigh the overheadof using a conditional branch.Figure 7.14 illustrates this tradeoff. It assumes that both the then and elseparts contain 10 independent iloc operations and that the target machine canissue two operations per cycle.Figure 7.14a shows code that might be generated using predication; itassumes that the value of the controlling expression is in r1 . The code issuestwo instructions per cycle. One of them executes in each cycle. All of thethen part’s operations are issued to Unit 1, while the then part’s operations are issued to Unit 2. The code avoids all branching.
If each operationUnit 1(r1)(r1)(r1)(r1)(r1)(r1)(r1)(r1)(r1)(r1)Unit 2comparison ⇒ r1op1(¬r1) op11op2(¬r1) op12op3(¬r1) op13op4(¬r1) op14op5(¬r1) op15op6(¬r1) op16op7(¬r1) op17op8(¬r1) op18op9(¬r1) op19op10 (¬r1) op20Unit 1Unit 2compare & branchL1 : op1op3op5op7op9jumpIop2op4op6op8op10→ L3L2 : op11op13op15op17op19jumpIop12op14op16op18op20→ L3L3 : nop(a) Using Predicatesn FIGURE 7.14 Predication versus Branching.(b) Using Branches7.8 Control-Flow Constructs 383BRANCH PREDICTION BY USERSOne urban compiler legend concerns branch prediction.
FORTRAN hasan arithmetic if statement that takes one of three branches, based onwhether the controlling expression evaluates to a negative number, tozero, or to a positive number. One early compiler allowed the user to supply a weight for each label that reflected the relative probability of takingthat branch. The compiler then used the weights to order the branches ina way that minimized total expected delay from branching.After the compiler had been in the field for a year, the story goes, a maintainer discovered that the branch weights were being used in the reverseorder, maximizing the expected delay. No one had complained.
The storyis usually told as a fable about the value of programmers’ opinions aboutthe behavior of code they have written. (Of course, no one reported theimprovement, if any, from using the branch weights in the correct order.)takes a single cycle, it takes 10 cycles to execute the controlled statements,independent of which branch is taken.Figure 7.14b shows code that might be generated using branches; it assumesthat control flows to L1 for the then part or to L2 for the else part.
Becausethe instructions are independent, the code issues two instructions per cycle.Following the then path takes five cycles to execute the operations for thetaken path, plus the cost of the terminal jump. The cost for the else part isidentical.The predicated version avoids the initial branch required in the unpredicatedcode (to either L1 or L2 in the figure), as well as the terminal jumps (toL3 ). The branching version incurs the overhead of a branch and a jump, butmay execute faster.
Each path contains a conditional branch, five cycles ofoperations, and the terminal jump. (Some of the operations may be used tofill delay slots on jumps.) The difference lies in the effective issue rate—the branching version issues roughly half the instructions of the predicatedversion.
As the code fragments in the then and else parts grow larger, thisdifference becomes larger.Choosing between branching and predication to implement an if-thenelse requires some care. Several issues should be considered, as follows:1. Expected frequency of execution If one side of the conditional executessignificantly more often, techniques that speed execution of that pathmay produce faster code. This bias may take the form of predicting abranch, of executing some instructions speculatively, orof reordering the logic.384 CHAPTER 7 Code Shape2. Uneven amounts of code If one path through the construct containsmany more instructions than the other, this may weigh againstpredication or for a combination of predication and branching.3. Control flow inside the construct If either path contains nontrivialcontrol flow, such as an if-then-else, loop, case statement,or call, then predication may be a poor choice. In particular, nested ifconstructs create complex predicates and lower the fraction of issuedoperations that are useful.To make the best decision, the compiler must consider all these factors, aswell as the surrounding context.
These factors may be difficult to assess earlyin compilation; for example, optimization may change them in significantways.7.8.2 Loops and IterationMost programming languages include loop constructs to perform iteration.The first fortran compiler introduced the do loop to perform iteration.Today, loops are found in many forms. For the most part, they have a similarstructure.Consider the c for loop as an example. Figure 7.15 shows how the compiler might lay out the code. The for loop has three controlling expressions:e1 , which provides for initialization; e2 , which evaluates to a boolean andgoverns execution of the loop; and e3 , which executes at the end of each iteration and, potentially, updates the values used in e2 .
We will use this figureas the basic schema to explain the implementation of several kinds of loops.For (e1 ; e2 ; e3 ) {loop body}StepPurpose11Evaluate e122If (¬e2 )Then goto 533Loop Body44Evaluate e3If (e2 )Then goto 35Code After Loop5(a) Example Code for Loop(b) Schema for Implementing Loopn FIGURE 7.15 General Schema for Layout of a for Loop.7.8 Control-Flow Constructs 385If the loop body consists of a single basic block—that is, it contains noother control flow—then the loop that results from this schema has an initialbranch plus one branch per iteration. The compiler might hide the latencyof this branch in one of two ways.
If the architecture allows the compiler topredict whether or not the branch is taken, the compiler should predict thebranch in step 4 as being taken (to start the next iteration). If the architectureallows the compiler to move instructions into the delay slot(s) of the branch,the compiler should attempt to fill the delay slot(s) with instruction(s) fromthe loop body.For LoopsTo map a for loop into code, the compiler follows the general schema fromFigure 7.15. To make this concrete, consider the following example. Steps 1and 2 produce a single basic block, as shown in the following code:for (i=1; i<=100; i++) {loop body}next statementloadI 1loadI 100cmp GT ri , r1cbrr2L1 : loop bodyaddIri , 1cmp LE ri , r1cbrr3L2 : next statement⇒⇒⇒→rir1r2L2 , L1// Step 1// Step 2// Step 3⇒ ri// Step 4⇒ r3→ L1 , L2// Step 5The code produced in steps 1, 2, and 4 is straightforward. If the loop body(step 3) either consists of a single basic block or it ends with a single basicblock, then the compiler can optimize the update and test produced in step 4with the loop body.
This may lead to improvements in the code—for example, the instruction scheduler might use operations from the end of step 3 tofill delay slots in the branch from step 4.The compiler can also shape the loop so that it has only one copy of the test—the one in step 2. In this form, step 4 evaluates e3 and then jumps to step 2.The compiler would replace cmp LE, cbr sequence at the end of the loopwith a jumpI.
This form of the loop is one operation smaller than the twotest form. However, it creates a two-block loop for even the simplest loops,and it lengthens the path through the loop by at least one operation. Whencode size is a serious consideration, consistent use of this more compact loopform might be worthwhile.