Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 59
Текст из файла (страница 59)
The double dereference of a shows thatit is a call-by-reference formal parameter accessed through a pointer stored16 bytes before rarp . Finally, b is stored at offset 12 after the label @G.The level of abstraction matters because the compiler can, in general, onlyoptimize details that are exposed in the ir. Properties that are implicitin the ir are hard to change, in part because the compiler would needto translate implicit facts in different, instance-specific ways.
For example,to customize the code generated for an array reference, the compiler mustrewrite the related ir expressions. In a real program, different array references are optimized in different ways, each according to the surroundingcontext. For the compiler to tailor those references, it must be able to writedown the improvements in the ir.As a final point, notice that the representations for the variable referencesin the low-level tree reflect the different interpretations that occur on theright and left side of the assignment. On the left-hand side, w evaluates to anaddress, while both a and b evaluate to values because of the u operator.5.2.2 GraphsWhile trees provide a natural representation for the grammatical structure ofthe source code discovered by parsing, their rigid structure makes them lessuseful for representing other properties of programs.
To model these aspectsof program behavior, compilers often use more general graphs as irs. Thedag introduced in the previous section is one example of a graph.5.2 Graphical IRs 231Control-Flow GraphThe simplest unit of control flow in a program is a basic block—a maximallength sequence of straightline, or branch-free, code. A basic block is asequence of operations that always execute together, unless an operationraises an exception. Control always enters a basic block at its first operationand exits at its last operation.Basic blocka maximal-length sequence of branch-free codeA control-flow graph (cfg) models the flow of control between the basicblocks in a program.
A cfg is a directed graph, G = (N , E). Each noden ∈ N corresponds to a basic block. Each edge e = (ni , n j ) ∈ E correspondsto a possible transfer of control from block ni to block n j .Control-flow graphA CFG has a node for every basic block and an edgefor each possible control transfer between blocks.To simplify the discussion of program analysis in Chapters 8 and 9, weassume that each cfg has a unique entry node, n0 , and a unique exit node,n f .
In the cfg for a procedure, n0 corresponds to the procedure’s entry point.If a procedure has multiple entries, the compiler can insert a unique n0 andadd edges from n0 to each actual entry point. Similarly, n f corresponds tothe procedure’s exit. Multiple exits are more common than multiple entries,but the compiler can easily add a unique n f and connect each of the actualexits to it.The cfg provides a graphical representation of the possible runtime controlflow paths.
The cfg differs from the syntax-oriented irs, such as an ast, inwhich the edges show grammatical structure. Consider the following cfg fora while loop:while(i < 100)beginwhile i < 100stmt1stmt2stmt1endstmt2The edge from stmt1 back to the loop header creates a cycle; the ast for thisfragment would be acyclic.
For an if-then-else construct, the cfg is acyclic:if (x = y)then stmt1else stmt2stmt3if (x = y)stmt2stmt1stmt3It shows that control always flows from stmt1 and stmt2 to stmt3 . In an ast,that connection is implicit, rather than explicit.Compilers typically use a cfg in conjunction with another ir. The cfg represents the relationships among blocks, while the operations inside a blockIt begins with a labelled operation and ends witha branch, jump, or predicated operation.We use the acronym CFG for both context-freegrammar (see page 86) and control-flow graph.The meaning should be clear from context.232 CHAPTER 5 Intermediate Representations12345678910loadAIrarp , @a ⇒ raloadI2loadAIrarp , @b ⇒ rbrarp , @c ⇒ rcloadAIloadAI⇒ r2multrarp , @d ⇒ rdra , r2⇒ ramultra , rbmultra , rcmultra , rdstoreAI ra⇒⇒⇒⇒1rarp26rararararp , @a34758910n FIGURE 5.3 An ILOC Basic Block and Its Dependence Graph.are represented with another ir, such as an expression-level ast, a dag, orone of the linear irs.
The resulting combination is a hybrid ir.Single-statement blocksa block of code that corresponds to a singlesource-level statementSome authors recommend building cfgs in which each node represents ashorter segment of code than a basic block. The most common alternativeblock is a single-statement block. Using single-statement blocks can simplifyalgorithms for analysis and optimization.The tradeoff between a cfg built with single-statement blocks and one builtwith basic blocks revolves around time and space. A cfg built on singlestatement blocks has more nodes and edges than a cfg built with basicblocks.
The single-statement version uses more memory and takes longerto traverse than the basic-block version of a cfg. More important, as thecompiler annotates the nodes and edges in the cfg, the single-statement cfghas many more sets than the basic-block cfg. The time and space spent inconstructing and using these annotations undoubtedly dwarfs the cost of cfgconstruction.Many parts of the compiler rely on a cfg, either explicitly or implicitly.Analysis to support optimization generally begins with control-flow analysis and cfg construction (Chapter 9).
Instruction scheduling needs a cfgto understand how the scheduled code for individual blocks flows together(Chapter 12). Global register allocation relies on a cfg to understand howoften each operation might execute and where to insert loads and stores forspilled values (Chapter 13).Dependence GraphData-dependence grapha graph that models the flow of values fromdefinitions to uses in a code fragmentCompilers also use graphs to encode the flow of values from the point wherea value is created, a definition, to any point where it is used, a use. A datadependence graph embodies this relationship.
Nodes in a data-dependence5.2 Graphical IRs 23312x ← 03456while (i < 100)7print xa2i ← 134651if (a[i] > 0)then x ← x + a[i]i ← i + 17n FIGURE 5.4 Interaction between Control Flow and the Dependence Graph.graph represent operations. Most operations contain both definitions anduses. An edge in a data-dependence graph connects two nodes, one thatdefines a value and another that uses it.
We draw dependence graphs withedges that run from definition to use.To make this concrete, Figure 5.3 reproduces the example from Figure 1.3and shows its data-dependence graph. The graph has a node for each statement in the block. Each edge shows the flow of a single value. For example,the edge from 3 to 7 reflects the definition of rb in statement 3 and its subsequent use in statement 7. rarp contains the starting address of the localdata area. Uses of rarp refer to its implicit definition at the start of theprocedure; they are shown with dashed lines.The edges in the graph represent real constraints on the sequencing ofoperations—a value cannot be used until it has been defined.
However,the dependence graph does not fully capture the program’s control flow.For example, the graph requires that 1 and 2 precede 6. Nothing, however, requires that 1 or 2 precedes 3. Many execution sequences preservethe dependences shown in the code, including h1, 2, 3, 4, 5, 6, 7, 8, 9, 10i andh2, 1, 6, 3, 7, 4, 8, 5, 9, 10i. The freedom in this partial order is precisely whatan “out-of-order” processor exploits.At a higher level, consider the code fragment shown in Figure 5.4. References to a[i] are shown deriving their values from a node representing priordefinitions of a.
This connects all uses of a together through a single node.Without sophisticated analysis of the subscript expressions, the compilercannot differentiate between references to individual array elements.This dependence graph is more complex than the previous example. Nodes5 and 6 both depend on themselves; they use values that they may havedefined in a previous iteration.