K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 60
Текст из файла (страница 60)
The resulting combination is a hybrid ir.Single-statement blocksa block of code that corresponds to a singlesource-level statementSome authors recommend building cfgs in which each node represents ashorter segment of code than a basic block. The most common alternativeblock is a single-statement block. Using single-statement blocks can simplifyalgorithms for analysis and optimization.The tradeoff between a cfg built with single-statement blocks and one builtwith basic blocks revolves around time and space.
A cfg built on singlestatement blocks has more nodes and edges than a cfg built with basicblocks. The single-statement version uses more memory and takes longerto traverse than the basic-block version of a cfg. More important, as thecompiler annotates the nodes and edges in the cfg, the single-statement cfghas many more sets than the basic-block cfg. The time and space spent inconstructing and using these annotations undoubtedly dwarfs the cost of cfgconstruction.Many parts of the compiler rely on a cfg, either explicitly or implicitly.Analysis to support optimization generally begins with control-flow analysis and cfg construction (Chapter 9). Instruction scheduling needs a cfgto understand how the scheduled code for individual blocks flows together(Chapter 12).
Global register allocation relies on a cfg to understand howoften each operation might execute and where to insert loads and stores forspilled values (Chapter 13).Dependence GraphData-dependence grapha graph that models the flow of values fromdefinitions to uses in a code fragmentCompilers also use graphs to encode the flow of values from the point wherea value is created, a definition, to any point where it is used, a use. A datadependence graph embodies this relationship. Nodes in a data-dependence5.2 Graphical IRs 23312x ← 03456while (i < 100)7print xa2i ← 134651if (a[i] > 0)then x ← x + a[i]i ← i + 17n FIGURE 5.4 Interaction between Control Flow and the Dependence Graph.graph represent operations.
Most operations contain both definitions anduses. An edge in a data-dependence graph connects two nodes, one thatdefines a value and another that uses it. We draw dependence graphs withedges that run from definition to use.To make this concrete, Figure 5.3 reproduces the example from Figure 1.3and shows its data-dependence graph.
The graph has a node for each statement in the block. Each edge shows the flow of a single value. For example,the edge from 3 to 7 reflects the definition of rb in statement 3 and its subsequent use in statement 7. rarp contains the starting address of the localdata area. Uses of rarp refer to its implicit definition at the start of theprocedure; they are shown with dashed lines.The edges in the graph represent real constraints on the sequencing ofoperations—a value cannot be used until it has been defined.
However,the dependence graph does not fully capture the program’s control flow.For example, the graph requires that 1 and 2 precede 6. Nothing, however, requires that 1 or 2 precedes 3. Many execution sequences preservethe dependences shown in the code, including h1, 2, 3, 4, 5, 6, 7, 8, 9, 10i andh2, 1, 6, 3, 7, 4, 8, 5, 9, 10i. The freedom in this partial order is precisely whatan “out-of-order” processor exploits.At a higher level, consider the code fragment shown in Figure 5.4. References to a[i] are shown deriving their values from a node representing priordefinitions of a.
This connects all uses of a together through a single node.Without sophisticated analysis of the subscript expressions, the compilercannot differentiate between references to individual array elements.This dependence graph is more complex than the previous example. Nodes5 and 6 both depend on themselves; they use values that they may havedefined in a previous iteration. Node 6, for example, can take the value ofi from either 2 (in the initial iteration) or from itself (in any subsequentiteration). Nodes 4 and 5 also have two distinct sources for the value of i:nodes 2 and 6.234 CHAPTER 5 Intermediate RepresentationsData-dependence graphs are often used as a derivative ir—constructed fromthe definitive ir for a specific task, used, and then discarded. They play acentral role in instruction scheduling (Chapter 12).
They find application ina variety of optimizations, particularly transformations that reorder loops toexpose parallelism and to improve memory behavior; these typically requiresophisticated analysis of array subscripts to determine more precisely thepatterns of access to arrays. In more sophisticated applications of the datadependence graph, the compiler may perform extensive analysis of arraysubscript values to determine when references to the same array can overlap.Call GraphInterproceduralAny technique that examines interactions acrossmultiple procedures is called interprocedural.IntraproceduralAny technique that limits its attention to a singleprocedure is called intraprocedural.Call grapha graph that represents the calling relationshipsamong the procedures in a programThe call graph has a node for each procedure andan edge for each call site.To address inefficiencies that arise across procedure boundaries, some compilers perform interprocedural analysis and optimization.
To represent theruntime transfers of control between procedures, compilers use a call graph.A call graph has a node for each procedure and an edge for each distinctprocedure call site. Thus, the code calls q from three textually distinct sitesin p; the call graph has three edges ( p, q), one for each call site.Both software-engineering practice and language features complicate theconstruction of a call graph.nnnSeparate compilation, the practice of compiling small subsets of aprogram independently, limits the compiler’s ability to build a callgraph and to perform interprocedural analysis and optimization. Somecompilers build partial call graphs for all of the procedures in acompilation unit and perform analysis and optimization across that set.To analyze and optimize the whole program in such a system, theprogrammer must present it all to the compiler at once.Procedure-valued parameters, both as input parameters and as returnvalues, complicate call-graph construction by introducing ambiguouscall sites.
If fee takes a procedure-valued argument and invokes it, thatsite has the potential to call a different procedure on each invocation offee. The compiler must perform an interprocedural analysis to limit theset of edges that such a call induces in the call graph.Object-oriented programs with inheritance routinely create ambiguousprocedure calls that can only be resolved with additional typeinformation. In some languages, interprocedural analysis of the classhierarchy can provide the information needed to disambiguate thesecalls.
In other languages, that information cannot be known untilruntime. Runtime resolution of ambiguous calls poses a serious problemfor call graph construction; it also creates significant runtime overheadson the execution of the ambiguous calls.Section 9.4 discusses practical techniques for call graph construction.5.3 Linear IRs 235SECTION REVIEWGraphical IRs present an abstract view of the code being compiled. Theydiffer in the meaning imputed to each node and each edge.nnnnnIn a parse tree, nodes represent syntactic elements in the sourcelanguage grammar, while the edges tie those elements together intoa derivation.In an abstract syntax tree or a dag, nodes represent concrete itemsfrom the source-language program, and edges tie those together in away that indicates control-flow relationships and the flow of data.In a control-flow graph, nodes represent blocks of code and edgesrepresent transfers of control between blocks.
The definition of ablock may vary, from a single statement through a basic block.In a dependence graph, the nodes represent computations and theedges represent the flow of values from definitions to uses; as such,edges also imply a partial order on the computations.In a call graph, the nodes represent individual procedures and theedges represent individual call sites. Each call site has a distinct edgeto provide a representation for call-site specific knowledge, such asparameter bindings.Graphical IRs encode relationships that may be difficult to represent ina linear IR. A graphical IR can provide the compiler with an efficient wayto move between logically connected points in the program, such as thedefinition of a variable and its use, or the source of a conditional branchand its target.Review Questions1.