Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 60
Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 60 страницы из PDF
Node 6, for example, can take the value ofi from either 2 (in the initial iteration) or from itself (in any subsequentiteration). Nodes 4 and 5 also have two distinct sources for the value of i:nodes 2 and 6.234 CHAPTER 5 Intermediate RepresentationsData-dependence graphs are often used as a derivative ir—constructed fromthe definitive ir for a specific task, used, and then discarded. They play acentral role in instruction scheduling (Chapter 12). They find application ina variety of optimizations, particularly transformations that reorder loops toexpose parallelism and to improve memory behavior; these typically requiresophisticated analysis of array subscripts to determine more precisely thepatterns of access to arrays. In more sophisticated applications of the datadependence graph, the compiler may perform extensive analysis of arraysubscript values to determine when references to the same array can overlap.Call GraphInterproceduralAny technique that examines interactions acrossmultiple procedures is called interprocedural.IntraproceduralAny technique that limits its attention to a singleprocedure is called intraprocedural.Call grapha graph that represents the calling relationshipsamong the procedures in a programThe call graph has a node for each procedure andan edge for each call site.To address inefficiencies that arise across procedure boundaries, some compilers perform interprocedural analysis and optimization.
To represent theruntime transfers of control between procedures, compilers use a call graph.A call graph has a node for each procedure and an edge for each distinctprocedure call site. Thus, the code calls q from three textually distinct sitesin p; the call graph has three edges ( p, q), one for each call site.Both software-engineering practice and language features complicate theconstruction of a call graph.nnnSeparate compilation, the practice of compiling small subsets of aprogram independently, limits the compiler’s ability to build a callgraph and to perform interprocedural analysis and optimization. Somecompilers build partial call graphs for all of the procedures in acompilation unit and perform analysis and optimization across that set.To analyze and optimize the whole program in such a system, theprogrammer must present it all to the compiler at once.Procedure-valued parameters, both as input parameters and as returnvalues, complicate call-graph construction by introducing ambiguouscall sites.
If fee takes a procedure-valued argument and invokes it, thatsite has the potential to call a different procedure on each invocation offee. The compiler must perform an interprocedural analysis to limit theset of edges that such a call induces in the call graph.Object-oriented programs with inheritance routinely create ambiguousprocedure calls that can only be resolved with additional typeinformation.
In some languages, interprocedural analysis of the classhierarchy can provide the information needed to disambiguate thesecalls. In other languages, that information cannot be known untilruntime. Runtime resolution of ambiguous calls poses a serious problemfor call graph construction; it also creates significant runtime overheadson the execution of the ambiguous calls.Section 9.4 discusses practical techniques for call graph construction.5.3 Linear IRs 235SECTION REVIEWGraphical IRs present an abstract view of the code being compiled. Theydiffer in the meaning imputed to each node and each edge.nnnnnIn a parse tree, nodes represent syntactic elements in the sourcelanguage grammar, while the edges tie those elements together intoa derivation.In an abstract syntax tree or a dag, nodes represent concrete itemsfrom the source-language program, and edges tie those together in away that indicates control-flow relationships and the flow of data.In a control-flow graph, nodes represent blocks of code and edgesrepresent transfers of control between blocks.
The definition of ablock may vary, from a single statement through a basic block.In a dependence graph, the nodes represent computations and theedges represent the flow of values from definitions to uses; as such,edges also imply a partial order on the computations.In a call graph, the nodes represent individual procedures and theedges represent individual call sites. Each call site has a distinct edgeto provide a representation for call-site specific knowledge, such asparameter bindings.Graphical IRs encode relationships that may be difficult to represent ina linear IR.
A graphical IR can provide the compiler with an efficient wayto move between logically connected points in the program, such as thedefinition of a variable and its use, or the source of a conditional branchand its target.Review Questions1. Compare and contrast the difficulty of writing a prettyprinter for aparse tree, an AST and a DAG. What additional information would beneeded to reproduce the original code’s format precisely?2.
How does the number of edges in a dependence graph grow as afunction of the input program’s size?5.3 LINEAR IRSThe alternative to a graphical ir is a linear ir. An assembly-language program is a form of linear code. It consists of a sequence of instructions thatexecute in their order of appearance (or in an order consistent with thatorder). Instructions may contain more than one operation; if so, those operations execute in parallel. The linear irs used in compilers resemble theassembly code for an abstract machine.Prettyprintera program that walks a syntax tree and writesout the original code236 CHAPTER 5 Intermediate RepresentationsThe logic behind using a linear form is simple.
The source code that servesas input to the compiler is a linear form, as is the target-machine code thatit emits. Several early compilers used linear irs; this was a natural notation for their authors, since they had previously programmed in assemblycode.Linear irs impose a clear and useful ordering on the sequence of operations.For example, in Figure 5.3, contrast the iloc code with the data-dependencegraph.
The iloc code has an implicit order; the dependence graph imposes apartial ordering that allows many different execution orders.If a linear ir is used as the definitive representation in a compiler, it mustinclude a mechanism to encode transfers of control among points in theprogram. Control flow in a linear ir usually models the implementation ofcontrol flow on the target machine. Thus, linear codes usually include conditional branches and jumps.
Control flow demarcates the basic blocks in alinear ir; blocks end at branches, at jumps, or just before labelled operations.Taken branchIn most ISAs, conditional branches use one label.Control flows either to the label, called the takenbranch, or to the operation that follows the label,called the not-taken or fall-through branch.In the iloc used throughout this book, we include a branch or jump at theend of every block.
In iloc, the branch operations specify a label for boththe taken path and the not-taken path. This eliminates any fall-through pathsat the end of a block. Together, these stipulations make it easier to find basicblocks and to reorder them.Many kinds of linear irs have been used in compilers.nDestructive operationan operation in which one of the operands isalways redefined with the resultnnOne-address codes model the behavior of accumulator machines andstack machines. These codes expose the machine’s use of implicitnames so that the compiler can tailor the code for it.
The resulting codeis quite compact.Two-address codes model a machine that has destructive operations.These codes fell into disuse as memory constraints became lessimportant; a three-address code can model destructive operationsexplicitly.Three-address codes model a machine where most operations take twooperands and produce a result.
The rise of risc architectures in the1980s and 1990s made these codes popular, since they resemble asimple risc machine.The remainder of this section describes two linear irs that remain popular:stack-machine code and three-address code. Stack-machine code offers acompact, storage-efficient representation.
In applications where ir size matters, such as a Java applet transmitted over a network before execution,stack-machine code makes sense. Three-address code models the instructionformat of a modern risc machine; it has distinct names for two operands and5.3 Linear IRs 237a result. You are already familiar with one three-address code: the iloc usedin this book.5.3.1 Stack-Machine CodeStack-machine code, a form of one-address code, assumes the presence ofa stack of operands. Most operations take their operands from the stackand push their results back onto the stack.
For example, an integer subtract operation would remove the top two elements from the stack and pushtheir difference onto the stack. The stack discipline creates a need for somenew operations. Stack irs usually include a swap operation that interchangesthe top two elements of the stack.
Several stack-based computers have beenbuilt; this ir seems to have appeared in response to the demands of compiling for these machines. Stack-machine code for the expression a - 2 × bappears in the margin.push 2push bmultiplypush asubtractStack-Machine CodeStack-machine code is compact.
The stack creates an implicit name spaceand eliminates many names from the ir. This shrinks the size of a programin ir form. Using the stack, however, means that all results and argumentsare transitory, unless the code explicitly moves them to memory.Stack-machine code is simple to generate and to execute. Smalltalk 80 andJava both use bytecodes, a compact ir similar in concept to stack-machinecode. The bytecodes either run in an interpreter or are translated into targetmachine code just prior to execution.
This creates a system with a compactform of the program for distribution and a reasonably simple scheme forporting the language to a new target machine (implementing the interpreter).Bytecodean IR designed specifically for its compact form;typically code for an abstract stack machineThe name derives from its limited size; opcodesare limited to one byte or less.5.3.2 Three-Address CodeIn three-address code most operations have the form i ← j op k, with anoperator (op), two operands (j and k) and one result (i). Some operators, such as an immediate load and a jump, will need fewer arguments.Sometimes, an operation with more than three addresses is needed.