Cooper, Engineering a Compiler (Second Edition), page 58
The ir for a procedure might include the code that defines it, the results of static analysis, profile data from previous executions, and maps to let the debugger understand the code and its data. All of these facts should be expressed in a way that makes clear their relationship to specific points in the ir.

5.2 GRAPHICAL IRS

Many compilers use irs that represent the underlying code as a graph. While all the graphical irs consist of nodes and edges, they differ in their level of abstraction, in the relationship between the graph and the underlying code, and in the structure of the graph.

5.2.1 Syntax-Related Trees

The parse trees shown in Chapter 3 are graphs that represent the source-code form of the program. Parse trees are one specific form of treelike irs. In most treelike irs, the structure of the tree corresponds to the syntax of the source code.

Parse Trees

As we saw in Section 3.2.2, the parse tree is a graphical representation for the derivation, or parse, that corresponds to the input program. Figure 5.1 shows the classic expression grammar alongside a parse tree for a × 2 + a × 2 × b.
The parse tree is large relative to the source text because it represents the complete derivation, with a node for each grammar symbol in the derivation. Since the compiler must allocate memory for each node and each edge, and it must traverse all those nodes and edges during compilation, it is worth considering ways to shrink this parse tree.

    Goal   → Expr
    Expr   → Expr + Term
           |  Expr - Term
           |  Term
    Term   → Term × Factor
           |  Term ÷ Factor
           |  Factor
    Factor → ( Expr )
           |  num
           |  name

(a) Classic Expression Grammar    (b) Parse Tree for a × 2 + a × 2 × b (diagram not reproduced; its leaf nodes are <name,a>, <num,2>, <name,a>, <num,2>, and <name,b>)

FIGURE 5.1 Parse Tree for a × 2 + a × 2 × b Using the Classic Expression Grammar.

Minor transformations on the grammar, as described in Section 3.6.1, can eliminate some of the steps in the derivation and their corresponding syntax-tree nodes.
A more effective technique is to abstract away those nodes that serve no real purpose in the rest of the compiler. This approach leads to a simplified version of the parse tree, called an abstract syntax tree. Parse trees are used primarily in discussions of parsing, and in attribute-grammar systems, where they are the primary ir.
In most other applications in which a source-level tree is needed, compiler writers tend to use one of the more concise alternatives described in the remainder of this subsection.

Abstract Syntax Trees

Abstract syntax tree: an AST is a contraction of the parse tree that omits most nodes for nonterminal symbols.

The abstract syntax tree (ast) retains the essential structure of the parse tree but eliminates the extraneous nodes. The precedence and meaning of the expression remain, but the extraneous nodes have disappeared. Here is the ast for a × 2 + a × 2 × b, written in prefix form:

    (+ (× a 2) (× (× a 2) b))

The ast is a near-source-level representation.
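To make the contraction concrete, here is a small sketch of such an ast in Python; the class names and the node-counting helper are illustrative, not drawn from any particular compiler.

```python
# A minimal ast sketch (names are illustrative, not from any real system).
from dataclasses import dataclass

@dataclass
class Num:                 # a constant leaf, e.g. <num,2>
    value: int

@dataclass
class Name:                # a variable leaf, e.g. <name,a>
    ident: str

@dataclass
class BinOp:               # an interior operator node
    op: str
    left: object
    right: object

# The ast for a × 2 + a × 2 × b: one node per operator and operand,
# and no nodes for nonterminals such as Expr, Term, or Factor.
ast = BinOp('+',
            BinOp('×', Name('a'), Num(2)),
            BinOp('×',
                  BinOp('×', Name('a'), Num(2)),
                  Name('b')))

def count_nodes(n):
    """Count ast nodes; the parse tree for the same input has many more."""
    if isinstance(n, BinOp):
        return 1 + count_nodes(n.left) + count_nodes(n.right)
    return 1

print(count_nodes(ast))    # 9 nodes
```

The nine nodes here stand in contrast to the full parse tree of Figure 5.1, which also carries a node for every Goal, Expr, Term, and Factor in the derivation.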
Because of its rough correspondence to a parse tree, the parser can build an ast directly (see Section 4.4.2). asts have been used in many practical compiler systems. Source-to-source systems, including syntax-directed editors and automatic parallelization tools, often use an ast from which source code can easily be regenerated. The S-expressions found in Lisp and Scheme implementations are, essentially, asts. Even when the ast is used as a near-source-level representation, representation choices affect usability.
For example, the ast in the Rn Programming Environment used the subtree shown in the margin to represent a complex constant in fortran, written (c1,c2). This choice worked well for the syntax-directed editor, in which the programmer was able to change c1 and c2 independently; the pair node corresponded to the parentheses and the comma.

[Margin: "AST Designed for Editing" shows a pair node with children c1 and c2; "AST for Compiling" shows a single constant node holding (c1,c2).]

This pair format, however, proved problematic for the compiler. Each part of the compiler that dealt with constants needed special-case code for complex constants. All other constants were represented with a single node that contained a pointer to the constant's actual text. Using a similar format for complex constants would have complicated some operations, such as editing the complex constants and loading them into registers. It would have simplified others, such as comparing two constants. Taken over the entire system, the simplifications would likely have outweighed the complications.

STORAGE EFFICIENCY AND GRAPHICAL REPRESENTATIONS

Many practical systems have used abstract syntax trees to represent the source text being translated. A common problem encountered in these systems is the size of the AST relative to the input text. Large data structures can limit the size of programs that the tools can handle.

The AST nodes in the Rn Programming Environment were large enough that they posed a problem for the limited memory systems of 1980s workstations. The cost of disk I/O for the trees slowed down all the Rn tools.

No single problem leads to this explosion in AST size. Rn had only one kind of node, so that structure included all the fields needed by any node. This simplified allocation but increased the node size. (Roughly half the nodes were leaves, which need no pointers to children.) In other systems, the nodes grow through the addition of myriad minor fields used by one pass or another in the compiler. Sometimes, the node size increases over time, as new features and passes are added.

Careful attention to the form and content of the AST can shrink its size. In Rn, we built programs to analyze the contents of the AST and how the AST was used. We combined some fields and eliminated others. (In some cases, it was less expensive to recompute information than to write it and read it.) In a few cases, we used hash linking to record unusual facts: one bit in the field that stores each node's type indicates the presence of additional information stored in a hash table. (This scheme reduced the space devoted to fields that were rarely used.) To record the AST on disk, we converted it to a linear representation with a preorder treewalk; this eliminated the need to record any internal pointers.

In Rn, these changes reduced the size of ASTs in memory by roughly 75 percent. On disk, after the pointers were removed, the files were about half the size of their memory representation. These changes let Rn handle larger programs and made the tools more responsive.

Abstract syntax trees have found widespread use. Many compilers and interpreters use them; the level of abstraction that those systems need varies widely. If the compiler generates source code as its output, the ast typically has source-level abstractions.
If the compiler generates assembly code, the final version of the ast is usually at or below the abstraction level of the machine's instruction set.

Directed Acyclic Graphs

While the ast is more concise than a syntax tree, it faithfully retains the structure of the original source code. For example, the ast for a × 2 + a × 2 × b contains two distinct copies of the expression a × 2. A directed acyclic graph (dag) is a contraction of the ast that avoids this duplication.
In a dag, nodes can have multiple parents, and identical subtrees are reused. Such sharing makes the dag more compact than the corresponding ast. For expressions without assignment, textually identical expressions must produce identical values. The dag for a × 2 + a × 2 × b reflects this fact by sharing a single copy of a × 2: that subtree has two parents, the + node and the × node that multiplies it by b. The dag encodes an explicit hint for evaluating the expression. If the value of a cannot change between the two uses of a, then the compiler should generate code to evaluate a × 2 once and use the result twice. This strategy can reduce the cost of evaluation. However, the compiler must prove that a's value cannot change. If the expression contains neither assignments nor calls to other procedures, the proof is easy.
Since an assignment or a procedure call can change the value associated with a name, the dag construction algorithm must invalidate subtrees as the values of their operands change. dags are used in real systems for two reasons. If memory constraints limit the size of programs that the compiler can handle, using a dag can help by reducing the memory footprint. Other systems use dags to expose redundancies. Here, the benefit lies in better compiled code.
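One common way to obtain this sharing is to hash each ⟨operator, operands⟩ key as nodes are created, returning the existing node whenever an identical subtree has already been built. The sketch below makes simplifying assumptions: expressions only, small-integer node ids, and a deliberately naive invalidation pass that discards every subtree mentioning an assigned name.

```python
# Sketch of dag construction by hashing (op, operand-ids) keys:
# identical subtrees map to the same node id.
class DagBuilder:
    def __init__(self):
        self.nodes = []     # node id -> (op, child ids...)
        self.table = {}     # (op, child ids...) -> node id

    def node(self, op, *children):
        key = (op,) + children
        if key not in self.table:        # first occurrence: allocate it
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]           # otherwise reuse the shared node

    def invalidate(self, name):
        # An assignment to `name` may change its value, so remove every
        # table entry whose subtree mentions `name`; later references
        # then build fresh nodes.  Children are created before their
        # parents, so one forward pass finds all stale subtrees.
        stale = set()
        for i, (op, *kids) in enumerate(self.nodes):
            if op == name or any(k in stale for k in kids):
                stale.add(i)
        self.table = {k: v for k, v in self.table.items() if v not in stale}

dag = DagBuilder()
t1 = dag.node('×', dag.node('a'), dag.node('2'))
t2 = dag.node('×', dag.node('a'), dag.node('2'))
print(t1 == t2)                  # True: the second a × 2 reuses the first
root = dag.node('+', t1, dag.node('×', t2, dag.node('b')))

old_a = dag.node('a')
dag.invalidate('a')              # e.g. after an assignment a ← ...
print(dag.node('a') == old_a)    # False: subtrees mentioning a are dead
```

Real dag builders refine this scheme, but the essential mechanism is the same: the hash table makes textually identical subtrees coincide, and assignments force selective invalidation.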
Systems that use the dag to expose redundancies tend to use it as a derivative ir: they build the dag, transform the definitive ir to reflect the redundancies, and then discard the dag.

Level of Abstraction

All of our example trees so far have shown near-source irs. Compilers also use low-level trees. Tree-based techniques for optimization and code generation, in fact, may require such detail. As an example, consider the statement w ← a - 2 × b.
A source-level ast creates a concise form, as shown in Figure 5.2a. However, the source-level tree lacks much of the detail needed to translate the statement into assembly code. A low-level tree, shown in Figure 5.2b, can make that detail explicit. This tree introduces four new node types. A val node represents a value already in a register. A num node represents a known constant.
A lab node represents an assembly-level label, typically a relocatable symbol. Finally, ◆ is an operator that dereferences a value; it treats the value as a memory address and returns the contents of the memory at that address.

Directed acyclic graph: a DAG is an AST with sharing. Identical subtrees are instantiated once, with multiple parents.

(a) Source-Level AST: ← at the root, with w on the left and a - 2 × b on the right. (b) Low-Level AST: each name becomes an explicit address computation; w is val rarp + num 4, b is ◆ applied to val rarp + num -16, and a is ◆ applied to lab @G + num 12 (diagrams not reproduced).

FIGURE 5.2 Abstract Syntax Trees with Different Levels of Abstraction.

Data area: the compiler groups together storage for values that have the same lifetime and visibility. We call these blocks of storage data areas.

The low-level tree reveals the address calculations for the three variables. w is stored at offset 4 from the pointer in rarp, which holds the pointer to the data area for the current procedure.
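As a sketch, the low-level tree can be written with tagged tuples and walked directly. The offsets follow Figure 5.2 (w at rarp + 4, b at rarp - 16, a at @G + 12); the concrete register and label addresses below are invented purely for the example.

```python
# Low-level ast for w ← a - 2 × b as tagged tuples (a sketch; the
# offsets follow Figure 5.2, all concrete addresses are invented).
def addr(base, offset):
    return ('+', base, ('num', offset))

w_addr = addr(('val', 'rarp'), 4)     # w at offset 4 from rarp
b_addr = addr(('val', 'rarp'), -16)   # b in the local data area
a_addr = addr(('lab', '@G'), 12)      # a in the static data area @G

tree = ('←', w_addr,                  # the store uses w's bare address;
        ('-', ('◆', a_addr),          # uses of a and b dereference theirs
              ('×', ('num', 2), ('◆', b_addr))))

def evaluate(n, regs, labels, mem):
    """Walk the low-level tree; ◆ reads memory, ← writes it."""
    op = n[0]
    if op == 'num': return n[1]
    if op == 'val': return regs[n[1]]       # value already in a register
    if op == 'lab': return labels[n[1]]     # a relocatable label's address
    if op == '◆':  return mem[evaluate(n[1], regs, labels, mem)]
    a = evaluate(n[1], regs, labels, mem)
    b = evaluate(n[2], regs, labels, mem)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '×': return a * b
    if op == '←': mem[a] = b                # left side is an address

regs = {'rarp': 1000}                       # invented base addresses
labels = {'@G': 2000}
mem = {2012: 7, 984: 3}                     # a = 7, b = 3
evaluate(tree, regs, labels, mem)
print(mem[1004])                            # w at rarp + 4: 7 - 2 × 3 = 1
```

Note how the only distinction between the store target and the loads is the ◆ operator: w's subtree yields an address, while a's and b's subtrees yield the values stored at their addresses.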