K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 56
Текст из файла (страница 56)
Assume that L(x,y) is a list constructor; it can be implementedas MakeNode2 (cons,x,y). The lower part of the figure shows the result ofapplying the two translation schemes to an input consisting of five elts.The two trees are, in many ways, equivalent. An in-order traversalof both trees visits the leaf nodes in the same order. If we addparentheses to reflect the tree structure, the left-recursive tree is((((elt1 ,elt2 ),elt3 ),elt4 ),elt5 ) while the right-recursive treeis (elt1 ,(elt2 ,(elt3 ,(elt4 ,elt5 )))).
The ordering produced by leftrecursion corresponds to the classic left-to-right ordering for algebraicoperators. The ordering produced by right recursion corresponds to thenotion of a list found in Lisp and Scheme.214 CHAPTER 4 Context-Sensitive AnalysisProductionActionsList → List elt| elt{$$ ← L($1,$2)}{$$ ← $1}elt5elt4elt3elt1ProductionActionsList → elt List| elt{$$ ← L($1,$2)}{$$ ← $1}elt1elt2elt3elt2elt4 elt5Left RecursionRight Recursionn FIGURE 4.17 Recursion versus Associativity.Sometimes, it is convenient to use different directions for recursion and associativity. To build the right-recursive tree from the left-recursive grammar,we could use a constructor that adds successive elements to the end of thelist.
A straightforward implementation of this idea would have to walk thelist on each reduction, making the constructor itself take O(n2 ) time, wheren is the length of the list. To avoid this overhead, the compiler can create alist header node that contains pointers to both the first and last nodes in thelist. This introduces an extra node to the list. If the system constructs manyshort lists, the overhead may be a problem.A solution that we find particularly appealing is to use a list header nodeduring construction and discard it after the list has been built. Rewriting thegrammar to use an -production makes this particularly clean.GrammarList→ |List eltQuux → ListActions{ $$ ← MakeListHeader ( ) }{ $$ ← AddToEnd($1, $2) }{ $$ ← RemoveListHeader($1) }A reduction with the -production creates the temporary list header node;with a shift-reduce parser, this reduction occurs first.
The List → List eltproduction invokes a constructor that relies on the presence of the temporary header node. When List is reduced on the right-hand side of any otherproduction, the corresponding action invokes a function that discards thetemporary header and returns the first element of the list.This approach lets the parser reverse the associativity at the cost of a smallconstant overhead in both space and time. It requires one more reduction perlist, for the -production.
The revised grammar admits an empty list, while4.6 Summary and Perspective 215the original grammar did not. To remedy this problem, RemoveListHeadercan explicitly check for the empty case and report the error.4.6 SUMMARY AND PERSPECTIVEIn Chapters 2 and 3, we saw that much of the work in a compiler’s frontend can be automated. Regular expressions work well for lexical analysis. Context-free grammars work well for syntax analysis.
In this chapter,we examined two ways to perform context-sensitive analysis: attributegrammar formalism and an ad hoc approach. For context-sensitive analysis, unlike scanning and parsing, formalism has not displaced the ad hocapproach.The formal approach, using attribute grammars, offers the hope of writing high-level specifications that produce reasonably efficient executables.While attribute grammars are not the solution to every problem in contextsensitive analysis, they have found application in several domains, rangingfrom theorem provers to program analysis. For problems in which theattribute flow is mostly local, attribute grammars work well.
Problems thatcan be formulated entirely in terms of one kind of attribute, either inheritedor synthesized, often produce clean, intuitive solutions when cast as attributegrammars. When the problem of directing the flow of attributes around thetree with copy rules comes to dominate the grammar, it is probably time tostep outside the functional paradigm of attribute grammars and introduce acentral repository for facts.The ad hoc technique, syntax-directed translation, integrates arbitrary snippets of code into the parser and lets the parser sequence the actions and passvalues between them.
This approach has been widely embraced because ofits flexibility and its inclusion in most parser-generator systems. The ad hocapproach sidesteps the practical problems that arise from nonlocal attributeflow and from the need to manage attribute storage. Values flow in one direction alongside the parser’s internal representation of its state (synthesizedvalues for bottom-up parsers and inherited for top-down parsers). Theseschemes use global data structures to pass information in the other directionand to handle nonlocal attribute flow.In practice, the compiler writer often tries to solve several problems atonce, such as building an intermediate representation, inferring types, andassigning storage locations.
This tends to create significant attribute flowsin both directions, pushing the implementor toward an ad hoc solutionthat uses some central repository for facts, such as a symbol table. Thejustification for solving many problems in one pass is usually compiletime efficiency. However, solving the problems in separate passes can216 CHAPTER 4 Context-Sensitive Analysisoften produce solutions that are easier to understand, to implement, and tomaintain.This chapter introduced the ideas behind type systems as an example of thekind of context-sensitive analysis that a compiler must perform. The studyof type theory and type-system design is a significant scholarly activity witha deep literature of its own. This chapter scratched the surface of type inference and type checking, but a deeper treatment of these issues is beyond thescope of this text.
In practice, the compiler writer needs to study the type system of the source language thoroughly and to engineer the implementationof type inference and type checking carefully. The pointers in this chapterare a start, but a realistic implementation requires more study.nCHAPTER NOTESType systems have been an integral part of programming languages sincethe original fortran compiler. While the first type systems reflected theresources of the underlying machine, deeper levels of abstraction soonappeared in type systems for languages such as Algol 68 and Simula 67.The theory of type systems has been actively studied for decades, producing a string of languages that embodied important principles. These includeRussell [45] (parametric polymorphism), clu [248] (abstract data types),Smalltalk [162] (subtyping through inheritance), and ml [265] (thoroughand complete treatment of types as first-class objects).
Cardelli has writtenan excellent overview of type systems [69]. The apl community produceda series of classic papers that dealt with techniques to eliminate runtimechecks [1, 35, 264, 349].Attribute grammars, like many ideas in computer science, were first proposed by Knuth [229, 230]. The literature on attribute grammars has focusedon evaluators [203, 342], on circularity testing [342], and on applicationsof attribute grammars [157, 298]. Attribute grammars have served as thebasis for several successful systems, including Intel’s Pascal compiler for the80286 [142, 143], the Cornell Program Synthesizer [297] and the SynthesizerGenerator [198, 299].Ad hoc syntax-directed translation has always been a part of the developmentof real parsers.
Irons described the basic ideas behind syntax-directed translation to separate a parser’s actions from the description of its syntax [202].Undoubtedly, the same basic ideas were used in hand-coded precedenceparsers. The style of writing syntax-directed actions that we describe wasintroduced by Johnson in Yacc [205]. The same notation has been carriedforward into more recent systems, including bison from the Gnu project.Exercises 217nEXERCISES1. In Scheme, the + operator is overloaded. Given that Scheme isdynamically typed, describe a method to type check an operation ofthe form (+ a b) where a and b may be of any type that is valid forthe + operator.Section 4.22.
Some languages, such as apl or php, neither require variabledeclarations nor enforce consistency between assignments to the samevariable. (A program can assign the integer 10 to × and later assign thestring value “book” to × in the same scope.) This style ofprogramming is sometimes called type juggling.Suppose that you have an existing implementation of a languagethat has no declarations but requires type-consistent uses. How couldyou modify it to allow type juggling?3.
Based on the following evaluation rules, draw an annotated parse treethat shows how the syntax tree for a - (b + c) is constructed.ProductionE0E0E0TT→→→→→E1 + TE1 − TT(E)idEvaluation Rules{{{{{E 0 .nptr ← mknode(+, E 1 .nptr, T.nptr) }E 0 .nptr ← mknode(-, E 1 .nptr, T.nptr) }E 0 .nptr ← T.nptr }T.nptr ← E.nptr }T.nptr ← mkleaf(id ,id .entry) }4. Use the attribute-grammar paradigm to write an interpreter for theclassic expression grammar.
Assume that each name has a valueattribute and a lexeme attribute. Assume that all attributes are alreadydefined and that all values will always have the same type.5. Write a grammar to describe all binary numbers that are multiples offour. Add attribution rules to the grammar that will annotate the startsymbol of a syntax tree with an attribute value that contains thedecimal value of the binary number.6. Using the grammar defined in the previous exercise, build the syntaxtree for the binary number 11100.a. Show all the attributes in the tree with their corresponding values.b.
Draw the attribute dependence graph for the syntax tree andclassify all attributes as being either synthesized or inherited.Section 4.3218 CHAPTER 4 Context-Sensitive AnalysisSection 4.47. A Pascal program can declare two integer variables a and b with thesyntaxvar a, b: intThis declaration might be described with the following grammar:VarDecl → var IDList : TypeIDIDList→ IDList, ID| IDwhere IDList derives a comma-separated list of variable names andTypeID derives a valid Pascal type. You may find it necessary torewrite the grammar.a.