Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 29
Текст из файла (страница 29)
The solution resolves the ambiguity in a way by imposing arule that is easy for the programmer to remember. (To avoid the ambiguityentirely, some language designers have restructured the if-then-else construct by introducing elseif and endif.) In Section 3.5.3, we will look atother kinds of ambiguity and systematic ways of handling them.3.2.4 Encoding Meaning into StructureThe if-then-else ambiguity points out the relationship between meaning and grammatical structure.
However, ambiguity is not the only situationwhere meaning and grammatical structure interact. Consider the parse treethat would be built from a rightmost derivation of the simple expressiona + b x c.3.2 Expressing Syntax 93Rule26243ExprSentential FormExprExprExprExprExprOpExprOp namex nameExprOp<name,a>+Op name x name+ name x namename + name x nameDerivation of a + b x c<name,b><name,c>×Corresponding Parse TreeOne natural way to evaluate the expression is with a simple postorder treewalk. It would first compute a + b and then multiply that result by c toproduce the result (a + b) x c.
This evaluation order contradicts the classicrules of algebraic precedence, which would evaluate it as a + (b x c). Sincethe ultimate goal of parsing the expression is to produce code that will implement it, the expression grammar should have the property that it builds a treewhose “natural” treewalk evaluation produces the correct result.The real problem lies in the structure of the grammar. It treats all of thearithmetic operators in the same way, without any regard for precedence. Inthe parse tree for (a + b) x c, the fact that the parenthetic subexpression wasforced to go through an extra production in the grammar adds a level to theparse tree.
The extra level, in turn, forces a postorder treewalk to evaluatethe parenthetic subexpression before it evaluates the multiplication.We can use this effect to encode operator precedence levels into the grammar. First, we must decide how many levels of precedence are required. Inthe simple expression grammar, we have three levels of precedence: highestprecedence for ( ), medium precedence for x and ÷, and lowest precedence for + and -.
Next, we group the operators at distinct levels and usea nonterminal to isolate the corresponding part of the grammar. Figure 3.10123GoalExpr456Term7Factor →||89→→||→||n FIGURE 3.1 The Classic Expression Grammar.ExprExpr + TermExpr - TermTermTerm x FactorTerm ÷ FactorFactor( Expr )numname94 CHAPTER 3 Parsersshows the resulting grammar; it includes a unique start symbol, Goal, and aproduction for the terminal symbol num that we will use in later examples.In the classic expression grammar, Expr, represents the level for + and -,Term represents the level for × and ÷, and Factor represents the level for ( ).In this form, the grammar derives a parse tree for a + b x c that is consistentwith standard algebraic precedence, as shown below.Rule14699369ExprSentential FormExprExpr + TermExpr + Term x FactorExpr + Term x nameExpr + Factor x nameExpr + name x nameTerm + name x nameFactor + name x nameExprTerm+TermTermFactorFactor<name,x><name,y>×Factor<name,z>name + name x nameDerivation of a + b x cCorresponding Parse TreeA postorder treewalk over this parse tree will first evaluate b x c and thenadd the result to a.
This implements the standard rules of arithmetic precedence. Notice that the addition of nonterminals to enforce precedence addsinterior nodes to the tree. Similarly, substituting the individual operators foroccurrences of Op removes interior nodes from the tree.Other operations require high precedence.
For example, array subscriptsshould be applied before standard arithmetic operations. This ensures, forexample, that a + b[i] evaluates b[i] to a value before adding it to a,as opposed to treating i as a subscript on some array whose location iscomputed as a + b. Similarly, operations that change the type of a value,known as type casts in languages such as C or Java, have higher precedence than arithmetic but lower precedence than parentheses or subscriptingoperations.If the language allows assignment inside expressions, the assignment operator should have low precedence.
This ensures that the code completelyevaluates both the left-hand side and the right-hand side of the assignment before performing the assignment. If assignment (←) had the sameprecedence as addition, for example, the expression a ← b + c would assignb’s value to a before performing the addition, assuming a left-to-rightevaluation.3.2 Expressing Syntax 95CLASSES OF CONTEXT-FREE GRAMMARS AND THEIR PARSERSWe can partition the universe of context-free grammars into a hierarchybased on the difficulty of parsing the grammars. This hierarchy has manylevels.
This chapter mentions four of them, namely, arbitrary CFGs, LR(1)grammars, LL(1) grammars, and regular grammars (RGs). These sets nest asshown in the diagram.Arbitrary CFGs require more time toparse than the more restricted LR(1) orLL(1) grammars. For example, Earley’salgorithm parses arbitrary CFGs in O(n3 )time, worst case, where n is the numberof words in the input stream. Of course,the actual running time may be better. Historically, compiler writers haveshied away from "universal" techniquesbecause of their perceived inefficiency.RGLR(1)LL(1)Context-FreeGrammarsThe LR(1) grammars include a large subset of the unambiguous CFGs.
LR(1)grammars can be parsed, bottom-up, in a linear scan from left to right, looking at most one word ahead of the current input symbol. The widespreadavailability of tools that derive parsers from LR(1) grammars has made LR(1)parsers "everyone’s favorite parsers."The LL(1) grammars are an important subset of the LR(1) grammars. LL(1)grammars can be parsed, top-down, in a linear scan from left to right,with a one-word lookahead. LL(1) grammars can be parsed with either ahand-coded recursive-descent parser or a generated LL(1) parser. Manyprogramming languages can be written in an LL(1) grammar.Regular grammars (RGs) are CFGs that generate regular languages.
A regular grammar is a CFG where productions are restricted to two forms, eitherA→ a or A→ aB, where A, B ∈ NT and a ∈ T. Regular grammars are equivalent to regular expressions; they encode precisely those languages that canbe recognized by a DFA. The primary use for regular languages in compilerconstruction is to specify scanners.Almost all programming-language constructs can be expressed in LR(1)form and, often, in LL(1) form. Thus, most compilers use a fast-parsingalgorithm based on one of these two restricted classes of CFG.3.2.5 Discovering a Derivation for an Input StringWe have seen how to use a cfg G as a rewriting system to generate sentences that are in L(G). In contrast, a compiler must infer a derivation for a96 CHAPTER 3 Parsersgiven input string, or determine that no such derivation exists. The processof constructing a derivation from a specific input sentence is called parsing.A parser takes, as input, an alleged program written in some source language.The parser sees the program as it emerges from the scanner: a stream ofwords annotated with their syntactic categories.
Thus, the parser would seea + b x c as hname,ai + hname,bi x hname,ci. As output, the parser needs toproduce either a derivation for the input program or an error message for aninvalid program. For an unambiguous language, a parse tree is equivalent toa derivation; thus, we can think of the parser’s output as a parse tree.It is useful to visualize the parser as building a syntax tree for the inputprogram. The parse tree’s root is known; it represents the grammar’s startsymbol. The leaves of the parse tree are known; they must match, in orderfrom left to right, the stream of words returned by the scanner.
The hard partof parsing lies in discovering the grammatical connection between the leavesand the root. Two distinct and opposite approaches for constructing the treesuggest themselves:1. Top-down parsers begin with the root and grow the tree toward theleaves. At each step, a top-down parser selects a node for somenonterminal on the lower fringe of the tree and extends it with a subtreethat represents the right-hand side of a production that rewrites thenonterminal.2. Bottom-up parsers begin with the leaves and grow the tree toward theroot.
At each step, a bottom-up parser identifies a contiguous substringof the parse tree’s upper fringe that matches the right-hand side of someproduction; it then builds a node for the rule’s left-hand side andconnects it into the tree.In either scenario, the parser makes a series of choices about which productions to apply.
Most of the intellectual complexity in parsing lies inthe mechanisms for making these choices. Section 3.3 explores the issuesand algorithms that arise in top-down parsing, while Section 3.4 examinesbottom-up parsing in depth.3.3 TOP-DOWN PARSINGA top-down parser begins with the root of the parse tree and systematically extends the tree downward until its leaves match the classified wordsreturned by the scanner. At each point, the process considers a partially builtparse tree.