Cooper, Engineering a Compiler (Second Edition)
For an example of such a grammar, consider extending the expression grammar to include function calls, denoted with parentheses, ( and ), and array-element references, denoted with square brackets, [ and ]. To add these options, we replace production 11, Factor → name, with a set of three rules, plus a set of right-recursive rules for argument lists.

11  Factor   → name
12           | name [ ArgList ]
13           | name ( ArgList )
15  ArgList  → Expr MoreArgs
16  MoreArgs → , Expr MoreArgs
17           | ε

Because productions 11, 12, and 13 all begin with name, they have identical FIRST+ sets. When the parser tries to expand an instance of Factor with a lookahead of name, it has no basis to choose among 11, 12, and 13.
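To make the conflict concrete, the fragment below (an illustrative sketch, not the book's code; each right-hand side is encoded as a list of symbol strings) computes the one-word prediction for each of the three Factor rules:

```python
# Productions 11-13 for Factor, each right-hand side as a list of symbols.
factor_rules = {
    11: ["name"],
    12: ["name", "[", "ArgList", "]"],
    13: ["name", "(", "ArgList", ")"],
}

# Each rule begins with a terminal, so its one-word prediction is just
# that leading symbol.
firsts = {num: rhs[0] for num, rhs in factor_rules.items()}

# All three predictions collapse to the same word, name, so a one-word
# lookahead gives the parser no basis to choose among 11, 12, and 13.
assert set(firsts.values()) == {"name"}
```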
The compiler writer can implement a parser that chooses one rule and backtracks when it is wrong. As an alternative, we can transform these productions to create disjoint FIRST+ sets. A two-word lookahead would handle this case. However, for any finite lookahead we can devise a grammar where that lookahead is insufficient.

The following rewrite of productions 11, 12, and 13 describes the same language but produces disjoint FIRST+ sets:

11  Factor    → name Arguments
12  Arguments → [ ArgList ]
13            | ( ArgList )
14            | ε

Left factoring: the process of extracting and isolating common prefixes in a set of productions.

The rewrite breaks the derivation of Factor into two steps. The first step matches the common prefix of rules 11, 12, and 13. The second step recognizes the three distinct suffixes: [ ArgList ], ( ArgList ), and ε. The rewrite adds a new nonterminal, Arguments, and pushes the alternate suffixes for Factor into right-hand sides for Arguments. We call this transformation left factoring.

We can left factor any set of rules that has alternate right-hand sides with a common prefix. The transformation takes a nonterminal and its productions:

A → αβ1 | αβ2 | · · · | αβn | γ1 | γ2 | · · · | γj

where α is the common prefix and the γi's represent right-hand sides that do not begin with α. The transformation introduces a new nonterminal B to represent the alternate suffixes for α and rewrites the original productions according to the pattern:

A → α B | γ1 | γ2 | · · · | γj
B → β1 | β2 | · · · | βn

To left factor a complete grammar, we must inspect each nonterminal, discover common prefixes, and apply the transformation in a systematic way. For example, in the pattern above, we must consider factoring the right-hand sides of B, as two or more of the βi's could share a prefix. The process stops when all common prefixes have been identified and rewritten.

Left factoring can often eliminate the need to backtrack. However, some context-free languages have no backtrack-free grammar. Given an arbitrary CFG, the compiler writer can systematically eliminate left recursion and use left factoring to eliminate common prefixes. These transformations may produce a backtrack-free grammar.
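The pattern can be sketched as code. The Python function below is a minimal sketch under two stated assumptions: productions are lists of symbols, and only a one-symbol prefix α is factored (the pattern in the text permits a longer common prefix). The name left_factor and the marker "eps" for ε are illustrative.

```python
from collections import defaultdict

def left_factor(rhss, new_name):
    """Left factor one nonterminal's alternatives.

    Alternatives that share a leading symbol (the common prefix alpha)
    are replaced by a single rule 'alpha new_name', and their suffixes
    (the beta_i's) become the productions of the new nonterminal."""
    groups = defaultdict(list)
    for rhs in rhss:
        groups[rhs[0]].append(rhs)

    factored, suffixes = [], []
    for prefix, group in groups.items():
        if len(group) == 1:
            factored.append(group[0])            # a gamma: no shared prefix
        else:
            factored.append([prefix, new_name])  # A -> alpha B
            for rhs in group:                    # B -> beta_1 | ... | beta_n
                suffixes.append(rhs[1:] or ["eps"])
    return factored, suffixes

# The three Factor rules from the example:
rules = [["name"],
         ["name", "[", "ArgList", "]"],
         ["name", "(", "ArgList", ")"]]
factor, arguments = left_factor(rules, "Arguments")
# factor    == [["name", "Arguments"]]
# arguments == [["eps"], ["[", "ArgList", "]"], ["(", "ArgList", ")"]]
```

A full left-factoring pass would apply this to every nonterminal, including the newly introduced ones, until no alternatives share a prefix.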
In general, however, it is undecidable whether or not a backtrack-free grammar exists for an arbitrary context-free language.

3.3.2 Top-Down Recursive-Descent Parsers

Backtrack-free grammars lend themselves to simple and efficient parsing with a paradigm called recursive descent.

PREDICTIVE PARSERS VERSUS DFAs
Predictive parsing is the natural extension of DFA-style reasoning to parsers. A DFA transitions from state to state based solely on the next input character. A predictive parser chooses an expansion based on the next word in the input stream. Thus, for each nonterminal in the grammar, there must be a unique mapping from the first word in any acceptable input string to a specific production that leads to a derivation for that string. The real difference in power between a DFA and a predictively parsable grammar derives from the fact that one prediction may lead to a right-hand side with many symbols, whereas in a regular grammar, it predicts only a single symbol. This lets predictive grammars include productions such as p → ( p ), which are beyond the power of a regular expression to describe. (Recall that a regular expression can recognize ( + Σ∗ ) +, but this does not specify that the numbers of opening and closing parentheses must match.)
Of course, a hand-coded, recursive-descent parser can use arbitrary tricks to disambiguate production choices. For example, if a particular left-hand side cannot be predicted with a single-symbol lookahead, the parser could use two symbols. Done judiciously, this should not cause problems.

A recursive-descent parser is structured as a set of mutually recursive procedures, one for each nonterminal in the grammar. The procedure corresponding to nonterminal A recognizes an instance of A in the input stream. To recognize a nonterminal B on some right-hand side for A, the parser invokes the procedure corresponding to B. Thus, the grammar itself serves as a guide to the parser's implementation.

Consider the three rules for Expr′ in the right-recursive expression grammar:

    Production               FIRST+
2   Expr′ → + Term Expr′     { + }
3         | - Term Expr′     { - }
4         | ε                { ε, eof, ) }

To recognize instances of Expr′, we will create a routine EPrime().
It follows a simple scheme: choose among the three rules (or a syntax error) based on the FIRST+ sets of their right-hand sides. For each right-hand side, the code tests directly for any further symbols. To test for the presence of a nonterminal, say A, the code invokes the procedure that corresponds to A. To test for a terminal symbol, such as name, it performs a direct comparison and, if successful, advances the input stream by calling the scanner, NextWord(). If it matches an ε-production, the code does not call NextWord().

EPrime()
    /* Expr′ → + Term Expr′ | - Term Expr′ */
    if (word = + or word = -) then begin;
        word ← NextWord();
        if (Term())
            then return EPrime();
            else return false;
    end;
    /* Expr′ → ε */
    else if (word = ) or word = eof)
        then return true;
    /* no match */
    else begin;
        report a syntax error;
        return false;
    end;

n FIGURE 3.9 An Implementation of EPrime().
Figure 3.9 shows a straightforward implementation of EPrime(). It combines rules 2 and 3 because they both end with the same suffix, Term Expr′.

The strategy for constructing a complete recursive-descent parser is clear. For each nonterminal, we construct a procedure to recognize its alternative right-hand sides. These procedures call one another to recognize nonterminals. They recognize terminals by direct matching.
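As an illustration of this strategy, here is a compact Python sketch of such a parser for the right-recursive expression grammar (not the book's code; token spellings such as "x" for × and "/" for ÷ are assumptions, and classified words stand in for a scanner's output):

```python
class Parser:
    """Recursive-descent parser: one method per nonterminal."""
    def __init__(self, tokens):
        self.tokens = tokens + ["eof"]
        self.pos = 0

    @property
    def word(self):
        return self.tokens[self.pos]

    def next_word(self):
        self.pos += 1

    def expr(self):                       # Expr -> Term Expr'
        return self.term() and self.eprime()

    def eprime(self):                     # Expr' -> + Term Expr' | - Term Expr' | eps
        if self.word in ("+", "-"):
            self.next_word()
            return self.term() and self.eprime()
        return self.word in (")", "eof")  # eps is legal only before ) or eof

    def term(self):                       # Term -> Factor Term'
        return self.factor() and self.tprime()

    def tprime(self):                     # Term' -> x Factor Term' | / Factor Term' | eps
        if self.word in ("x", "/"):
            self.next_word()
            return self.factor() and self.tprime()
        return self.word in ("+", "-", ")", "eof")

    def factor(self):                     # Factor -> ( Expr ) | num | name
        if self.word == "(":
            self.next_word()
            if not self.expr() or self.word != ")":
                return False
            self.next_word()
            return True
        if self.word in ("num", "name"):
            self.next_word()
            return True
        return False

def parse(tokens):                        # Goal -> Expr, then require eof
    p = Parser(tokens)
    return p.expr() and p.word == "eof"

print(parse(["name", "+", "num", "x", "name"]))   # True
print(parse(["name", "+", ")"]))                  # False
```

The sketch returns False where the book's code would call Fail(); a production parser would report an error and attempt recovery at those points instead.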
Figure 3.10 shows a top-down recursive-descent parser for the right-recursive version of the classic expression grammar shown in Figure 3.4 on page 101. The code for similar right-hand sides has been combined.

For a small grammar, a compiler writer can quickly craft a recursive-descent parser. With a little care, a recursive-descent parser can produce accurate, informative error messages. The natural location for generating those messages is when the parser fails to find an expected terminal symbol, inside EPrime, TPrime, and Factor in the example.

3.3.3 Table-Driven LL(1) Parsers

Following the insights that underlie the FIRST+ sets, we can automatically generate top-down parsers for backtrack-free grammars.
The tool constructs FIRST, FOLLOW, and FIRST+ sets. The FIRST+ sets completely dictate the parsing decisions, so the tool can then emit an efficient top-down parser. The resulting parser is called an LL(1) parser. The name LL(1) derives from the fact that these parsers scan their input left to right, construct a leftmost derivation, and use a lookahead of 1 symbol.

Main()
    /* Goal → Expr */
    word ← NextWord();
    if (Expr())
        then if (word = eof)
            then report success;
            else Fail();

Fail()
    report syntax error;
    attempt error recovery or exit;

Expr()
    /* Expr → Term Expr′ */
    if (Term())
        then return EPrime();
        else Fail();

EPrime()
    /* Expr′ → + Term Expr′ */
    /* Expr′ → - Term Expr′ */
    if (word = + or word = -) then begin;
        word ← NextWord();
        if (Term())
            then return EPrime();
            else Fail();
    end;
    /* Expr′ → ε */
    else if (word = ) or word = eof)
        then return true;
        else Fail();

Term()
    /* Term → Factor Term′ */
    if (Factor())
        then return TPrime();
        else Fail();

TPrime()
    /* Term′ → × Factor Term′ */
    /* Term′ → ÷ Factor Term′ */
    if (word = × or word = ÷) then begin;
        word ← NextWord();
        if (Factor())
            then return TPrime();
            else Fail();
    end;
    /* Term′ → ε */
    else if (word = + or word = - or word = ) or word = eof)
        then return true;
        else Fail();

Factor()
    /* Factor → ( Expr ) */
    if (word = () then begin;
        word ← NextWord();
        if (not Expr())
            then Fail();
        if (word ≠ ))
            then Fail();
        word ← NextWord();
        return true;
    end;
    /* Factor → num */
    /* Factor → name */
    else if (word = num or word = name) then begin;
        word ← NextWord();
        return true;
    end;
    else Fail();

n FIGURE 3.10 Recursive-Descent Parser for Expressions.

word ← NextWord();
push eof onto Stack;
push the start symbol, S, onto Stack;
focus ← top of Stack;
loop forever;
    if (focus = eof and word = eof)
        then report success and exit the loop;
    else if (focus ∈ T or focus = eof) then begin;
        if focus matches word then begin;
            pop Stack;
            word ← NextWord();
        end;
        else report an error looking for symbol at top of stack;
    end;
    else begin; /* focus is a nonterminal */
        if Table[focus,word] is A → B1 B2 · · · Bk then begin;
            pop Stack;
            for i ← k to 1 by -1 do;
                if (Bi ≠ ε)
                    then push Bi onto Stack;
            end;
        end;
        else report an error expanding focus;
    end;
    focus ← top of Stack;
end;

(a) The Skeleton LL(1) Parser

          eof   +    -    ×    ÷    (    )   name  num
Goal       —    —    —    —    —    0    —    0     0
Expr       —    —    —    —    —    1    —    1     1
Expr′      4    2    3    —    —    —    4    —     —
Term       —    —    —    —    —    5    —    5     5
Term′      8    8    8    6    7    —    8    —     —
Factor     —    —    —    —    —    9    —    11    10

(b) The LL(1) Parse Table for Right-Recursive Expression Grammar

n FIGURE 3.11 An LL(1) Parser for Expressions.

build FIRST, FOLLOW, and FIRST+ sets;
for each nonterminal A do;
    for each terminal w do;
        Table[A,w] ← error;
    end;
    for each production p of the form A → β do;
        for each terminal w ∈ FIRST+(A → β) do;
            Table[A,w] ← p;
        end;
        if eof ∈ FIRST+(A → β)
            then Table[A,eof] ← p;
    end;
end;

n FIGURE 3.12 LL(1) Table-Construction Algorithm.
Grammars that work in an LL(1) scheme are often called LL(1) grammars. LL(1) grammars are, by definition, backtrack free.

To build an LL(1) parser, the compiler writer provides a right-recursive, backtrack-free grammar and a parser generator constructs the actual parser. The most common implementation technique for an LL(1) parser generator uses a table-driven skeleton parser, such as the one shown at the top of Figure 3.11.
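The skeleton algorithm can be sketched in Python, driven by a hand-entered copy of the parse table from Figure 3.11 (an illustrative sketch, not the book's code; the spellings "x" and "/" stand in for × and ÷, and an empty list encodes ε):

```python
GRAMMAR = {                        # production number -> (lhs, rhs)
    0: ("Goal",   ["Expr"]),
    1: ("Expr",   ["Term", "Expr'"]),
    2: ("Expr'",  ["+", "Term", "Expr'"]),
    3: ("Expr'",  ["-", "Term", "Expr'"]),
    4: ("Expr'",  []),             # epsilon
    5: ("Term",   ["Factor", "Term'"]),
    6: ("Term'",  ["x", "Factor", "Term'"]),
    7: ("Term'",  ["/", "Factor", "Term'"]),
    8: ("Term'",  []),             # epsilon
    9: ("Factor", ["(", "Expr", ")"]),
    10: ("Factor", ["num"]),
    11: ("Factor", ["name"]),
}

TABLE = {                          # Table[focus][word] -> production number
    "Goal":   {"(": 0, "name": 0, "num": 0},
    "Expr":   {"(": 1, "name": 1, "num": 1},
    "Expr'":  {"eof": 4, "+": 2, "-": 3, ")": 4},
    "Term":   {"(": 5, "name": 5, "num": 5},
    "Term'":  {"eof": 8, "+": 8, "-": 8, "x": 6, "/": 7, ")": 8},
    "Factor": {"(": 9, "name": 11, "num": 10},
}

def ll1_parse(tokens):
    words = tokens + ["eof"]
    pos = 0
    stack = ["eof", "Goal"]
    while True:
        focus, word = stack[-1], words[pos]
        if focus == "eof" and word == "eof":
            return True                    # success
        if focus not in TABLE:             # focus is a terminal (or eof)
            if focus != word:
                return False               # error looking for focus
            stack.pop()
            pos += 1
        else:                              # focus is a nonterminal
            prod = TABLE[focus].get(word)
            if prod is None:
                return False               # error expanding focus
            stack.pop()
            _, rhs = GRAMMAR[prod]
            for symbol in reversed(rhs):   # push B_k ... B_1; eps pushes nothing
                stack.append(symbol)

print(ll1_parse(["name", "+", "num"]))     # True
print(ll1_parse(["name", "+", "+"]))       # False
```

Note that the blank entries of Figure 3.11(b) become absent dictionary keys here, so a failed lookup plays the role of the error entries that the algorithm of Figure 3.12 would fill in.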