The parser generator constructs the table, Table, which codifies the parsing decisions and drives the skeleton parser. The bottom of Figure 3.11 shows the LL(1) table for the right-recursive expression grammar shown in Figure 3.4 on page 101.

Parser generator: a tool that builds a parser from specifications, usually a grammar in a BNF-like notation. Parser generators are also called compiler compilers.

In the skeleton parser, the variable focus holds the next grammar symbol on the partially built parse tree's lower fringe that must be matched.
(focus plays the same role in Figure 3.2.) The parse table, Table, maps pairs of nonterminals and lookahead symbols (terminals or eof) into productions. Given a nonterminal A and a lookahead symbol w, Table[A,w] specifies the correct expansion.

The algorithm to build Table is straightforward. It assumes that FIRST, FOLLOW, and FIRST+ sets are available for the grammar. It iterates over the grammar symbols and fills in Table, as shown in Figure 3.12. If the grammar meets the backtrack-free condition (see page 107), the construction will produce a correct table in O(|P| × |T|) time, where P is the set of productions and T is the set of terminals.

If the grammar is not backtrack free, the construction will assign more than one production to some elements of Table. If the construction assigns to Table[A,w] multiple times, then two or more alternative right-hand sides for A have w in their FIRST+ sets, violating the backtrack-free condition. The parser generator can detect this situation with a simple test on the two assignments to Table.
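To make the construction concrete, here is a minimal sketch of the table-building loop in Python. It is not the book's pseudocode from Figure 3.12; the dictionary representation of Table and the names build_ll1_table, productions, and first_plus are choices made for this illustration, and the FIRST+ sets are taken as already computed.

    def build_ll1_table(productions, first_plus):
        """Build Table[A, w] for a backtrack-free (LL(1)) grammar.

        productions: list of (A, beta) pairs, where beta is a tuple of grammar
                     symbols (empty for an epsilon-production).
        first_plus:  list of FIRST+ sets, one per production, already computed.
        Returns a dict mapping (A, w) to the index of the production to expand.
        """
        table = {}
        for i, (A, beta) in enumerate(productions):
            for w in first_plus[i]:              # w is a terminal or 'eof'
                if (A, w) in table:
                    # Two alternatives for A share w in their FIRST+ sets,
                    # so the grammar violates the backtrack-free condition.
                    raise ValueError(f"not LL(1): conflict in Table[{A}, {w}]")
                table[(A, w)] = i
        return table

A cell that never receives an assignment corresponds to an error entry, the "—" entries of Figure 3.11, and the duplicate-assignment check is the simple test on the two assignments mentioned above.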
FIGURE 3.13 Actions of the LL(1) Parser on a + b × c. [Figure not reproduced: a table that traces, at each step, the rule applied, the parser's stack, and the remaining input.]

The example in Figure 3.13 shows the actions of the LL(1) expression parser for the input string a + b × c.
The central column shows the contents of the parser's stack, which holds the partially completed lower fringe of the parse tree. The parse concludes successfully when it pops Expr′ from the stack, leaving eof exposed on the stack and eof as the next symbol, implicitly, in the input stream.

Now, consider the actions of the LL(1) parser on the illegal input string x + ÷ y, shown in Figure 3.14 on page 115. It detects the syntax error when it attempts to expand a Term with lookahead symbol ÷. Table[Term, ÷] contains "—", indicating a syntax error.

FIGURE 3.14 Actions of the LL(1) Parser on x + ÷ y. [Figure not reproduced: a similar trace that halts with a syntax error when Term is the focus and ÷ is the lookahead.]
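The behavior traced in both figures follows from the skeleton parser's main loop. The listing below is a simplified Python rendering of that loop, not the book's Figure 3.11; the function name ll1_parse, the list-based stack, and the word-list interface to the scanner are assumptions of this sketch.

    EOF = "eof"

    def ll1_parse(words, table, productions, terminals, start_symbol):
        """Table-driven LL(1) skeleton parser (simplified sketch).

        words: the terminal symbols returned by the scanner, in order.
        table: dict mapping (nonterminal, lookahead) -> production index,
               as built by build_ll1_table above.
        """
        stack = [EOF, start_symbol]
        words = list(words) + [EOF]
        pos = 0
        while True:
            focus = stack[-1]                    # next symbol on the lower fringe
            word = words[pos]                    # current lookahead symbol
            if focus == EOF and word == EOF:
                return True                      # fringe and input both exhausted
            if focus == EOF or focus in terminals:
                if focus != word:
                    return False                 # fringe and input disagree
                stack.pop()                      # match: consume one word
                pos += 1
            else:                                # focus is a nonterminal: expand it
                rule = table.get((focus, word))
                if rule is None:
                    return False                 # the "—" entry: syntax error
                stack.pop()
                _, beta = productions[rule]
                # Push the right-hand side in reverse so that its first symbol
                # becomes the new focus; an empty beta models an epsilon-production.
                for symbol in reversed(beta):
                    stack.append(symbol)

With the table for the expression grammar, the failed lookup at (Term, ÷) is exactly where this loop would stop on the input of Figure 3.14.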
Alternatively, an LL(1) parser generator could emit a direct-coded parser, in the style of the direct-coded scanners discussed in Chapter 2. The parser generator would build FIRST, FOLLOW, and FIRST+ sets. Next, it would iterate through the grammar, following the same scheme used by the table-construction algorithm in Figure 3.12. Rather than emitting table entries, it would generate, for each nonterminal, a procedure to recognize each of the possible right-hand sides for that nonterminal. This process would be guided by the FIRST+ sets. It would have the same speed and locality advantages that accrue to direct-coded scanners and recursive-descent parsers, while retaining the advantages of a grammar-generated system, such as a concise, high-level specification and reduced implementation effort.
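As an illustration of what such a generator might emit, the listing below hand-codes the procedures for the right-recursive expression grammar of Figure 3.4, with each branch selected by a FIRST+ set. It is a sketch, not the book's generated code; the function names, the LL1SyntaxError exception, and the word-list interface are invented for this example.

    class LL1SyntaxError(Exception):
        pass

    def parse_expression(words):
        """Recognize an Expr over a list of terminal symbols ending in 'eof'."""
        pos = [0]                                # shared input cursor

        def peek():
            return words[pos[0]]

        def match(expected):
            if peek() != expected:
                raise LL1SyntaxError(f"expected {expected}, found {peek()}")
            pos[0] += 1

        def expr():                              # Expr -> Term Expr'
            if peek() in ("(", "num", "name"):
                term(); expr_prime()
            else:
                raise LL1SyntaxError(f"unexpected {peek()}")

        def expr_prime():                        # Expr' -> + Term Expr' | - Term Expr' | epsilon
            if peek() in ("+", "-"):
                match(peek()); term(); expr_prime()
            elif peek() in (")", "eof"):         # FIRST+ of the epsilon rule
                return
            else:
                raise LL1SyntaxError(f"unexpected {peek()}")

        def term():                              # Term -> Factor Term'
            if peek() in ("(", "num", "name"):
                factor(); term_prime()
            else:
                raise LL1SyntaxError(f"unexpected {peek()}")

        def term_prime():                        # Term' -> × Factor Term' | ÷ Factor Term' | epsilon
            if peek() in ("×", "÷"):
                match(peek()); factor(); term_prime()
            elif peek() in ("+", "-", ")", "eof"):
                return
            else:
                raise LL1SyntaxError(f"unexpected {peek()}")

        def factor():                            # Factor -> ( Expr ) | num | name
            if peek() == "(":
                match("("); expr(); match(")")
            elif peek() in ("num", "name"):
                match(peek())
            else:
                raise LL1SyntaxError(f"unexpected {peek()}")

        expr()
        match("eof")                             # require the entire input
        return True

For example, parse_expression(["name", "+", "name", "×", "name", "eof"]) returns True, while the illegal input ["name", "+", "÷", "name", "eof"] raises LL1SyntaxError at the ÷.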
SECTION REVIEW

Predictive parsers are simple, compact, and efficient. They can be implemented in a number of ways, including hand-coded, recursive-descent parsers and generated LL(1) parsers, either table driven or direct coded. Because these parsers know, at each point in the parse, the set of words that can occur as the next symbol in a valid input string, they can produce accurate and useful error messages.

Most programming-language constructs can be expressed in a backtrack-free grammar. Thus, these techniques have widespread application. The restriction that alternate right-hand sides for a nonterminal have disjoint FIRST+ sets does not seriously limit the utility of LL(1) grammars. As we will see in Section 3.5.4, the primary drawback of top-down, predictive parsers lies in their inability to handle left recursion.
Left-recursive grammars model the left-to-right associativity of expression operators in a more natural way than right-recursive grammars.

Review Questions

1. To build an efficient top-down parser, the compiler writer must express the source language in a somewhat constrained form. Explain the restrictions on the source-language grammar that are required to make it amenable to efficient top-down parsing.
2. Name two potential advantages of a hand-coded recursive-descent parser over a generated, table-driven LL(1) parser, and two advantages of the LL(1) parser over the recursive-descent implementation.

3.4 BOTTOM-UP PARSING

Bottom-up parsers build a parse tree starting from its leaves and working toward its root. The parser constructs a leaf node in the tree for each word returned by the scanner. These leaves form the lower fringe of the parse tree.
To build a derivation, the parser adds layers of nonterminals on top of the leaves in a structure dictated by both the grammar and the partially completed lower portion of the parse tree.

At any stage in the parse, the partially completed parse tree represents the state of the parse. Each word that the scanner has returned is represented by a leaf. The nodes above the leaves encode all of the knowledge that the parser has yet derived. The parser works along the upper frontier of this partially completed parse tree; that frontier corresponds to the current sentential form in the derivation being built by the parser.

Handle: a pair ⟨A→β, k⟩ such that β appears in the frontier with its right end at position k and replacing β with A is the next step in the parse.

Reduction: reducing the frontier of a bottom-up parser by A→β replaces β with A in the frontier.

To extend the frontier upward, the parser looks in the current frontier for a substring that matches the right-hand side of some production A → β.
If it finds β in the frontier, with its right end at k, it can replace β with A, to create a new frontier. If replacing β with A at position k is the next step in a valid derivation for the input string, then the pair ⟨A→β, k⟩ is a handle in the current derivation and the parser should replace β with A. This replacement is called a reduction because it reduces the number of symbols on the frontier, unless |β| = 1. If the parser is building a parse tree, it builds a node for A, adds that node to the tree, and connects the nodes representing β as A's children.
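The listing below sketches how a single reduction might be carried out on an explicit frontier once the handle ⟨A→β, k⟩ is known. The Node class, the list representation of the frontier, and the use of 0-based positions for k are assumptions of this sketch, not the book's data structures.

    class Node:
        """A parse-tree node: a grammar symbol plus its children."""
        def __init__(self, symbol, children=()):
            self.symbol = symbol
            self.children = list(children)

    def reduce_handle(frontier, A, beta, k):
        """Apply the handle <A -> beta, k>: replace the nodes for beta, whose
        right end sits at position k, with a single node for A, attaching the
        replaced nodes as A's children."""
        start = k - len(beta) + 1                # leftmost symbol of beta
        matched = frontier[start:k + 1]
        assert [n.symbol for n in matched] == list(beta), "not a handle here"
        parent = Node(A, matched)                # build the node for A
        frontier[start:k + 1] = [parent]         # splice A into the frontier
        return parent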
Finding handles is the key issue that arises in bottom-up parsing. The techniques presented in the following sections form a particularly efficient handle-finding mechanism. We will return to this issue periodically throughout Section 3.4. First, however, we will finish our high-level description of bottom-up parsers.

The bottom-up parser repeats a simple process. It finds a handle ⟨A→β, k⟩ on the frontier. It replaces the occurrence of β at k with A. This process continues until either: (1) it reduces the frontier to a single node that represents the grammar's goal symbol, or (2) it cannot find a handle. In the first case, the parser has found a derivation; if it has also consumed all the words in the input stream (i.e., the next word is eof), then the parse succeeds. In the second case, the parser cannot build a derivation for the input stream and it should report that failure.
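Put together, this process amounts to a short driver loop. The sketch below reuses reduce_handle from the previous listing and leaves the handle finder abstract as a callback named find_handle, since handle finding is exactly what the remainder of Section 3.4 develops. For simplicity it places every scanned word on the frontier up front, whereas the parsers developed later interleave scanning with reduction.

    def bottom_up_parse(words, find_handle, goal_symbol):
        """High-level skeleton of the bottom-up parsing process.

        find_handle: callback returning the next handle (A, beta, k) on the
                     current frontier, or None when no handle exists.
        """
        frontier = [Node(w) for w in words]      # one leaf per scanned word
        while True:
            if len(frontier) == 1 and frontier[0].symbol == goal_symbol:
                return frontier[0]               # case (1): reduced to the goal symbol
            handle = find_handle(frontier)
            if handle is None:
                return None                      # case (2): no handle; report failure
            A, beta, k = handle
            reduce_handle(frontier, A, beta, k)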
A successful parse runs through every step of the derivation. When a parse fails, the parser should use the context accumulated in the partial derivation to produce a meaningful error message. In many cases, the parser can recover from the error and continue parsing so that it discovers as many syntactic errors as possible in a single parse (see Section 3.5.1).

The relationship between the derivation and the parse plays a critical role in making bottom-up parsing both correct and efficient.
The bottom-up parser works from the final sentence toward the goal symbol, while a derivation starts at the goal symbol and works toward the final sentence. The parser, then, discovers the steps of the derivation in reverse order. For a derivation

Goal = γ0 → γ1 → γ2 → · · · → γn−1 → γn = sentence,

the bottom-up parser discovers γi → γi+1 before it discovers γi−1 → γi.