K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), page 38
The set of handles is precisely the set of complete LR(1) items, those with the placeholder • at the right end of the item’s production. Any language with a finite set of sentences can be recognized by a DFA. Since the number of productions and the number of lookahead symbols are both finite, the number of complete items is finite, and the language of handles is a regular language.

When the LR(1) parser executes, it interleaves two kinds of actions: shifts and reduces. The shift actions simulate steps in the handle-finding DFA.
[FIGURE 3.25 Handle-Finding DFA for the Parentheses Grammar. The diagram is not recoverable from the extracted text; its states are cc0 through cc11, with transitions on (, ), List, and Pair.]

The LR(1) parser makes the handle’s position implicit, at stacktop. This design decision drastically reduces the number of possible handles.

136 CHAPTER 3 Parsers

The parser performs one shift action per word in the input stream. When the handle-finding DFA reaches a final state, the LR(1) parser performs a reduce action. The reduce actions reset the state of the handle-finding DFA to reflect the fact that the parser has recognized a handle and replaced it with a nonterminal. To accomplish this, the parser pops the handle and its state off the stack, revealing an older state.
The parser uses that older state, the nonterminal on the left-hand side of the reduced production, and the Goto table to discover the state in the DFA from which handle-finding should continue.

The reduce actions tie together successive handle-finding phases. The reduction uses left context (the state revealed by the reduction summarizes the prior history of the parse) to restart the handle-finding DFA in a state that reflects the nonterminal that the parser just recognized. For example, in the parse of “( ( ) ) ( )”, the parser stacks a state for every ( that it encounters: state 3 for a ( at the outermost level and state 6 for a ( nested inside another pair, following the DFA of Figure 3.25. These stacked states allow the algorithm to match up the opening and closing parentheses.

Notice that the handle-finding DFA has transitions on both terminal and nonterminal symbols.
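The interplay of shifts, reduces, and Goto lookups described above can be sketched in a few lines of Python. This is a minimal sketch, not the book's skeleton parser: it assumes the parentheses grammar is Goal → List; List → List Pair | Pair; Pair → ( Pair ) | ( ), which this excerpt does not reproduce, and the ACTION and GOTO tables were derived by hand under that assumption, with state numbers chosen to match cc0 through cc11 in Figure 3.25. The names `PRODUCTIONS`, `ACTION`, `GOTO`, and `parse` are illustrative.

```python
# Assumed grammar (not shown in this excerpt):
#   1 Goal -> List   2 List -> List Pair   3 List -> Pair
#   4 Pair -> ( Pair )                     5 Pair -> ( )
# Each production maps to (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("Goal", 1), 2: ("List", 2), 3: ("List", 1),
               4: ("Pair", 3), 5: ("Pair", 2)}

# Hand-derived tables; an absent entry means "syntax error".
ACTION = {
    (0, "("): ("shift", 3),
    (1, "eof"): ("accept", None), (1, "("): ("shift", 3),
    (2, "eof"): ("reduce", 3), (2, "("): ("reduce", 3),
    (3, "("): ("shift", 6), (3, ")"): ("shift", 7),
    (4, "eof"): ("reduce", 2), (4, "("): ("reduce", 2),
    (5, ")"): ("shift", 8),
    (6, "("): ("shift", 6), (6, ")"): ("shift", 10),
    (7, "eof"): ("reduce", 5), (7, "("): ("reduce", 5),
    (8, "eof"): ("reduce", 4), (8, "("): ("reduce", 4),
    (9, ")"): ("shift", 11),
    (10, ")"): ("reduce", 5),
    (11, ")"): ("reduce", 4),
}
GOTO = {(0, "List"): 1, (0, "Pair"): 2, (1, "Pair"): 4,
        (3, "Pair"): 5, (6, "Pair"): 9}


def parse(words):
    """Return (accepted, trace) for a sequence of '(' and ')' words."""
    stack = [0]                      # stack of DFA states only
    tokens = list(words) + ["eof"]
    pos = 0
    trace = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:            # no table entry: syntax error
            return False, trace
        kind, arg = entry
        if kind == "shift":          # one step in the handle-finding DFA
            stack.append(arg)
            pos += 1
            trace.append(f"shift {arg}")
        elif kind == "reduce":       # handle found at the top of the stack
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]      # pop the handle, revealing an older state
            stack.append(GOTO[(stack[-1], lhs)])  # nonterminal edge of the DFA
            trace.append(f"reduce {arg}")
        else:
            return True, trace


if __name__ == "__main__":
    accepted, steps = parse("(())()")
    print(accepted, steps)
```

In this sketch the GOTO table is consulted only inside the reduce branch, and each shift consumes exactly one input word.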
The parser traverses the nonterminal edges only on a reduce action. Each of these transitions, shown in gray in Figure 3.25, corresponds to a valid entry in the Goto table. The combined effect of the terminal and nonterminal actions is to invoke the DFA recursively each time it must recognize a nonterminal.

3.4.3 Errors in the Table Construction
As a second example of the LR(1) table construction, consider the ambiguous grammar for the classic if-then-else construct. Abstracting away the details of the controlling expression and all other statements (by treating them as terminal symbols) produces the following four-production grammar:

1  Goal → Stmt
2  Stmt → if expr then Stmt
3       | if expr then Stmt else Stmt
4       | assign

It has two nonterminal symbols, Goal and Stmt, and six terminal symbols, if, expr, then, else, assign, and the implicit eof.

The construction begins by initializing cc0 to the item [Goal → • Stmt, eof] and taking its closure to produce the first set.

3.4 Bottom-Up Parsing 137

Iteration  Item   Goal  Stmt  if    expr  then  else  assign  eof
0          cc0    ∅     cc1   cc2   ∅     ∅     ∅     cc3     ∅
1          cc1    ∅     ∅     ∅     ∅     ∅     ∅     ∅       ∅
           cc2    ∅     ∅     ∅     cc4   ∅     ∅     ∅       ∅
           cc3    ∅     ∅     ∅     ∅     ∅     ∅     ∅       ∅
2          cc4    ∅     ∅     ∅     ∅     cc5   ∅     ∅       ∅
3          cc5    ∅     cc6   cc7   ∅     ∅     ∅     cc8     ∅
4          cc6    ∅     ∅     ∅     ∅     ∅     cc9   ∅       ∅
           cc7    ∅     ∅     ∅     cc10  ∅     ∅     ∅       ∅
           cc8    ∅     ∅     ∅     ∅     ∅     ∅     ∅       ∅
5          cc9    ∅     cc11  cc2   ∅     ∅     ∅     cc3     ∅
           cc10   ∅     ∅     ∅     ∅     cc12  ∅     ∅       ∅
6          cc11   ∅     ∅     ∅     ∅     ∅     ∅     ∅       ∅
           cc12   ∅     cc13  cc7   ∅     ∅     ∅     cc8     ∅
7          cc13   ∅     ∅     ∅     ∅     ∅     cc14  ∅       ∅
8          cc14   ∅     cc15  cc7   ∅     ∅     ∅     cc8     ∅
9          cc15   ∅     ∅     ∅     ∅     ∅     ∅     ∅       ∅

FIGURE 3.26 Trace of the LR(1) Construction on the If-Then-Else Grammar.

cc0 = { [Goal → • Stmt, eof],
        [Stmt → • assign, eof],
        [Stmt → • if expr then Stmt, eof],
        [Stmt → • if expr then Stmt else Stmt, eof] }

From this set, the construction begins deriving the remaining members of the canonical collection of sets of LR(1) items. Figure 3.26 shows the progress of the construction.
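The closure step that produces cc0 can be sketched as follows. This is an illustrative sketch under stated assumptions: an item is represented as a tuple (left-hand side, right-hand side, position of •, lookahead), the grammar has no ε-productions (true of the if-then-else grammar), and the names `GRAMMAR`, `first`, and `closure` are hypothetical rather than taken from the book.

```python
# The four-production if-then-else grammar as (lhs, rhs) pairs.
GRAMMAR = [
    ("Goal", ("Stmt",)),
    ("Stmt", ("if", "expr", "then", "Stmt")),
    ("Stmt", ("if", "expr", "then", "Stmt", "else", "Stmt")),
    ("Stmt", ("assign",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol, seen=frozenset()):
    """FIRST set of a single symbol; assumes no epsilon-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}              # a terminal is its own FIRST set
    if symbol in seen:               # guard against left recursion
        return set()
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            result |= first(rhs[0], seen | {symbol})
    return result


def closure(items):
    """Add every item implied by a nonterminal that follows the dot."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # Lookaheads for the new items: FIRST of the symbol after the
        # nonterminal, or the item's own lookahead if nothing follows it.
        following = rhs[dot + 1:]
        lookaheads = first(following[0]) if following else {look}
        for plhs, prhs in GRAMMAR:
            if plhs == rhs[dot]:
                for a in lookaheads:
                    new_item = (plhs, prhs, 0, a)
                    if new_item not in items:
                        items.add(new_item)
                        work.append(new_item)
    return frozenset(items)


cc0 = closure({("Goal", ("Stmt",), 0, "eof")})

if __name__ == "__main__":
    for lhs, rhs, dot, look in sorted(cc0):
        body = " ".join(rhs[:dot] + ("•",) + rhs[dot:])
        print(f"[{lhs} → {body}, {look}]")
```

Representing each lookahead as a separate item keeps the set operations simple; the text's notation {eof, else} abbreviates two such items.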
The first iteration examines the transitions out of cc0 for each grammar symbol. It produces three new sets for the canonical collection from cc0: cc1 for Stmt, cc2 for if, and cc3 for assign. These sets are:

cc1 = { [Goal → Stmt •, eof] }

cc2 = { [Stmt → if • expr then Stmt, eof],
        [Stmt → if • expr then Stmt else Stmt, eof] }

cc3 = { [Stmt → assign •, eof] }

The second iteration examines transitions out of these three new sets. Only one combination produces a new set, looking at cc2 with the symbol expr.

cc4 = { [Stmt → if expr • then Stmt, eof],
        [Stmt → if expr • then Stmt else Stmt, eof] }

The next iteration computes transitions from cc4; it creates cc5 as goto(cc4, then).

cc5 = { [Stmt → if expr then • Stmt, eof],
        [Stmt → if expr then • Stmt else Stmt, eof],
        [Stmt → • if expr then Stmt, {eof, else}],
        [Stmt → • assign, {eof, else}],
        [Stmt → • if expr then Stmt else Stmt, {eof, else}] }

The fourth iteration examines transitions out of cc5.
It creates new sets for Stmt, for if, and for assign.

cc6 = { [Stmt → if expr then Stmt •, eof],
        [Stmt → if expr then Stmt • else Stmt, eof] }

cc7 = { [Stmt → if • expr then Stmt, {eof, else}],
        [Stmt → if • expr then Stmt else Stmt, {eof, else}] }

cc8 = { [Stmt → assign •, {eof, else}] }

The fifth iteration examines cc6, cc7, and cc8. While most of the combinations produce the empty set, two combinations lead to new sets. The transition on else from cc6 leads to cc9, and the transition on expr from cc7 creates cc10.

cc9 = { [Stmt → if expr then Stmt else • Stmt, eof],
        [Stmt → • if expr then Stmt, eof],
        [Stmt → • if expr then Stmt else Stmt, eof],
        [Stmt → • assign, eof] }

cc10 = { [Stmt → if expr • then Stmt, {eof, else}],
         [Stmt → if expr • then Stmt else Stmt, {eof, else}] }

When the sixth iteration examines the sets produced in the fifth iteration, it creates two new sets, cc11 from cc9 on Stmt and cc12 from cc10 on then.
It also creates duplicate sets for cc2 and cc3 from cc9.

cc11 = { [Stmt → if expr then Stmt else Stmt •, eof] }

cc12 = { [Stmt → if expr then • Stmt, {eof, else}],
         [Stmt → if expr then • Stmt else Stmt, {eof, else}],
         [Stmt → • if expr then Stmt, {eof, else}],
         [Stmt → • if expr then Stmt else Stmt, {eof, else}],
         [Stmt → • assign, {eof, else}] }

Iteration seven creates cc13 from cc12 on Stmt. It recreates cc7 and cc8.

cc13 = { [Stmt → if expr then Stmt •, {eof, else}],
         [Stmt → if expr then Stmt • else Stmt, {eof, else}] }

Iteration eight finds one new set, cc14 from cc13 on the transition for else.

cc14 = { [Stmt → if expr then Stmt else • Stmt, {eof, else}],
         [Stmt → • if expr then Stmt, {eof, else}],
         [Stmt → • if expr then Stmt else Stmt, {eof, else}],
         [Stmt → • assign, {eof, else}] }

Iteration nine generates cc15 from cc14 on the transition for Stmt, along with duplicates of cc7 and cc8.

cc15 = { [Stmt → if expr then Stmt else Stmt •, {eof, else}] }

The final iteration looks at cc15.
Since the • lies at the end of every item in cc15, it can only generate empty sets. At this point, no additional sets of items can be added to the canonical collection, so the algorithm has reached a fixed point. It halts.

The ambiguity in the grammar becomes apparent during the table-filling algorithm. The items in states cc0 through cc12 generate no conflicts. State cc13 contains four items:

1. [Stmt → if expr then Stmt •, else]
2. [Stmt → if expr then Stmt •, eof]
3. [Stmt → if expr then Stmt • else Stmt, else]
4. [Stmt → if expr then Stmt • else Stmt, eof]

Item 1 generates a reduce entry for cc13 and the lookahead else. Item 3 generates a shift entry for the same location in the table.
Clearly, the table entry cannot hold both actions. This shift-reduce conflict indicates that the grammar is ambiguous. Items 2 and 4 do not add a second conflict: item 2 generates a reduce entry for the lookahead eof, while item 4, like item 3, generates a shift entry on else, the symbol that follows its •. When the table-filling algorithm encounters such a conflict, the construction has failed. The table generator should report the problem (a fundamental ambiguity between the productions in the specific LR(1) items) to the compiler writer.

In this case, the conflict arises because production 2 in the grammar is a prefix of production 3.
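The whole construction, together with the conflict check performed during table filling, can be sketched in Python. This is a minimal sketch under stated assumptions, not the book's algorithm verbatim: items are tuples (lhs, rhs, dot, lookahead), the FIRST sets are written by hand for this grammar, and sets are numbered in first-in, first-out order so that the numbering lines up with Figure 3.26. All function and variable names are illustrative.

```python
from collections import deque

GRAMMAR = [
    ("Goal", ("Stmt",)),
    ("Stmt", ("if", "expr", "then", "Stmt")),
    ("Stmt", ("if", "expr", "then", "Stmt", "else", "Stmt")),
    ("Stmt", ("assign",)),
]
NONTERMINALS = {"Goal", "Stmt"}
SYMBOLS = ["Goal", "Stmt", "if", "expr", "then", "else", "assign", "eof"]
# FIRST sets written by hand for this grammar (it has no epsilon-productions).
FIRST = {"Goal": {"if", "assign"}, "Stmt": {"if", "assign"}}


def closure(items):
    items, work = set(items), list(items)
    while work:
        lhs, rhs, dot, look = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:]
            lookaheads = FIRST.get(rest[0], {rest[0]}) if rest else {look}
            for plhs, prhs in GRAMMAR:
                if plhs == rhs[dot]:
                    for a in lookaheads:
                        item = (plhs, prhs, 0, a)
                        if item not in items:
                            items.add(item)
                            work.append(item)
    return frozenset(items)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    cc = [closure({("Goal", ("Stmt",), 0, "eof")})]
    transitions = {}
    queue = deque([0])
    while queue:                     # stops at the fixed point
        i = queue.popleft()
        for x in SYMBOLS:
            target = goto(cc[i], x)
            if not target:
                continue
            if target not in cc:     # a genuinely new set of items
                cc.append(target)
                queue.append(len(cc) - 1)
            transitions[(i, x)] = cc.index(target)
    return cc, transitions


def find_conflicts(cc, transitions):
    """Collect table cells that would need to hold more than one action."""
    conflicts = []
    for i, state in enumerate(cc):
        actions = {}                 # terminal -> set of candidate actions
        for (l, r, d, a) in state:
            if d == len(r):          # complete item: act on its lookahead
                act = ("accept",) if l == "Goal" else ("reduce", l, r)
                actions.setdefault(a, set()).add(act)
            elif r[d] not in NONTERMINALS:   # dot before a terminal: shift
                act = ("shift", transitions[(i, r[d])])
                actions.setdefault(r[d], set()).add(act)
        for word in sorted(actions):
            if len(actions[word]) > 1:
                conflicts.append((i, word, actions[word]))
    return conflicts


if __name__ == "__main__":
    cc, transitions = canonical_collection()
    print(len(cc), "sets in the canonical collection")
    for state, word, acts in find_conflicts(cc, transitions):
        print(f"conflict in cc{state} on {word}: {sorted(acts, key=str)}")
```

Under these assumptions the sketch builds 16 sets and reports a single conflicting cell: set 13 on the word else, which holds both a reduce by production 2 and a shift.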
The table generator could be designed to resolve this conflict in favor of shifting; that forces the parser to recognize the longer production and binds the else to the innermost if.

A typical error message from a parser generator includes the LR(1) items that generate the conflict; another reason to study the table construction.

An ambiguous grammar can also produce a reduce-reduce conflict. Such a conflict can occur if the grammar contains two productions A → γ δ and B → γ δ, with the same right-hand side γ δ. If a state contains the items [A → γ δ •, a] and [B → γ δ •, a], then it will generate two conflicting reduce actions for the lookahead a, one for each production.
Again, this conflict reflects a fundamental ambiguity in the underlying grammar; the compiler writer must reshape the grammar to eliminate it (see Section 3.5.3).

Since parser generators that automate this process are widely available, the method of choice for determining whether a grammar has the LR(1) property is to invoke an LR(1) parser generator on it. If the process succeeds, the grammar has the LR(1) property.

Exercise 12 shows an LR(1) grammar that has no equivalent LL(1) grammar.

As a final example, the LR tables for the classic expression grammar appear in Figures 3.31 and 3.32 on pages 151 and 152.

SECTION REVIEW
LR(1) parsers are widely used in compilers built in both industry and academia. These parsers accept a large class of languages.
They use time proportional to the size of the derivation that they construct. Tools that generate an LR(1) parser are widely available in a broad variety of implementation languages.

The LR(1) table-construction algorithm is an elegant application of theory to practice. It systematically builds up a model of the handle-recognizing DFA and then translates that model into a pair of tables that drive the skeleton parser.
The table construction is a complex undertaking that requires painstaking attention to detail. It is precisely the kind of task that should be automated; parser generators are better at following these long chains of computations than are humans. That notwithstanding, a skilled compiler writer should understand the table-construction algorithms because they provide insight into how the parsers work, what kinds of errors the parser generator can encounter, how those errors arise, and how they can be remedied.

Review Questions
1. Show the steps that the skeleton LR(1) parser, with the tables for the parentheses grammar, would take on the input string “( ( ) ( ) ) ( ) .”
2. Build the LR(1) tables for the SheepNoise grammar, given in Section 3.2.2 on page 86, and show the skeleton parser’s actions on the input “baa baa baa.”

3.5 PRACTICAL ISSUES
Even with automatic parser generators, the compiler writer must manage several issues to produce a robust, efficient parser for a real programming language.
This section addresses several issues that arise in practice.

3.5.1 Error Recovery
Programmers often compile code that contains syntax errors. In fact, compilers are widely accepted as the fastest way to discover such errors. In this application, the compiler must find as many syntax errors as possible in a single attempt at parsing the code. This requires attention to the parser’s behavior in error states.

All of the parsers shown in this chapter have the same behavior when they encounter a syntax error: they report the problem and halt. This behavior prevents the compiler from wasting time trying to translate an incorrect program.
However, it ensures that the compiler finds at most one syntax error per compilation. Such a compiler would make finding all the syntax errors in a file of program text a potentially long and painful process.

A parser should find as many syntax errors as possible in each compilation. This requires a mechanism that lets the parser recover from an error by moving to a state where it can continue parsing. A common way of achieving this is to select one or more words that the parser can use to synchronize the input with its internal state.
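The idea of a synchronizing word can be illustrated with a deliberately small sketch. Everything specific here is an assumption made for illustration: the toy language (a sequence of `id = num ;` statements), the choice of `;` as the synchronizing word, and the function name `parse_statements`. It is a hand-written recognizer rather than a table-driven LR(1) parser, so it shows the recovery idea only, not how a parser generator implements it.

```python
def parse_statements(tokens):
    """Check a toy language of `id = num ;` statements.

    `tokens` is a list of token kinds. Instead of halting at the first
    syntax error, the function records the error, skips ahead to the
    synchronizing word ';', and resumes with the next statement.
    """
    expected = ["id", "=", "num", ";"]
    errors = []
    pos = 0
    while pos < len(tokens):
        for want in expected:
            got = tokens[pos] if pos < len(tokens) else "eof"
            if got != want:
                errors.append(f"token {pos}: expected {want}, found {got}")
                # Recovery: discard words up to and including the next ';'.
                while pos < len(tokens) and tokens[pos] != ";":
                    pos += 1
                pos += 1
                break
            pos += 1
    return errors


if __name__ == "__main__":
    program = ["id", "=", "num", ";",      # correct
               "id", "num", ";",           # missing '='
               "id", "=", "num", ";",      # correct
               "=", "num", ";"]            # missing 'id'
    for message in parse_statements(program):
        print(message)
```

On the sample input, both faulty statements are reported in a single pass, because each error discards input only as far as the next `;`.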