Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 35
Текст из файла (страница 35)
The parser finds a handle in each of iterations 3, 4, and 5. In iteration 3, the frontier of ( ) clearly matches the right-hand side of production 5.From the Action table, we see that a lookahead of either eof or ( impliesa reduce by production 5. Then, in iteration 4, the parser recognizes thatPair, followed by a lookahead of either eof or ( constitutes a handle for thereduction by List → Pair. The final handle of the parse, List with lookaheadof eof in state 1, triggers the accept action.To understand how the states preserved on the stack change the parser’sbehavior, consider the parser’s actions on our second input string,“( ( ) ) ( ),” as shown in Figure 3.17. Initially, the parser shifts (, (, and )onto the stack, in iterations 1 to 3. In iteration 4, the parser reduces byproduction 5; it replaces the top two symbols on the stack, ( and ), withPair and moves to state 5.Between these two examples, the parser recognized the string ( ) at stacktopas a handle three times.
It behaved differently in each case, based on the priorleft context encoded in the stack. Comparing these three situations exposeshow the stacked states control the future direction of the parse.With the first example, ( ), the parser was in s7 with a lookahead ofeof when it found the handle. The reduction reveals s0 beneath ( ), andGoto[s0 ,Pair ] is s2 . In s2 , a lookahead of eof leads to another reductionfollowed by an accept action. A lookahead of ) in s2 produces an error.122 CHAPTER 3 Parsers2.List10.(()?Pair((4.((5.(3.( UA( UA)Pair?Pair)Pair( AUPair)( AUPair())ListList)Pair?Pair)(List8.) PPqP)PP ?)qPAU AU)12.((PPP ?)qPair7.)List11.(6.Pair()Pair(PPq ?)P(PPP ?)qPair( AU AU)))?Pair(PP ?)qPPair(AUGoal13.)? PPP)qList)ListList9.(Pair?Pair(PP ?)qPPair(AU()Pair?PPP ?)qPair( AU)n FIGURE 3.18 The Sequence of Partial Parse Trees Built for (( ))( ).)() AU)3.4 Bottom-Up Parsing 123The second example, ( ( ) ) ( ), encounters a handle for ( ) twice.
Thefirst handle occurs in iteration 4. The parser is in s10 with a lookahead of ).It has previously shifted (, (, and ) onto the stack. The Action table indicates “r 5,” so the parser reduces by Pair → ( ). The reduction reveals s3beneath ( ) and Goto[s3 ,Pair] is s5 , a state in which further )’s are legal.The second time it finds ( ) as a handle occurs in iteration 10.
The reductionreveals s1 beneath ( ) and takes the parser to s4 . In s4 , a lookahead of eithereof or ( triggers a reduction of List Pair to List, while a lookahead of ) isan error.The Action and Goto tables, along with the stack, cause the parser to trackprior left context and let it take different actions based on that context. Thus,the parser handles correctly each of the three instances in which it found ahandle for ( ).
We will revisit this issue when we examine the constructionof Action and Goto.Parsing an Erroneous Input StringTo see how an lr(1) parser discovers a syntax error, consider the sequenceof actions that it takes on the string “( ) )”, shown below:IterationStatewordStackHandleActioninitial123—037(())$0$ 0$ 0 ( 3$ 0 ( 3 ) 7— none —— none —— none —— none ——shift 3shift 7errorThe first two iterations of the parse proceed as in the first example, “( )”.The parser shifts ( and ). In the third iteration of the while loop, it looks atthe Action table entry for state 7 and ). That entry contains neither shift,reduce, nor accept, so the parser interprets it as an error.The lr(1) parser detects syntax errors through a simple mechanism: thecorresponding table entry is invalid.
The parser detects the error as soonas possible, before reading any words beyond those needed to prove theinput erroneous. This property allows the parser to localize the error to aspecific point in the input. Using the available context and knowledge ofthe grammar, we can build lr(1) parsers that provide good diagnostic errormessages.Using LR ParsersThe key to lr parsing lies in the construction of the Action and Goto tables.The tables encode all of the legal reduction sequences that can arise in a124 CHAPTER 3 Parsersreverse rightmost derivation for the given grammar. While the number ofsuch sequences is huge, the grammar itself constrains the order in whichreductions can occur.The compiler writer can build Action and Goto tables by hand.
However,the table-construction algorithm requires scrupulous bookkeeping; it is aprime example of the kind of task that should be automated and relegatedto a computer. Programs that automate this construction are widely available. The next section presents one algorithm that can be used to constructlr(1) parse tables.With an lr(1) parser generator, the compiler writer’s role is to define thegrammar and to ensure that the grammar has the lr(1) property. In practice,the lr(1) table generator identifies those productions that are ambiguous orthat are expressed in a way that requires more than one word of lookaheadto distinguish between a shift action and a reduce action.
As we study thetable-construction algorithm, we will see how those problems arise, how tocure them, and how to understand the kinds of diagnostic information thatlr(1) parser generators produce.Using More LookaheadThe ideas that underlie lr(1) parsers actually define a family of parsers thatvary in the amount of lookahead that they use. An lr(k) parser uses, atmost, k lookahead symbols.
Additional lookahead allows an lr(2) parserto recognize a larger set of grammars than an lr(1) parsing system. Almostparadoxically, however, the added lookahead does not increase the set oflanguages that these parsers can recognize. lr(1) parsers accept the same setof languages as lr(k) parsers for k > 1. The lr(1) grammar for a languagemay be more complex than an lr(k) grammar.3.4.2 Building LR(1) TablesTo construct Action and Goto tables, an lr(1) parser generator builds amodel of the handle-recognizing automaton and uses that model to fill inthe tables. The model, called the canonical collection of sets of lr(1) items,represents all of the possible states of the parser and the transitions betweenthose states.
It is reminiscent of the subset construction from Section 2.4.3.To illustrate the table-construction algorithm, we will use two examples.The first is the parentheses grammar given in Figure 3.16a. It is smallenough to use as a running example, but large enough to exhibit some ofthe complexities of the process.3.4 Bottom-Up Parsing 1251 Goal → List2 List → List Pair3| Pair4 Pair → ( Pair )5| ( )Our second example, in Section 3.4.3, is an abstracted version of the classic if-then-else ambiguity. The table construction fails on this grammarbecause of its ambiguity. The example highlights the situations that lead tofailures in the table-construction process.LR(1) ItemsIn an lr(1) parser, the Action and Goto tables encode information about thepotential handles at each step in the parse.
The table-construction algorithm,therefore, needs a concrete representation for both handles and potential handles, and their associated lookahead symbols. We represent each potentialhandle with an lr(1) item. An lr(1) item [A→β • γ , a] consists of a production A → βγ ; a placeholder, •, that indicates the position of the stacktopin the production’s right-hand side; and a specific terminal symbol, a, as alookahead symbol.The table-construction algorithm uses lr(1) items to build a model of thesets of valid states for the parser, the canonical collection of sets of lr(1)items. We designate the canonical collection CC = {cc0 , cc1 , cc2 , . .
. , ccn }.The algorithm builds CC by following possible derivations in the grammar;in the final collection, each set cci in CC contains the set of potential handles in some possible parser configuration. Before we delve into the tableconstruction, further explanation of lr(1) items is needed.For a production A→βγ and a lookahead symbol a, the placeholder cangenerate three distinct items, each with its own interpretation. In each case,the presence of the item in some set cci in the canonical collection indicatesinput that the parser has seen is consistent with the occurrence of an A followed by an a in the grammar. The position of • in the item distinguishesbetween the three cases.1. [A→•βγ ,a] indicates that an A would be valid and that recognizing a βnext would be one step toward discovering an A.
We call such an item apossibility, because it represents a possible completion for the inputalready seen.2. [A→β • γ ,a] indicates that the parser has progressed from the state[A→•βγ ,a] by recognizing β. The β is consistent with recognizingLR(1) item[A→β • γ , a] where A→βγ is a grammarproduction, • represents the position of theparser’s stacktop, and a is a terminal symbol inthe grammar126 CHAPTER 3 Parsers[Goal → • List,eof][Goal → List •,eof][List → • List Pair,eof] [List → • List Pair,( ][List → List • Pair,eof] [List → List • Pair,( ][List → List Pair •,eof] [List → List Pair •,( ][List → • Pair,eof ][List → Pair •,eof ][List → • Pair,( ][List → Pair •,( ][Pair → • ( Pair ),eof ][Pair → ( • Pair ),eof ][Pair → ( Pair • ),eof ][Pair → ( Pair ) •,eof ][Pair → • ( Pair ),)][Pair → ( • Pair ),)][Pair → ( Pair • ),)][Pair → ( Pair ) •,)][Pair → • ( Pair ),(][Pair → ( • Pair ),(][Pair → ( Pair • ),(][Pair → ( Pair ) •,(][Pair → • ( ),eof][Pair → ( • ),eof][Pair → ( ) •,eof][Pair → • ( ),(][Pair → ( • ),(][Pair → ( ) •,(][Pair → • ( ),)][Pair → ( • ),)][Pair → ( ) •,)]n FIGURE 3.19 LR(1) Items for the Parentheses Grammar.an A.
One valid next step would be to recognize a γ . We call such anitem partially complete.3. [A→βγ •,a] indicates that the parser has found βγ in a context wherean A followed by an a would be valid. If the lookahead symbol is a,then the item is a handle and the parser can reduce βγ to A. Such anitem is complete.In an lr(1) item, the • encodes some local left context—the portions ofthe production already recognized. (Recall, from the earlier examples, thatthe states pushed onto the stack encode a summary of the context to theleft of the current lr(1) item—in essence, the history of the parse so far.)The lookahead symbol encodes one symbol of legal right context. When theparser finds itself in a state that includes [A→βγ •,a] with a lookahead of a,it has a handle and should reduce βγ to A.Figure 3.19 shows the complete set of lr(1) items generated by theparentheses grammar.
Two items deserve particular notice. The first,[Goal → • List,eof], represents the initial state of the parser—looking fora string that reduces to Goal, followed by eof. Every parse begins in thisstate. The second, [Goal → List •,eof], represents the desired final state ofthe parser—finding a string that reduces to Goal, followed by eof. Thisitem represents every successful parse.