For example, the input string id * (id)$ would be accepted as shown in the table below. There we have indicated the steps in the computation where a nonterminal has been replaced on the stack in accordance with a rule of G'. By applying these rules of G' in the last column of the table in sequence, we obtain a leftmost derivation of the input string:

E ⇒ TE' ⇒ FT'E' ⇒ idAT'E' ⇒ idT'E' ⇒ id*FT'E' ⇒ id*(E)T'E'
  ⇒ id*(TE')T'E' ⇒ id*(FT'E')T'E' ⇒ id*(idAT'E')T'E' ⇒ id*(idT'E')T'E'
  ⇒ id*(idE')T'E' ⇒ id*(id)T'E' ⇒ id*(id)E' ⇒ id*(id)

In fact, a parse tree of the input can be reconstructed (see Figure 3-12; the step of the pushdown automaton corresponding to the expansion of each node of the parse tree is also shown next to the node).
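The operation of the deterministic parser can also be sketched in code. The following is our own illustration, not the book's: a table-driven version of the parser, in which the table mapping a (nonterminal, lookahead) pair to a rule of G' has been written out by hand from the rules E → TE', E' → +TE' | e, T → FT', T' → *FT' | e, F → idA, A → (E) | e.

```python
# A minimal sketch (our own) of a table-driven top-down parser for G'.
# The parsing table below is hand-derived from G' and is an assumption
# of this sketch, not something the text has tabulated.

NONTERMINALS = {"E", "E'", "T", "T'", "F", "A"}

TABLE = {
    ("E", "id"): ["T", "E'"],   ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],            ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],   ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id", "A"],   ("F", "("): ["(", "E", ")"],
    ("A", "("): ["(", "E", ")"],
    ("A", "+"): [], ("A", "*"): [], ("A", ")"): [], ("A", "$"): [],
}

def parse(tokens):
    """Parse tokens (ending in '$'); return the rules applied, in order."""
    stack, rules, i = ["E"], [], 0
    while stack:
        top = stack.pop()                  # top of the parser stack
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                raise ValueError(f"no rule for ({top}, {tokens[i]})")
            rules.append((top, rhs))
            stack.extend(reversed(rhs))    # leftmost RHS symbol ends on top
        elif top == tokens[i]:
            i += 1                         # match a terminal against input
        else:
            raise ValueError(f"expected {top}, saw {tokens[i]}")
    if tokens[i:] != ["$"]:
        raise ValueError("trailing input")
    return rules

for lhs, rhs in parse(["id", "*", "(", "id", ")", "$"]):
    print(lhs, "->", " ".join(rhs) if rhs else "e")
```

Reading the returned rules in sequence reproduces the leftmost derivation above, one rule per replacement of the leftmost nonterminal.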
Notice that this parser constructs the parse tree of the input in a top-down, left-first manner, starting from E and repeatedly applying an appropriate rule to the leftmost nonterminal.

Chapter 3: CONTEXT-FREE LANGUAGES

Step  State  Unread Input  Stack          Rule of G'
 0    p      id*(id)$      e
 1    q      id*(id)$      E
 2    q_id   *(id)$        E
 3    q_id   *(id)$        TE'            E → TE'
 4    q_id   *(id)$        FT'E'          T → FT'
 5    q_id   *(id)$        idAT'E'        F → idA
 6    q      *(id)$        AT'E'
 7    q_*    (id)$         AT'E'
 8    q_*    (id)$         T'E'           A → e
 9    q_*    (id)$         *FT'E'         T' → *FT'
10    q      (id)$         FT'E'
11    q_(    id)$          FT'E'
12    q_(    id)$          (E)T'E'        F → (E)
13    q      id)$          E)T'E'
14    q_id   )$            E)T'E'
15    q_id   )$            TE')T'E'       E → TE'
16    q_id   )$            FT'E')T'E'     T → FT'
17    q_id   )$            idAT'E')T'E'   F → idA
18    q      )$            AT'E')T'E'
19    q_)    $             AT'E')T'E'
20    q_)    $             T'E')T'E'      A → e
21    q_)    $             E')T'E'       T' → e
22    q_)    $             )T'E'          E' → e
23    q      $             T'E'
24    q_$    e             T'E'
25    q_$    e             E'             T' → e
26    q_$    e             e              E' → e

In general, given a grammar G, one may try to construct a top-down parser for G as follows: eliminate left recursion in G by repeatedly applying Heuristic Rule 2 to all left-recursive nonterminals A of G.
Apply Heuristic Rule 1 to left-factor G whenever necessary. Then examine whether the resulting grammar has the property that one can decide among rules with the same left-hand side by looking at the next input symbol. Grammars with this property are called LL(1). Although we have not specified exactly how to determine whether a grammar is indeed LL(1), nor how to construct the corresponding deterministic parser if it is, there are systematic methods for doing so. In any case, inspection of the grammar and some experimentation will often be all that is needed.

3.7: Determinism and Parsing

[Figure 3-12: the parse tree of id * (id); next to each node is shown the parser step at which it was expanded: E (step 3); T (step 4); E' (step 26); F (step 5); T' (step 9); A (step 8); F (step 12); T' (step 25); E (step 15); T (step 16); E' (step 22); F (step 17); A (step 20); T' (step 21).]

Bottom-Up Parsing

There is no one best way to parse context-free languages, and different methods are sometimes preferable for different grammars.
We close this chapter by briefly considering methods quite dissimilar from those of top-down parsing. Nevertheless they, too, find their genesis in the construction of a pushdown automaton. In addition to the construction of Lemma 3.4.1, there is a quite orthogonal way of constructing a pushdown automaton that accepts the language generated by a given context-free grammar. The automata of that construction (from which the top-down parsers studied in the last subsection are derived) operate by carrying out a leftmost derivation on the stack; as terminal symbols are generated, they are compared with the input string. In the construction given below, the automaton attempts to read the input first and, on the basis of the input actually read, deduce what derivation it should attempt to carry out.
The general effect, as we shall see, is to reconstruct a parse tree from the leaves to the root, rather than the other way around, and so this class of methods is called bottom-up.

The bottom-up pushdown automaton is constructed as follows. Let G = (V, Σ, R, S) be any context-free grammar; then let M = (K, Σ, Γ, Δ, p, F), where K = {p, q}, Γ = V, F = {q}, and Δ contains the following.

(1) ((p, a, e), (p, a)) for each a ∈ Σ.
(2) ((p, e, α^R), (p, A)) for each rule A → α in R.
(3) ((p, e, S), (q, e)).

Before moving to the proof itself, compare these types of transitions with those of the automaton constructed in the proof of Lemma 3.4.1. Transitions of type 1 here move input symbols onto the stack; transitions of type 3 in Lemma 3.4.1
pop terminal symbols off the stack when they match input symbols. Transitions of type 2 here replace the right-hand side of a rule on the stack by the corresponding left-hand side, the right-hand side being found reversed on the stack; those of type 2 of Lemma 3.4.1 replace the left-hand side of a rule on the stack by the corresponding right-hand side. Transitions of type 3 here end a computation by moving to the final state when only the start symbol remains on the stack; transitions of type 1 of Lemma 3.4.1 start off the computation by placing the start symbol on the initially empty stack.
So the machine of this construction is in a sense perfectly orthogonal to the one of Lemma 3.4.1.

Lemma 3.7.1: Let G and M be as just presented. Then L(M) = L(G).

Proof: Any string in L(G) has a rightmost derivation from the start symbol. Therefore proof of the following claim suffices to establish the lemma.

Claim: For any x ∈ Σ* and γ ∈ Γ*, (p, x, γ) ⊢*_M (p, e, S) if and only if S ⇒*_G γ^R x by a rightmost derivation.

For if we let x be an input to M and γ = e, then, since q is the only final state and it can be entered only via transition 3, the claim implies that M accepts x if and only if G generates x. The only if direction of the claim can be proved by an induction on the number of steps in the computation of M, whereas the if direction can be proved by an induction on the number of steps in the rightmost derivation of x from S. ∎

Let us consider again the grammar for arithmetic expressions (Example 3.1.3, without the rule F → id(E) of the previous subsection). The rules of this grammar are the following.

E → E + T    (R1)
E → T        (R2)
T → T * F    (R3)
T → F        (R4)
F → (E)      (R5)
F → id       (R6)

If our new construction is applied to this grammar, the following set of transitions is obtained.

((p, a, e), (p, a)) for each a ∈ Σ    (Δ0)
((p, e, T + E), (p, E))               (Δ1)
((p, e, T), (p, E))                   (Δ2)
((p, e, F * T), (p, T))               (Δ3)
((p, e, F), (p, T))                   (Δ4)
((p, e, )E( ), (p, F))                (Δ5)
((p, e, id), (p, F))                  (Δ6)
((p, e, E), (q, e))                   (Δ7)

Let us call this pushdown automaton M.
The input id * (id) is accepted by M as shown in the table below.

Step  State  Unread Input  Stack   Transition Used  Rule of G
 0    p      id*(id)       e
 1    p      *(id)         id      Δ0
 2    p      *(id)         F       Δ6               R6
 3    p      *(id)         T       Δ4               R4
 4    p      (id)          *T      Δ0
 5    p      id)           (*T     Δ0
 6    p      )             id(*T   Δ0
 7    p      )             F(*T    Δ6               R6
 8    p      )             T(*T    Δ4               R4
 9    p      )             E(*T    Δ2               R2
10    p      e             )E(*T   Δ0
11    p      e             F*T     Δ5               R5
12    p      e             T       Δ3               R3
13    p      e             E       Δ2               R2
14    q      e             e       Δ7

We see that M is certainly not deterministic: transitions of type Δ0 are compatible with all the transitions of types Δ1 through Δ7. Still, its overall "philosophy" of operation is suggestive.
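The two kinds of moves in this computation can be replayed in code. The sketch below is our own illustration: the move sequence is transcribed from the table above, and the stack is represented as a list with its top at the left, as in the table.

```python
# A sketch (our own, not from the text) replaying M's shift-reduce
# computation on id * (id).  A reduction pops the REVERSED right-hand
# side of a rule off the stack and pushes the left-hand side.

RULES = {                                   # rule name -> (lhs, rhs)
    "R1": ("E", ["E", "+", "T"]),
    "R2": ("E", ["T"]),
    "R3": ("T", ["T", "*", "F"]),
    "R4": ("T", ["F"]),
    "R5": ("F", ["(", "E", ")"]),
    "R6": ("F", ["id"]),
}

def shift(stack, tokens):
    """Type-Delta0 move: push the next input token onto the stack."""
    return [tokens.pop(0)] + stack

def reduce_by(stack, name):
    """Type-Delta1..Delta6 move: replace a reversed RHS by its LHS."""
    lhs, rhs = RULES[name]
    assert stack[:len(rhs)] == list(reversed(rhs)), f"cannot apply {name}"
    return [lhs] + stack[len(rhs):]

tokens = ["id", "*", "(", "id", ")"]
stack = []                                  # top of the stack at the left
moves = ["shift", "R6", "R4", "shift", "shift", "shift",
         "R6", "R4", "R2", "shift", "R5", "R3", "R2"]
for move in moves:
    stack = shift(stack, tokens) if move == "shift" else reduce_by(stack, move)

assert stack == ["E"] and tokens == []      # the type-Delta7 move now accepts
```

Note that the reduction moves, read in reverse order (R2, R3, R5, R2, R4, R6, R4, R6), are exactly the rules of the rightmost derivation discussed next.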
At any point, M may shift a terminal symbol from its input to the top of the stack (transitions of type Δ0, used in the sample computation at Steps 1, 4, 5, 6, and 10). On the other hand, it may occasionally recognize the top few symbols on the stack as (the reverse of) the right-hand side of a rule of G, and may then reduce this string to the corresponding left-hand side (transitions of types Δ2 through Δ6, used in the sample computation wherever a rule of G is indicated in the rightmost column). The sequence of rules corresponding to the reduction steps turns out to mirror exactly, in reverse order, a rightmost derivation of the input string. In our example, the implied rightmost derivation is as follows.

E ⇒ T ⇒ T * F ⇒ T * (E) ⇒ T * (T) ⇒ T * (F) ⇒ T * (id) ⇒ F * (id) ⇒ id * (id)

This derivation can be read from the computation by applying the rules mentioned in the right-hand column, from bottom to top, always to the rightmost nonterminal.
Equivalently, this process can be thought of as a bottom-to-top, left-to-right reconstruction of a parse tree (that is, exactly orthogonal to Figure 3-11(b)). In order to construct a practically useful parser for L(G), we must turn M into a deterministic device that accepts L(G)$. As in our treatment of top-down parsers, we shall not give a systematic procedure for doing so.
Instead, we carry through the example of G, pointing out the basic heuristic principles that govern this construction.

First, we need a way of deciding between the two basic kinds of moves, namely, shifting the next input symbol to the stack and reducing the few topmost stack symbols to a single nonterminal according to a rule of the grammar. One possible way of deciding this is by looking at two pieces of information: the next input symbol (call it b) and the top stack symbol (call it a). (The symbol a could be a nonterminal.) The decision between shifting and reducing is then made through a relation P ⊆ V × (Σ ∪ {$}) called a precedence relation. If (a, b) ∈ P, then we reduce; otherwise we shift b. The correct precedence relation for the grammar of our example is given in the table below.
Intuitively, (a, b) ∈ P means that there exists a rightmost derivation of the form

S ⇒*_G βAbx ⇒_G βγabx.

Since we are reconstructing rightmost derivations backwards, it makes sense to undo the rule A → γa whenever we observe that a immediately precedes b. There are systematic ways to calculate precedence relations, as well as to find out when, as is the case in this example, a precedence relation suffices to decide between shifting and reducing; however, in many cases inspection and experimentation will lead to the right table.

       |  (  |  )  | id  |  +  |  *  |  $
  -----+-----+-----+-----+-----+-----+-----
    )  |     |  ✓  |     |  ✓  |  ✓  |  ✓
    id |     |  ✓  |     |  ✓  |  ✓  |  ✓
    +  |     |     |     |     |     |
    *  |     |     |     |     |     |
    E  |     |     |     |     |     |
    T  |     |  ✓  |     |  ✓  |     |  ✓
    F  |     |  ✓  |     |  ✓  |  ✓  |  ✓

Now we must confront the other source of nondeterminism: when we decide to reduce, how do we choose which of the prefixes of the pushdown store to replace with a nonterminal? For example, if the pushdown store contains the string F * T + E and we must reduce, we have a choice between reducing F to T (Rule R4) or reducing F * T to T (Rule R3).
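Putting the precedence relation to work, here is a sketch (our own) of a deterministic shift-reduce parser for G. It resolves the reduction ambiguity just raised by always reducing by the longest matching right-hand side; this tie-breaking rule is an assumption of the sketch, not something the text has yet justified.

```python
# A sketch (our own) of a precedence-driven shift-reduce parser for G.
# When (top-of-stack symbol, next input symbol) is in P we reduce,
# otherwise we shift.  Among the rules whose reversed right-hand side
# matches the top of the stack, we ASSUME the longest one is chosen.

RULES = [                        # (lhs, rhs), longest right-hand sides first
    ("E", ["E", "+", "T"]),
    ("T", ["T", "*", "F"]),
    ("F", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["F"]),
    ("F", ["id"]),
]

# The precedence relation P, transcribed from the table above.
P = {(a, b) for a in [")", "id", "F"] for b in [")", "+", "*", "$"]}
P |= {("T", b) for b in [")", "+", "$"]}     # no (T, *): shift to build T*F

def parse(tokens):
    stack = []                               # top of the stack at the left
    tokens = tokens + ["$"]
    while True:
        if stack == ["E"] and tokens == ["$"]:
            return True                      # the accepting (Delta7-like) move
        if stack and (stack[0], tokens[0]) in P:
            for lhs, rhs in RULES:           # reduce, longest match first
                if stack[:len(rhs)] == list(reversed(rhs)):
                    stack = [lhs] + stack[len(rhs):]
                    break
            else:
                return False                 # must reduce, but nothing matches
        elif tokens[0] != "$":
            stack = [tokens.pop(0)] + stack  # shift the next input symbol
        else:
            return False

assert parse(["id", "*", "(", "id", ")"])
assert not parse(["id", "+"])
```

On the stack F * T + E, for instance, the longest-match rule picks the reduction F * T → T (Rule R3) over F → T (Rule R4), since the reversed right-hand side F * T of R3 matches more of the stack top.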