However, our style here is rather different from that of the rest of this book; there are fewer proofs, and we do not attempt to tie up all the loose ends of the ideas we introduce. We present some guidelines, what we call "heuristic rules," that will not be useful in all cases, and we do not even attempt to specify exactly when they will be useful. That is, we aim to introduce some suggestive applications of the theory developed earlier in this chapter, but this venture should not be taken as anything more than an introduction.

Let us begin with an example.
The language L = {a^n b^n} is generated by the context-free grammar G = ({a, b, S}, {a, b}, R, S), where R contains the two rules S → aSb and S → e. We know how to construct a pushdown automaton that accepts L: just carry out the construction of Lemma 3.4.1 for the grammar G. The result is

M1 = ({p, q}, {a, b}, {a, b, S}, Δ1, p, {q}),

where

Δ1 = {((p, e, e), (q, S)),
      ((q, e, S), (q, aSb)),
      ((q, e, S), (q, e)),
      ((q, a, a), (q, e)),
      ((q, b, b), (q, e))}.

Since M1 has two different transitions with identical first components (the ones corresponding to the two rules of G that have identical left-hand sides), it is not deterministic.

Nevertheless, L is a deterministic context-free language, and M1 can be modified to become a deterministic pushdown automaton M2 that accepts L$. Intuitively, all the information that M1 needs at each point in order to decide which of the two transitions to follow is the next input symbol.
If that symbol is an a, then M1 should replace S by aSb on its stack if hope of an accepting computation is to be retained. On the other hand, if the next input symbol is a b, then the machine must pop S. M2 achieves this required anticipation or lookahead by consuming an input symbol ahead of time and incorporating that information into its state. Formally,

M2 = ({p, q, qa, qb, q$}, {a, b, $}, {a, b, S}, Δ2, p, {q$}),

where Δ2 contains the following transitions.

(1) ((p, e, e), (q, S))
(2) ((q, a, e), (qa, e))
(3) ((qa, e, a), (q, e))
(4) ((q, b, e), (qb, e))
(5) ((qb, e, b), (q, e))
(6) ((q, $, e), (q$, e))
(7) ((qa, e, S), (qa, aSb))
(8) ((qb, e, S), (qb, e))

From state q, M2 reads one input symbol and, without changing the stack, enters one of the three new states qa, qb, or q$.
It then uses that information to differentiate between the two compatible transitions ((q, e, S), (q, aSb)) and ((q, e, S), (q, e)): the first transition is retained only from state qa and the second only from state qb. So M2 is deterministic. It accepts the input ab$ as follows.

Step  State  Unread Input  Stack  Transition Used  Rule of G
0     p      ab$           e      -
1     q      ab$           S      1
2     qa     b$            S      2
3     qa     b$            aSb    7                S → aSb
4     q      b$            Sb     3
5     qb     $             Sb     4
6     qb     $             b      8                S → e
7     q      $             e      5
8     q$     e             e      6

So M2 can serve as a deterministic device for recognizing strings of the form a^n b^n. Moreover, by remembering which transitions of M2 were derived from which rules of the grammar (this is the last column of the table above), we can use a trace of the operation of M2 in order to reconstruct a leftmost derivation of the input string.
Specifically, the steps in the computation where a nonterminal is replaced on top of the stack (Steps 3 and 6 in the example) correspond to the construction of a parse tree from the root towards the leaves (see Figure 3-11(a)).

Devices such as M2, which correctly decide whether a string belongs in a context-free language and, in the case of a positive answer, produce the corresponding parse tree, are called parsers.
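The lookahead construction lends itself to direct simulation. The following is a minimal sketch (ours, not from the text) that runs the transitions of M2 on an input ending in $ and records which grammar rules were used, thereby reconstructing the leftmost derivation; the string encoding of states and rules is an illustrative assumption.

```python
def run_m2(w):
    """Run M2 on w (which should end in '$'); return the list of grammar
    rules used, or None if w is rejected."""
    # Transition 1, ((p, e, e), (q, S)), has already been applied:
    state, stack, rules, i = "q", ["S"], [], 0
    while True:
        if state == "q":
            if i == len(w):
                return None                  # ran out of input before $
            state, i = "q" + w[i], i + 1     # transitions 2, 4, 6: look ahead
        elif state == "q$":
            return rules if not stack else None
        else:
            a = state[1]                     # the symbol read ahead of time
            if stack and stack[-1] == "S":   # expand the nonterminal on top
                if a == "a":                 # transition 7, rule S -> aSb
                    stack[-1:] = ["b", "S", "a"]   # push aSb, 'a' on top
                    rules.append("S -> aSb")
                else:                        # transition 8, rule S -> e
                    stack.pop()
                    rules.append("S -> e")
            elif stack and stack[-1] == a:   # transitions 3, 5: match and pop
                stack.pop()
                state = "q"
            else:
                return None                  # reject

print(run_m2("ab$"))     # ['S -> aSb', 'S -> e']
print(run_m2("aabb$"))   # ['S -> aSb', 'S -> aSb', 'S -> e']
print(run_m2("abb$"))    # None
```

The returned list is exactly the last column of the trace above: reading it left to right gives the leftmost derivation of the input.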
In particular, M2 is a top-down parser because tracing its operation at the steps where nonterminals are replaced on the stack reconstructs a parse tree in a top-down, left-to-right fashion (see Figure 3-11(b) for a suggestive way of representing how progress is made in a top-down parser). We shall see a more substantial example shortly.

Naturally, not all context-free languages have deterministic acceptors that can be derived from the standard nondeterministic one via the lookahead idea. For example, we saw in the previous subsection that some context-free languages are not deterministic to begin with.
Even for certain deterministic context-free languages, lookahead of just one symbol may not be sufficient to resolve all uncertainties. Some languages, however, are not directly amenable to parsing by lookahead for reasons that are superficial and can be removed by slightly modifying the grammar. We shall focus on these next.

Recall the grammar G that generates arithmetic expressions with operations + and * (Example 3.1.3). In fact, let us enrich this grammar by another rule,

F → id(E),    (R7)

designed to allow function calls, such as sqrt(x * x + 1) and f(z), to appear in our arithmetic expressions. Let us try to construct a top-down parser for this grammar.
Our construction of Section 3.4 would give the pushdown automaton

M3 = ({p, q}, Σ, Γ, Δ, p, {q}),

with

Σ = {(, ), +, *, id},
Γ = Σ ∪ {E, T, F},

and Δ as given below.

(0) ((p, e, e), (q, E))
(1) ((q, e, E), (q, E + T))
(2) ((q, e, E), (q, T))
(3) ((q, e, T), (q, T * F))
(4) ((q, e, T), (q, F))
(5) ((q, e, F), (q, (E)))
(6) ((q, e, F), (q, id))
(7) ((q, e, F), (q, id(E)))

Finally, ((q, a, a), (q, e)) ∈ Δ for all a ∈ Σ.
The nondeterminism of M3 is manifested by the sets of transitions 1-2, 3-4, and 5-6-7 that have identical first components. What is worse, these decisions cannot be made based on the next input symbol. Let us examine more closely why this is so.

Transitions 6 and 7. Suppose that the configuration of M3 is (q, id, F). At this point M3 could act according to any one of transitions 5, 6, or 7. By looking at the next input symbol, id, M3 could exclude transition 5, since this transition requires that the next symbol be (.
Still, M3 would not be able to decide between transitions 6 and 7, since they both produce a top of the stack that can be matched to the next input symbol id. The problem arises because the rules F → id and F → id(E) of G have not only identical left-hand sides, but also the same first symbol on their right-hand sides.

There is a very simple way around this problem: just replace the rules F → id and F → id(E) in G by the rules F → idA, A → e, and A → (E), where A is a new nonterminal (A for argument). This has the effect of "procrastinating" on the decision between the rules F → id and F → id(E) until all needed information is available.
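The replacement just described can be carried out mechanically for any set of rules sharing a common prefix. The following sketch (ours, not an algorithm from the text) factors a nonempty prefix out of the rules for one nonterminal; the encoding of a grammar as a dict from each nonterminal to a list of right-hand sides (tuples of symbols, with () standing for e) is an assumption made for illustration.

```python
def left_factor(rules, lhs, prefix, new_nt):
    """Replace the lhs-rules whose right-hand sides start with prefix by
    lhs -> prefix new_nt, and give new_nt the leftover suffixes."""
    k = len(prefix)
    factored = [rhs for rhs in rules[lhs] if rhs[:k] == prefix]
    if len(factored) < 2:
        return rules                         # nothing worth factoring
    rest = [rhs for rhs in rules[lhs] if rhs[:k] != prefix]
    new = dict(rules)
    new[lhs] = rest + [prefix + (new_nt,)]
    new[new_nt] = [rhs[k:] for rhs in factored]   # may include e = ()
    return new

# The troublesome F-rules of the grammar: F -> id and F -> id(E).
g = {"F": [("id",), ("id", "(", "E", ")")]}
g2 = left_factor(g, "F", ("id",), "A")
print(g2)   # F -> id A ;  A -> e | ( E )
```

Applied to the F-rules, the result is exactly the modified grammar above: F → idA together with A → e and A → (E).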
A modified pushdown automaton M3' now results from this modified grammar, in which transitions 6 and 7 are replaced by the following.

(6') ((q, e, F), (q, idA))
(7') ((q, e, A), (q, e))
(8') ((q, e, A), (q, (E)))

Now looking one symbol ahead is enough to decide the correct action. For example, configuration (q, id(id), F) would yield (q, id(id), idA), (q, (id), A), (q, (id), (E)), and so on.

This technique of avoiding nondeterminism is known as left factoring. It can be summarized as follows.

Heuristic Rule 1: Whenever A → αβ1, A → αβ2, ..., A → αβn are rules with α ≠ e and n ≥ 2, then replace them by the rules A → αA' and A' → βi for i = 1, ..., n, where A' is a new nonterminal.

It is easy to see that applying Heuristic Rule 1 does not change the language generated by the grammar.

We now move to examining the second kind of anomaly that prevents us from transforming M3 into a deterministic parser.

Transitions 1 and 2. These transitions present us with a more serious problem. If the automaton sees id as the next input symbol and the contents of the stack are just E, it could take a number of actions. It could perform transition 2, replacing E by T (this would be justified in case the input is, say, id).
Or it could replace E by E + T (transition 1) and then the top E by T (this should be done if the input is id + id). Or it could perform transition 1 twice and transition 2 once (input id + id + id), and so on. It seems that there is no bound whatsoever on how far ahead the automaton must peek in order to decide on the right action. The culprit here is the rule E → E + T, in which the nonterminal on the left-hand side is repeated as the first symbol of the right-hand side. This phenomenon is called left recursion, and it can be removed by some further surgery on the grammar.

To remove left recursion from the rule E → E + T, we simply replace it by the rules E → TE', E' → +TE', and E' → e, where E' is a new nonterminal. It can be shown that such transformations do not change the language produced by the grammar.
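This surgery, too, is mechanical. Below is a sketch (ours, not from the text) that removes immediate left recursion from the rules for one nonterminal, using the same illustrative grammar encoding as before: a dict from each nonterminal to a list of right-hand sides, each a tuple of symbols with () standing for e.

```python
def remove_left_recursion(rules, a, new_nt):
    """Rules a -> a alpha_i and a -> beta_j become a -> beta_j new_nt and
    new_nt -> alpha_i new_nt | e."""
    alphas = [rhs[1:] for rhs in rules[a] if rhs[:1] == (a,)]
    betas = [rhs for rhs in rules[a] if rhs[:1] != (a,)]
    if not alphas:
        return rules                         # no left-recursive rule
    new = dict(rules)
    new[a] = [beta + (new_nt,) for beta in betas]
    new[new_nt] = [alpha + (new_nt,) for alpha in alphas] + [()]  # () is e
    return new

# The left-recursive E-rules: E -> E + T and E -> T.
g = {"E": [("E", "+", "T"), ("T",)]}
g2 = remove_left_recursion(g, "E", "E'")
print(g2)   # E -> T E' ;  E' -> + T E' | e
```

On the E-rules this yields exactly the replacement just described: E → TE', E' → +TE', and E' → e.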
The same method must also be applied to the other left-recursive rule of G, namely T → T * F. We thus arrive at the grammar G' = (V', Σ, R', E), where V' = Σ ∪ {E, E', T, T', F, A}, and the rules are as follows.

(1) E → TE'
(2) E' → +TE'
(3) E' → e
(4) T → FT'
(5) T' → *FT'
(6) T' → e
(7) F → (E)
(8) F → idA
(9) A → e
(10) A → (E)

The above technique for removing left recursion from a context-free grammar can be expressed as follows.†

Heuristic Rule 2: Let A → Aα1, ..., A → Aαn and A → β1, ..., A → βm be all the rules with A on the left-hand side, where the βi's do not start with an A and n > 0 (that is, there is at least one left-recursive rule). Then replace these rules by A → β1A', ..., A → βmA' and A' → α1A', ..., A' → αnA', and A' → e, where A' is a new nonterminal.

Still, the grammar G' of our example has rules with identical left-hand sides; only now all uncertainties can be resolved by looking ahead at the next input symbol. We can thus construct the following deterministic pushdown automaton M4 that accepts L(G)$:

M4 = ({p, q} ∪ {qa : a ∈ Σ ∪ {$}}, Σ ∪ {$}, V', Δ, p, {q$}),

where Δ is listed below.

((p, e, e), (q, E))
((q, a, e), (qa, e))        for each a ∈ Σ ∪ {$}
((qa, e, a), (q, e))        for each a ∈ Σ
((qa, e, E), (qa, TE'))     for each a ∈ Σ ∪ {$}
((q+, e, E'), (q+, +TE'))
((qa, e, E'), (qa, e))      for each a ∈ {), $}
((qa, e, T), (qa, FT'))     for each a ∈ Σ ∪ {$}
((q*, e, T'), (q*, *FT'))
((qa, e, T'), (qa, e))      for each a ∈ {+, ), $}
((q(, e, F), (q(, (E)))
((qid, e, F), (qid, idA))
((q(, e, A), (q(, (E)))
((qa, e, A), (qa, e))       for each a ∈ {+, *, ), $}

Then M4 is a parser for G'.

†We assume here that there are no rules of the form A → A.
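The behavior of M4 can be mirrored by a small table-driven parser. The sketch below (ours, not the book's construction) keeps M4's stack explicitly and uses one symbol of lookahead to choose among rules with the same left-hand side. The RULES table is read off the transitions of M4 above; the "default" entries, which stand for the transitions that apply for every lookahead symbol, are our encoding convention.

```python
# Predict table for G': (nonterminal, lookahead) -> right-hand side.
RULES = {
    ("E",  "default"): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (), ("E'", "$"): (),
    ("T",  "default"): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
    ("F",  "("): ("(", "E", ")"),
    ("F",  "id"): ("id", "A"),
    ("A",  "("): ("(", "E", ")"),
    ("A",  "+"): (), ("A", "*"): (), ("A", ")"): (), ("A", "$"): (),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F", "A"}

def parse(tokens):
    """Return True iff tokens (a list ending in '$') is an expression of G'."""
    stack, i = ["E"], 0
    while stack:
        top, a = stack[-1], tokens[i]
        if top in NONTERMINALS:
            rhs = RULES.get((top, a), RULES.get((top, "default")))
            if rhs is None:
                return False                 # no applicable transition
            stack[-1:] = list(reversed(rhs)) # expand; leftmost symbol on top
        elif top == a:
            stack.pop(); i += 1              # match a terminal and read on
        else:
            return False
    return tokens[i] == "$"

print(parse(["id", "+", "id", "*", "id", "$"]))   # True
print(parse(["id", "(", "id", ")", "$"]))         # True: a function call
print(parse(["id", "+", "$"]))                    # False
```

Recording which table entry fires at each expansion step would recover the leftmost derivation, exactly as in the trace of M2 above.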