1610906232-7ba7dbaea13262b50a6029d682cb7f1b (824370), страница 32
Текст из файла (страница 32)
There is again onlyone exception: we may have removed a rule 5 --+ a, thus omitting the stringa from the language generated by G. Once again, fortunately this omission isallowed by the statement of the theorem.154Chapter 3: CONTEXT-FREE LANGUAGESExample 3.6.1 (continued): In our modified grammar with ruleswe have D(Sd = {S1,)}, and D(A) = {A} for all A E V - {Sd. We omit alllength-one rules, of which there is only one, S1 -+). The only nonterminal witha nontrivial set '0, Sl, appears on the right-hand side of only the second rule.This rule is therefore replaced by the two rules S -+ (Sl, S -+ 0, correspondingto the two elements of D(Sd. The final grammar in Chomsky normal form isS -+ SS,S -+ (Sl,S1 -+ S),S -+o.After the three steps, the grammar is in Chomsky normal form, and, exceptfor the possible omission of strings of length less than two, it generates the samelanguage as the original one.In order to complete the proof of the theorem, we must establish that thewhole construction can be carried out in time polynomial in the size of theoriginal grammar G.
By "size of G" we mean the length of a string that sufficesto fully describe G -that is to say, the sum of the lengths of the rules of G.Let n be this quantity. The first part of the transformation (getting rid oflong rules) takes time O(n) and creates a grammar of size again O(n). Thesecond part, getting rid of e-rules, takes 0(n 2 ) time for the closure computation(O(n) iterations, each doable in O(n) time), plus O(n) for adding the new rules.Finally, the third part (taking care of short rules) can also be carried out inpolynomial time (O(n) closure computations, each taking time 0(n 2 )). Thiscompletes the proof of the theorem .•The advantage of Chomsky normal form is that it enables a simple polynomial algorithm for deciding whether a string can be generated by the grammar.Suppose that we are given a context-free grammar G = (V, ~,R, S) in Chomskynormal form, and we are asked whether the string x = Xl •.• X n , with n ~ 2, is inL(G).
The algorithm is shown below. It decides whether X E L(G) by analyzingall substrings of x. For each i and s such that 1 ::::: i ::::: i + s ::::: n, define N[i, i + s]to be the set of all symbols in V that can derive in G the string Xi··· Xi+8.The algorithm computes these sets.
It proceeds computing N[i, i + s] from shortstrings (s small) to longer and longer strings. This general philosophy of solvinga problem by starting from minuscule subproblems and building up solutionsto larger and larger subproblems until the whole problem is solved is known asdynamic programming.1553.6: Algorithms for Context-Free Grammarsfor i := 1 to n do N[i, i] := {xil; all other N[i, j] are initially emptyfor s := 1 to n - 1 dofor i := 1 to n - s dofor k := i to i + s - 1 doif there is a rule A --+ BC E R with B E N[i, k] and C E N[k + 1, ithen add A to N[i, i + s].Accept x if S E N[l, n].+ s]In order to establish that the algorithm above correctly determines whetherx E L( G), we shall prove the following claim.Claim: For each nat'ural number s with 0algorithm, for all i = 1, ... , n - s,N[i,i+ s] = {AS s S n, after the sth iteration of theE V: A::::}* Xi" 'Xi+S}'Proof of the Claim: The proof of this claim is by induction on s.Basis Step.
When s = 0 -where by "the zeroth iteration of the algorithm" weunderstand the first (initialization) line- the statement is true: since G is inChomsky normal form, the only symbol that can generate the terminal Xi is Xiitself.Induction Step: Suppose that the claim is true for all integers less than s > O.Consider a derivation of the substring Xi" . Xi+s, say from a nonterminal A.Since G is in Chomsky normal form, the derivation starts with a rule of theform A --+ BC, that is,where B, C E V.
Therefore, for some k with i< k S i + s,We conclude that A E {A E V : A ::::} * Xi'" Xi+s} if and only if there is aninteger k, i S k < i + s, and two symbols B E {A E V: A ::::}* Xi" .xd andC E {.4 E V : A ::::}* Xk+l ... Xi+s} such that A --+ BC E R. We can rewrite thestring Xi' .. Xk as Xi' .. Xi+s', where Sf = k - i, and the string Xk+l ...
Xi+s asXk+l ... Xk+l+s", where s" = i + s - k - 1. Notice that, since i S k < i + s, wemust have Sf, S" < s. Hence, the induction hypothesis applies!By the induction hypothesis, {A E V : A ::::}* Xi'" xd = N[i, k], and{A E V: A::::}* Xk+l" 'Xi+s} = N[k + 1,i + s]. We conclude that A E {A EV : A :=} * Xi'" Xi+s} if and only if there is an integer k, i S k < i + s, andtwo symbols B E N[i, k] and C E N[k + 1, i + s] such that A --+ BC E R.156Chapter 3: CONTEXT-FREE LANGUAGESBut these are precisely the circumstances under which our algorithm adds A toN[i, i + s].
Therefore the claim holds for s as well, and this concludes the proofof the induction hypothesis -and of the claim .•It follows immediately from the claim that the algorithm above correctlydecides whether x E L(G): At the end, the set N[l, n] will contain all symbolsthat derive the string Xl··· Xn = x. Therefore, X E L(G) if and only if S EN[l,n].To analyze the time performance of the algorithm, notice that it consists ofthree nested loops, each with a range bounded by Ixl = n. At the heart of theloop we must examine for each rule of the form A -+ BC whether B E N[i, j]and C E N[j + 1, i + s]; this can be carried out in time proportional to the sizeof the grammar G -the length of its rules. We conclude that the total numberof operations is O(lx1 3 1GI) -a polynomial in both the length of X and the size ofG.
For any fixed grammar G (that is, when we consider IGI to be a constant),the algorithm runs in time O(n 3 ) . •Example 3.6.1 (continued): Let us apply the dynamic programming algorithm to the grammar for the balanced parentheses, as was rendered in Chomskynormal form with rulesS -+ SS,S -+ (Sl,Sl -+ S),S -+o.Suppose we wish to tell whether the string (()(())) can be generated by G. Wedisplay in Figure 3.10 the values of N[i, i + s] for 1 SiS j S n = 8, resultingfrom the iterations of the algorithm.
The computation proceeds along paralleldiagonals of the table. The main diagonal, corresponding to s = 0, contains thestring being parsed. To fill a box, say [2,7], we look at all pairs of boxes of theform N[2, k] and N[k + 1,7] with 2 S k < 7. All these boxes lie either on theleft of or above the box being filled. For k = 3, we notice that S E N[2, 3],S E N[4, 7], and S -+ SS is a rule; thus we must add the left-hand side S to thebox N[2, 7].
And so on. The lower-right corner is N[l, n], and it does containS; therefore the string is indeed in L(G). In fact, by inspecting this table it iseasy to recover an actual derivation of the string (()(())) in G. The dynamicprogramming algorithm can be easily modified to produce such a derivation; seeProblem 3.6.2.0Part (c) of Theorem 3.6.1 now follows by combining Theorems 3.6.2 andthe claim above: Given a context-free grammar G and a string x, we determinewhether x E L(G) as follows: First, we transform G into an equivalent contextfree grammar G' in Chomsky normal form, according to the construction inthe proof of Theorem 3.6.2, in polynomial time.
In the special case in whichIxl S 1, we can already decide whether x E L(G): It is if and only if during1573.7: Determinism and Parsing81-)7)06)005(881 04(008813)000002(80008 81l(000000123458678Figure 3-10the transformation we had to delete a rule 8 --+ x. Otherwise, we run thedynamic programming algorithm described above for the grammar G' and thestring x.
The total number of operations used by the algorithm is bounded bya polynomial in the size of the original grammar G and the length of the stringx.•Problems for Section 3.63.6.1. Convert the context-free grammar G given in Example 3.1.3 generatingarithmetic expressions into an equivalent context-free grammar in Chomsky normal form. Apply the dynamic programming algorithm for decidingwhether the string x = (id + id + id) * (id) is in L(G).3.6.2. How would you modify the dynamic programming algorithm in such a waythat, when the input x is indeed in the language generated by G, then thealgorithm produces an actual derivation of x in G?3.6.3.
(a) Let G = (V,~, R, 8) be a context-free language. Call a nonterminalA E V - ~ productive if A ~G x for some x E ~'. Give a polynomialalgorithm for finding all productive nonterminals of G. (Hint: It is a closurealgorithm. )(b) Give a polynomial algorithm which, given a context-free grammar G,decides whether L(G) = 0.3.6.4. Describe an algorithm which, given a context-free grammar G, decideswhether L(G) is infinite. (Hint: One approach uses the Pumping Theorem.)What is the complexity of your algorithm? Can you find a polynomial-timealgorithm?158BChapter 3: CONTEXT-FREE LANGUAGESDETERMINISM AND PARSINGContext-free grammars are used extensively in modeling the syntax of programming languages, as was suggested by Example 3.1.3.
A compiler for such aprogramming language must then embody a parser, that is, an algorithm todetermine whether a given string is in the language generated by a given contextfree grammar, and, if so, to construct the parse tree of the string. (The compilerwould then go on to translate this parse tree into a program in a more basic language, such as assembly language.) The general context-free parser wehave developed in the previous section, the dynamic programming algorithm,although perfectly polynomial, is far too slow for handling programs with tensof thousands of instructions (recall its cubic dependence on the length of thestring).
Many approaches to the parsing problem have been developed by compiler designers over the past four decades. Interestingly, the most successfulones among them are rooted in the idea of a pushdown automaton. After all,the equivalence of pushdown automata and context-free grammars, which wasproved in Section 3.4, should be put to work. However, a pushdown automatonis not of immediate practical use in parsing, because it is a nondeterministic device. The question then arises, can we always make pushdown automata operatedeterministically (as we were able to do in the case of finite automata)?Our first objective in this section is to study the question of deterministicpushdown automata. We shall see that there are some context-free languagesthat cannot be accepted by deterministic pushdown automata. This is rather disappointing; it suggests that the conversion of grammars to automata in Section3.4 cannot be the basis for any practical method.