1610906232-7ba7dbaea13262b50a6029d682cb7f1b (824370), страница 26
Текст из файла (страница 26)
Show that the following languages are context-free by exhibiting contextfree grammars generating each.(a) {amb n : m::::: n}(b) {ambnd'd q : m + n = p + q}(c) {w E {a,b}*: w has twice as many b's as a's}(d) {uawb: u, wE {a, b}*, lui = Iwl}(e) WICW2C . .. cWkccwf : k ::::: 1, 1 ~ j ~ k, Wi E {a, b} + for i = 1, ... ,k}(f) {amb n : m ~ 2n}3.1.10. Call a context-free grammar G = (V,~, R, S) regular (or right-linear) ifR ~ (V - ~) x ~* (V - ~ U {e}); that is, if each transition has a righthand side that consists of a string of terminals followed by at most onenonterminal.(a) Consider the regular grammar G = (V,~, R, S), whereV = {a,b,A,B,S}~ ={a,b}R = {S -+ abA,S -+ B,S -+ baB,S -+ e,A -+ bS,B -+ as,A -+ b}.Construct a nondeterministic finite automaton M such that L(M) = L(G).Trace the transitions of M that lead to the acceptance of the string abba,and compare with a derivation of the same string in G.(b) Prove that a language is regular if and only if there is a regular grammarthat generates it.
(Hint: Recall Example 3.1.5.)(c) Call a context-free grammar G = (V,~, R, S) left-linear if and only ifR ~ (V - ~) x (V - ~) U {e} )~*. Show that a language is regular if andonly if it is the language generated by some left-linear grammar.(d) Suppose that G = (V,~, R, S) is a context-free grammar such that each rulein R is either of the form A -+ wB or of the form A -+ Bw or of the formA -+ w, where in each case A, BE V - ~ and w E ~*. Is L(G) necessarilyregular? Prove it or give a counter-example.liiJPARSE TREESLet G be a context-free grammar. A string w E L(G) may have many derivations in G.
For example, if G is the context-free grammar that generates thelanguage of balanced parentheses (recall Example 3.1.4), then the string 00 canbe derived from S by at least two distinct derivations, namely,S=}SS=}(S)S=}OS =} 0 (S)=}003.2: Parse Trees123andS::::} SS ::::} S(S) ::::} (S)(S) ::::} (S)O ::::}00However, these two derivations are in a sense "the same." The rules used arethe same, and they are applied at the same places in the intermediate string.The only difference is in the order in which the rules are applied. Intuitively,both derivations can be pictured as in Figure 3-2.-----SSS~~(S ) ( S)IIeeFigure 3-2We call such a picture a parse tree.
The points are called nodes; eachnode carries a label that is a symbol in V. The topmost node is called theroot, and the nodes along the bottom are called leaves. All leaves are labeledby terminals, or possibly the empty string e. By concatenating the labels ofthe leaves from left to right, we obtain the derived string of terminals, which iscalled the yield of the parse tree.More formally, for an arbitrary context-free grammar G = (V, 1:, R, S), wedefine its parse trees and their roots, leaves, and yields, as follows.1.oaThis is a parse tree for each a E 1:. The single node of this parse tree is boththe root and a leaf.
The yield of this parse tree is a.2. If A --+ e is a rule in R, then[is a parse tree; its root is the node labeled A, its sole leaf is the node labeled e,and its yield is e.124Chapter 3: CONTEXT-FREE LANGUAGES3. IfAre parse trees, where n > 1, with roots labeled AI, . .. ,An respectively, andwith yields YI, ... , Yn, and A -+ Al ... An is a rule in R, thenis a parse tree. Its root is the new node labeled A, its leaves are the leaves of itsconstituent parse trees, and its yield is YI ...
Yn.4. Nothing else is a parse tree.Example 3.2.1: Recall the grammar G that generates all arithmetic expressionsover id (Example 3.1.3). A parse tree with yield id * (id + id) is shown in Figure3-3.0Intuitively, parse trees are ways of representing derivations of strings inL(G) so that the superficial differences between derivations, owing to the orderof application of rules, are suppressed. To put it otherwise, parse trees representequivalence classes of derivations. We make this intuition precise below.Let G = (V, 2;, R, S) be a context-free grammar, and let D = Xl =} X2 =}...
=} Xn and D' = x~ =} x~ =} ... =} x~ be two derivations in G, whereXi, X~ E V* for i = 1, ... ,n, Xl, xi E V - 2;, and Xn, x~ E 2;*. That is, they areboth derivations of terminal strings from a single nonterminal. We say that Dprecedes D', written D -< D', if n > 2 and there is an integer k, 1 < k < nsuch that(1) for all i -:P k we have Xi = x~;(2) Xk-I = X~_I = uAvBw, where u, v, w E V*, and A, B, E V - 2;;1253.2: Parse TreesEIT~FI*idTIF~(E)~T + EIIFTIIFidIidFigure 3-3(3)(4)(5)= uyvBw, where A -+ y E R;= uAvzw where B -+ z E R;Xk+l = X~+l = uyvzw.Xkx~In other words, the two derivations are identical except for two consecutivesteps, during which the same two nonterminals are replaced by the same twostrings but in opposite orders in the two derivations. The derivation in whichthe leftmost of the two nonterminals is replaced first is said to precede the other.Example 3.2.2: Consider the following three derivations D1 , D2 , and D3 inthe grammar G generating all strings of balanced parentheses:Dl =8 :::} 88 :::} (8)8 :::} ((8))8 :::} (())8 :::} (())(8) :::} (())OD2 =8 :::} 88 :::} (8)8 :::} ((8))8 :::} ((8))(8) :::} (())(8) :::} (())OD3 =8 :::} 88 :::} (8)8 :::} ((8))8 :::} ((8))(8) :::} ((8))0 :::} (())OWe have that Dl -< D2 and D2 -< D 3.
However, it is not the case that Dl -< D3,since the two latter derivations differ in more than one intermediate string.Notice that all three derivations have the same parse tree, the one shown inFigure 3-4.0We say that two derivations D and D' are similar if the pair (D, D')belongs in the reflexive, symmetric, transitive closure of -<. Since the reflexive,126Chapter 3: CONTEXT-FREE LANGUAGES5~55~(~)5(~(55)I)eIeFigure 3-4symmetric, transitive closure of any relation is by definition reflexive, symmetric,and transitive, similarity is an equivalence relation.
To put it otherwise, twoderivations are similar if they can be transformed into another via a sequenceof "switchings" in the order in which rules are applied. Such a "switching" canreplace a derivation either by one that precedes it, or by one that it precedes.Example 3.2.2 (continued): Parse trees capture exactly, via a natural isomorphism, the equivalence classes of the "similarity" equivalence relation betweenderivations of a string defined above. The equivalence class of the derivations of(()) 0 corresponding to the tree in Figure 3-4 contains the derivations D1 , D2 , D3shown above, and also these seven:D4D5D6D7DBDgDlO=5 => 55 => (5)5 => (5)(5) => ((5))(5) => (())(5) => (())O=5 => 55 => (5)5 => (5)(5) => ((5))(5) => ((5))0 => (())O=5 => 55 => (5)5 => (5)(5) => (5)0 => ((5))0 => (())O=5 => 55 => 5(5) => (5)(5) => ((5))(5) => (())(5) => (())O=5 => 55 => 5(5) => (5)(5) => ((5))(5) => ((5))0 => (())O=5 => 55 => 5(5) => (5)(5) => (5)0 => ((5))0 => (())O=5 => 55 => 5(5) => 50 => (5)0 => ((5))0 => (())OThese ten derivations are related by -< as shown in Figure 3-5.Figure 3-51273.2: Parse TreesAll these ten derivations are similar, because, informally, they representapplications of the same rules at the same positions in the strings, only differingin the relative order of these applications; equivalently, one can go from anyoneof them to any other by repeatedly following either a -<, or an inverted -<.
Thereare no other derivations similar to these.There are, however, other derivations of (())() that are not similar to theones above -and thus are not captured by the parse tree shown in Figure 3SSSSSS(S)S4. An example is the following derivation: SS((S))SS(())SS(())(S)S(())()(())O. Its parse tree is shown inFigure 3-6 (compare with Figure 3-4).0********S-----------SS~~S(SI~e(SS)I)e~(S)IeFigure 3-6Each equivalence class of derivations under similarity, that is to say, eachparse tree, contains a derivation that is maximal under -<; that is, it is notpreceded by any other derivation.
This derivation is called a leftmost derivation. A leftmost derivation exists in every parse tree, and it can be obtained asfollows. Starting from the label of the root A, repeatedly replace the leftmostnonterminal in the current string according to the rule suggested by the parsetree. Similarly, a rightmost derivation is one that does not precede any otherderivation; it is obtained from the parse tree by always expanding the rightmostnonterminal in the current string. Each parse tree has exactly one leftmost andexactly one rightmost derivation. This is so because the leftmost derivation of aparse tree is uniquely determined, since at each step there is one nonterminal toreplace: the leftmost one. Similarly for the rightmost derivation. In the exampleabove, DJ is a leftmost derivation, and DIO is a rightmost one.It is easy to tell when a step of a derivation can be a part of a leftmostderivation: the leftmost nonterminal must be replaced.
We write x ~ y if andChapter 3: CONTEXT-FREE LANGUAGES128only if x = wA,8, y = wo:,8, where wE I;*, 0:,,8 E V', A E V-I;, and A -+ 0:is a rule of G. Thus, if XlX2Xn is a leftmost derivation, then infact XlX2Xn- Similarly for rightmost derivations (the notation isi*i ... i* ... *R*y).To summarize our insights into parse trees and derivations in this section,we state without formal proof the following theorem.XTheorem 3.2.1: Let G = (V,I;,R,S) be a context-free grammar, and let A EV - I;, and w E I;*. Then the following statements are equivalent:(a) A** w.(b) There is a parse tree with root A and yield w.(c) There is a leftmost derivation Ai(d) There is a rightmost derivation A* w.Jt * w.AmbiguityWe saw in Example 3.2.2 that there may be a string in the language generatedby a context-free grammar with two derivations that are not similar -that is tosay, with two distinct parse trees, or, equivalently, with two distinct rightmostderivations (and two distinct leftmost derivations).
For a more substantial example, recall the grammar G that generates all arithmetic expressions over idin Example 3.1.3, and consider another grammar, G', that generates the samelanguage, with these rules:E -+ E+ E,E -+ E* E,E -+ (E),E -+ id.It is not hard to see that L(G') = L(G). Still, there are important differencesbetween G and G'. Intuitively, the variant G', by "blurring the distinction"between factors (F) and terms (T) "risks getting wrong" the precedence ofmultiplication over addition. Indeed, there are two parse trees for the expressionid + id * id in G', both shown in Figure 3-7.
One of them, 3-7(a), correspondsto the "natural" meaning of this expression (with * taking precedence over +),the other is "wrong."Grammars such as G', with strings that have two or more distinct parsetrees, are called ambiguous. As we shall see extensively in Section 3.7, assigninga parse tree to a given string in the language -that is to say, parsing the string-is an important first step towards understanding the structure of the string, thereasons why it belongs to the language --ultimately its "meaning." This is ofcourse of special importance in the case of grammars such as G and G' abovethat generate fragments of programming languages. Ambiguous grammars such1293.3: Pushdown AutomataE~E+ EI~idE * EIIididE~EE*~IE + Eid(a)IIidid(b)Figure 3-7as G' are of no help in parsing, since they assign no unique parse tree -nounique "meaning" - to each string in the language.Fortunately, in this case there is a way to "disambiguate" G' by introducing the new nonterminals T and F (recall the grammar G of Example 3.1.3).That is, there is an unambiguous grammar that generates the same language(namely, the grammar G defined in Example 3.1.3; for a proof that the grammar G is indeed unambiguous see Problem 3.2.1).