1610906232-7ba7dbaea13262b50a6029d682cb7f1b (824370), страница 25
Текст из файла (страница 25)
Finally, we apply the rule A ---+ e. In the resulting string, aaab,we cannot identify any symbol that appears to the left of ---+ in some rule. Thusthe operation of our language generator has ended, and aaab was produced, aspromised.A context-free grammar is a language generator that operates like theone above, with some such set of rules. Let us pause to explain at this point whysuch a language generator is called context-free.
Consider the string aaAb, whichwas an intermediate stage in the generation of aaab. It is natural to call thestrings aa and b that surround the symbol A the context of A in this particularstring. Now, the rule A ---+ aA says that we can replace A by the string aAno matter what the surrounding strings are; in other words, independently ofthe context of A. In Chapter 4 we examine more general grammars, in whichreplacements may be conditioned on the existence of an appropriate context.In a context-free grammar, some symbols appear to the left of ---+ in rules--5, M, A, and B in our example- and some --a and b--- do not. Symbols ofthe latter kind are called terminals, since the production of a string consistingsolely of such symbols signals the termination of the generation process.
Allthese ideas are stated formally in the next definition.Definition 3.1.1: A context-free grammar G is a quadruplewhere(V,~,R, 5),V is an alphabet,~ (the set of terminals) is a subset of V,R (the set of rules) is a finite subset of (V - ~) x V', and5 (the start symbol) is an element of V - LThe members of V - ~ are called nonterminals. For any A E V - ~ andu E V', we write A ---+0 u whenever (A, u) E R.
For any strings u, v E V',we write u =?c v if and only if there are strings x, y E V' and A E V - ~such that u = xAy, v = xv'y, and A ---+0 v'. The relation =?'G is the reflexive,transitive closure of =?c. Finally, L(G), the language generated by G, is{w E ~. : 5 =?'G w}; we also say that G generates each string in L(G).A language L is said to be a context-free language if L = L(G) for somecontext-free grammar G.Chapter 3: CONTEXT-FREE LANGUAGES116When the grammar to which we refer is obvious, we write A ---+ wand u => vinstead of A ---+c wand u =>c v.We call any sequence of the formWo=>cWI=>c ...
=>cWna derivation in G of Wn from W00 Here Wo,···, Wn may be any strings in V*,and n, the length of the derivation, may be any natural number, including zero.We also say that the derivation has n steps.Example 3.1.1: Consider the context-free grammar G = (V,~, R, S), whereV = {S,a,b}, ~ = {a,b}, and R consists of the rules S ---+ aSb and S ---+ e. Apossible derivation isS => aSb => aaSbb => aabb.Here the first two steps used the rule S ---+ aSb, and the last used the ruleS ---+ e. In fact, it is not hard to see that L(G) = {anb n : n 2 O}.
Hence somecontext-free languages are not regular.<)We shall soon see, however, that all regular languages are context-free.Example 3.1.2: Let G be the grammar(W,~,R, S,), whereW = {S,A,N, V,P} U~,~ = {Jim, big, green, cheese, ate},R={P---+N,P ---+ AP,S ---+ PVP,A ---+ big,A ---+ green,N ---+ cheese,N ---+ Jim,V ---+ ate}Here G is designed to be a grammar for a part of English; S stands for sentence,A for adjective, N for noun, V for verb, and P for phrase. The following aresome strings in L(G).Jim ate cheesebig Jim ate green cheesebig cheese ate JimUnfortunately, the following are also strings in L(G):1173.1: Context-Free Grammarsbig cheese ate green green big green big cheesegreen Jim ate green big JilllExample 3.1.3: Computer programs written in any programming languagemust satisfy some rigid criteria in order to be syntactically correct and therefore amenable to mechanical interpretation.
Fortunately, the syntax of mostprogramming languages can, unlike that of human languages, be captured bycontext-free grammars. We shall see in Section 3.7 that being context-free isextremely helpful when it comes to parsing a program, that is, analyzing it tounderstand its syntax. Here, we give a grammar that generates a fragment ofmany common programming languages.
This language consists of all stringsover the alphabet {(,), +, *, id} that represent syntactically correct arithmeticexpressions involving + and *. id stands for any identifier, that is to say, variablename.t Examples of such strings are id and id * (id * id + id), but not *id + ( or+ * id.Let G = (V,:E, R, E) where V, :E, and R are as follows.V = {+,*, (,),id,T,F,E},:E = {+, *, (, ), id},R = {E -+ E +T,E-+ T,T-+T*F,T-+F,F -+ (E),F -+ id}.(Rl)(R2)(R3)(R4)(R5)(R6)The symbols E, T, and F are abbreviations for expression, term, and factor,respectively.The grammar G generates the string (id * id + id) * (id + id) by the followingderi vation.E~Ttby Rule R2~T*Fby Rule R3~T*(E)by Rule R5~T*(E +T)by Rule RlIncidentally, discovering such identifiers (or reserved words of the language, ornumerical constants) in the program is accomplished at the earlier stage of lexicalanalysis, by algorithms based on regular expressions and finite automata.118Chapter 3: CONTEXT-FREE LANGUAGESby Rule R2*T * (T + T)*T* (F + T)*T* (id + T)*T* (id + F)*T * (id + id)*F * (id + id)*(E) * (id + id)*(E + T) * (id + id)by Rule R4by Rule R6by Rule R4by Rule R6by Rule R4by Rule R5by Rule Rlby Rule R4*(E+F)*(id+id)by Rule R6*(E + id) * (id + id)*(T + id) * (id + id)*(T * F + id) * (id + id)*(F * F + id) * (id + id)*(F * id + id) * (id + id)by Rule R3*(id * id + id) * (id + id)by Rule R6by Rule R2by Rule R4by Rule R6See Problem 3.1.8 for context-free grammars that generate larger subsets ofprogramming languages.OExample 3.1.4: The following grammar generates all strings of properly balanced left and right parentheses: every left parenthesis can be paired with aunique subsequent right parenthesis, and every right parenthesis can be pairedwith a unique preceding left parenthesis.
Moreover, the string between any suchpair has the same property. We let G = (V, 2;, R, S), whereV = {S,(,)},{C)},R = {S -+ e,S -+ SS,S -+ (S)}.2; =Two derivations in this grammar areS * SS * S(S) * S((S)) * S(()) *OW)andS * SS * (S)S * OS * O(S) * O(())1193.1: Context-Free GrammarsThus the same string may have several derivations in a context-free grammar;in the next subsection we discuss the intricate ways in which such derivationsmay be related.Incidentally, L(G) is another context-free language that is not regular (thatit is not regular was the object of Problem 2.4.6).0EXaIllple 3.1.5: Obviously, there are context-free languages that are not regular(we have already seen two examples). However, all regular languages are contextfree. In the course of this chapter we shall encounter several proofs of this fact.For example, we shall see in Section 3.3 that context-free languages are preciselythe languages accepted by certain language acceptors called pushdown automata.Now we shall also point out that the pushdown acceptor is a generalization ofthe finite automaton, in the sense that any finite automaton can be triviallyconsidered as a pushdown automatoIl.
Hence all regular languages are contextfree.For another proof, we shall see in Section 3.5 that the class of contextfree languages is closed under union, concatenation, and Kleene star (Theorem3.5.1); furthermore, the trivial languages 0 and {a} are definitely context-free(generated by the context-free grammars with no rules, or with only the ruleS -+ a, respectively). Hence the class of context-free languages must contain allregular languages, the closure of the trivial languages under these operations.SFigure 3-1But let us now show that all regular languages are context-free by a directconstruction. Consider the regular language accepted by the deterministic finiteautomaton M = (K, 2;, <5, s, F).
The same language is generated by the grammarG(M) = (V, 2;, R, S), where V = K U 2;, S = s, and R consists of these rules:R= {q -+ ap : <5 (q, a) = p} U {q -+ e : q E F}.That is, the nonterminals are the states of the automaton; as for rules, for eachtransition from q to p on input a we have in R the rule q -+ ap. For example,for the automaton in Figure 3-1 we would construct this grammar:S -+ as,S -+ bA,A -+ aE,A -+ bA,E -+ as,E -+ bA,E -+ e.120Chapter 3: CONTEXT-FREE LANGUAGESIt is left as an exercise to show that the resulting context-free grammar generates precisely the language accepted by the automaton (see Problem 3.1.10 fora general treatment of context-free grammars such as G(M) above, and theirrelationship with finite automata).OProblems for Section 3.13.1.1.
Consider the grammar G = (V, I:, R, S), whereV = {a,b,S,A},I: = {a, b},R = {S --+ AA,A --+ AAA,.A --+ a,A --+ bA,A --+ Ab}.(a) Which strings of L(G) can be produced by derivations of four or fewersteps?(1)) Give at least four distinct derivations for the string babbab,(c) For any 111, n,p > 0, describe a derivation in G of the string blnab"abP.3.1.2. Consider the grammar (V, I:, R, S), where V, I:, and R are defined as follows:V = {a,b,S,A},I: = {a, b},R = {S --+ aAa,S --+ bAb,S --+ e,A --+ SS}.Give a derivation of the string baabbb in G. (Notice that, unlike all othercontext-free languages we have seen so far, this one is very difficult to describe in English.)3.1.3.
Construct context-free grammars that generate each of these languages.(a) {wcw R : w E {a,b}*}(1)) {ww R : w E {a, b}'}(c) {w E {a,b}': w = w R }1213.1: Context-Free Grammars3.1.4. Consider the alphabet 2; = {a,b,(,),U,*,0}. Construct a context-freegrammar that generates all strings in 2;* that are regular expressions over{a, b}.3.1.5. Consider the context-free grammar G = (V, 2;, R, S), whereV = {a,b,S,A,B},{a, b},R = {S -+ aB,S -+ bA,A -+ a,A -+ as,A -+ BAA,B -+ b,B -+ bS,B -+ ABB}.2; =(a) Show that ababba E L(G).(b) Prove that L( G) is the set of all nonempty strings in {a, b} that have equalnumbers of occurrences of a and b.3.1.6. Let G be a context-free grammar and let k > o.
We let Lk(G) <:;;; L(G) bethe set of all strings that have a derivation in G with k or fewer steps.(a) What is L 5 (G), where G is the grammar of Example 3.1.4(b) Show that, for all context-free grammars G and all k > 0, L d G) isfinite.3.1.7. Let G = (V,2;,R,S), where V = {a,b,S}, 2; = {a,b}, and R = {S-+aSb, S -+ aSa, S -+ bSa, S -+ bSb, S -+ e}. Show that L(G) is regular.3.1.8. A program in a real programming language, such as C or Pascal, consistsof statements, where each statement is one of several types:(1) assignment statement, of the form id := E, where E is any arithmeticexpression (generated by the grammar of Example 3.1.3).(2) conditional statement, of the form, say, if E < E then statement, or awhile statement of the form while E < E do statement.(3) goto statement; furthermore, each statement could be preceded by alabel.(4) compound statement, that is, many statements preceded by a begin,followed by an end, and separated by a";".Give a context-free grammar that generates all possible statements in thesimplified programming language described above.122Chapter 3: CONTEXT-FREE LANGUAGES3.1.9.