Главная » Все файлы » Просмотр файлов из архивов » PDF-файлы » A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition)

A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition), страница 14

Описание файла

PDF-файл из архива "A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition)", который расположен в категории "книги и методические указания". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 14 страницы из PDF

Parser construction for SLR is almost identical to that for LR(0), except that weput reduce actions into the table only where indicated by the FOLLOW set.61Here is the algorithm for putting reduce actions into an SLR table:R ← {}for each state I in Tfor each item A → α. in Ifor each token X in FOLLOW(A)R ← R ∪ {(I, X, A → α)}The action (I, X, A → α) indicates that in state I, on lookahead symbol X, the parser willreduce by rule A → α.Thus, for Grammar 3.23 we use the same LR(0) state diagram (Figure 3.24), but we put fewerreduce actions into the SLR table, as shown in Figure 3.25.Figure 3.25: SLR parsing table for Grammar 3.23.The SLR class of grammars is precisely those grammars whose SLR parsing table contains noconflicts (duplicate entries). Grammar 3.23 belongs to this class, as do many usefulprogramming-language grammars.LR(1) ITEMS; LR(1) PARSING TABLEEven more powerful than SLR is the LR(1) parsing algorithm.

Most programming languageswhose syntax is describable by a context-free grammar have an LR(1) grammar.The algorithm for constructing an LR(1) parsing table is similar to that for LR(0), but thenotion of an item is more sophisticated. An LR(1) item consists of a grammar production, aright-hand-side position (represented by the dot), and a lookahead symbol.

The idea is that anitem (A → α.β, x) indicates that the sequence α is on top of the stack, and at the head of theinput is a string derivable from βx.An LR(1) state is a set of LR(1) items, and there are Closure and Goto operations for LR(1)that incorporate the lookahead:Closure(I) =Goto(I, X) =repeatfor any item (A → α.Xβ, z) in Ifor any production X → γfor any w ∈ FIRST(βz)J ← {}for any item (A → α.Xβ, z) in Iadd (A → αX.β, z) to Jreturn Closure(J).I ← I ∪ {(X → .γ, w)}until I does not change62return IThe start state is the closure of the item (S′ → .S $, ?), where the lookahead symbol ? will notmatter, because the end-of-file marker will never be shifted.The reduce actions are chosen by this algorithm:R ← {}for each state I in Tfor each item (A → α., z) in IR ← R ∪{(I, z, A → α)}The action (I, z, A → α) indicates that in state I, on lookahead symbol z, the parser will reduceby rule A → α.Grammar 3.26 is not SLR (see Exercise 3.9), but it is in the class of LR(1) grammars.

Figure3.27 shows the LR(1) states for this grammar; in the figure, where there are several items withthe same production but different lookahead, as at left below, we have abbreviated as at right:Figure 3.27: LR(1) states for Grammar 3.26.GRAMMAR 3.26: A grammar capturing the essence of expressions, variables, and pointerdereference (by the *) operator in the C language.0. S′ → S $1. S → V = E2. S → E3. E → V635.

V → * E4. V → xThe LR(1) parsing table derived from this state graph is Table 3.28a. Wherever the dot is atthe end of a production (as in state 3 of Figure 3.27, where it is at the end of production E →V ), then there is a reduce action for that production in the LR(1) table, in the rowcorresponding to the state number and the column corresponding to the lookahead of the item(in this case, the lookahead is $). Whenever the dot is to the left of a terminal symbol ornonterminal, there is a corresponding shift or goto action in the LR(1) parsing table, just asthere would be in an LR(0) table.Table 3.28: LR(1) and LALR(1) parsing tables for Grammar 3.26.LALR(1) PARSING TABLESLR(1) parsing tables can be very large, with many states.

A smaller table can be made bymerging any two states whose items are identical except for lookahead sets. The result parseris called an LALR(1) parser, for lookahead LR(1).For example, the items in states 6 and 13 of the LR(1) parser for Grammar 3.26 (Figure 3.27)are identical if the lookahead sets are ignored. Also, states 7 and 12 are identical except forlookahead, as are states 8 and 11 and states 10 and 14.

Merging these pairs of states gives theLALR(1) parsing table shown in Table 3.28b.For some grammars, the LALR(1) table contains reduce-reduce conflicts where the LR(1)table has none, but in practice the difference matters little. What does matter is that theLALR(1) parsing table requires less memory to represent than the LR(1) table, since there canbe many fewer states.64HIERARCHY OF GRAMMAR CLASSESA grammar is said to be LALR(1) if its LALR(1) parsing table contains no conflicts. All SLRgrammars are LALR(1), but not vice versa.

Figure 3.29 shows the relationship betweenseveral classes of grammars.Figure 3.29: A hierarchy of grammar classes.Any reasonable programming language has a LALR(1) grammar, and there are many parsergenerator tools available for LALR(1) grammars. For this reason, LALR(1) has become astandard for programming languages and for automatic parser generators.LR PARSING OF AMBIGUOUS GRAMMARSMany programming languages have grammar rules such as•••S → if E then S else SS → if E then SS → otherwhich allow programs such asif a then if b then s1 else s2Such a program could be understood in two ways:(1)(2)if a then { if b then s1 else s2 }if a then { if b then s1 } else s2In most programming languages, an else must match the most recent possible then, sointerpretation (1) is correct.

In the LR parsing table there will be a shift-reduce conflict:65Shifting corresponds to interpretation (1) and reducing to interpretation (2).The ambiguity can be eliminated by introducing auxiliary nonterminals M (for matchedstatement)and U (for unmatched statement):••S→MS→U•M → if E then M else M•M → other••U → if E then SU → if E then M else UBut instead of rewriting the grammar, we can leave the grammar unchanged and tolerate theshift-reduce conflict.

In constructing the parsing table this conflict should be resolved byshifting, since we prefer interpretation (1).It is often possible to use ambiguous grammars by resolving shift-reduce conflicts in favor ofshifting or reducing, as appropriate. But it is best to use this technique sparingly, and only incases (such as the dangling-else described here, and operator-precedence to be described onpage 74) that are well understood. Most shift-reduce conflicts, and probably all reduce-reduceconflicts, should not be resolved by fiddling with the parsing table. They are symptoms of anill-specified grammar, and they should be resolved by eliminating ambiguities.3.4 USING PARSER GENERATORSThe task of constructing a parser is simple enough to be automated.

In the previous chapterwe described the lexical-analyzer aspects of JavaCC and SableCC. Here we will discuss theparser-generator aspects of these tools. Documentation for JavaCC and SableCC are availablevia this book's Web site.JAVACCJavaCC is an LL(k) parser generator. Productions are of the form:void Assignment() : {} { Identifier() "=" Expression() ";" }where the left-hand side is Assignment(); the right-hand side is enclosed between the lasttwo curly brackets; Assignment(), Identifier(), and Expression() are nonterminalsymbols; and "=" and ";" are terminal symbols.Grammar 3.30 can be represented as a JavaCC grammar as shown in Grammar 3.31.

Noticethat if we had written the production for StmList() in the style of Grammar 3.30, that is,void StmList() :{}{ Stm()| StmList( ) ";" Stm()}66GRAMMAR 3.301. P → L5. S → if id then S3. S → while id do S7. L → S2. S → id := id4. S → begin L end6. S → if id then S else S8. L → L ; SGRAMMAR 3.31: JavaCC version of Grammar 3.30.PARSER_BEGIN(MyParser)public class MyParser {}PARSER_END(MyParser)SKIP :{ " " | "\t" | "\n" }TOKEN :{ < WHILE: "while" >| < BEGIN: "begin" >| < END: "end" >| < DO: "do" >| < IF: "if" >| < THEN: "then" >| < ELSE: "else" >| < SEMI: ";" >| < ASSIGN: "=" >| < ID: ["a"-"z"](["a"-"z"] | ["0"-"9"])* >}void Prog() :{}{ StmList() <EOF> }void StmList() :{}{ Stm() StmListPrime() }void StmListPrime() :{}{ ( ";" Stm() StmListPrime() )? }void Stm() :{}{ <ID> "=" <ID>| "while" <ID> "do" Stm()| "begin" StmList() "end"| LOOKAHEAD(5) /* we need to lookahead till we see "else" */"if" <ID> "then" Stm()| "if" <ID> "then" Stm() "else" Stm()}then the grammar would be left recursive.

In that case, JavaCC would give the followingerror:Left recursion detected: "StmList... --> StmList..."67We used the techniques mentioned earlier to remove the left recursion and arrive at Grammar3.31.SABLECCSableCC is an LALR(1) parser generator. Productions are of the form:assignment = identifier assign expression semicolon ;where the left-hand side is assignment; the right-hand side is enclosed between = and ;;assignment, identifier, and expression are nonterminal symbols; and assign andsemicolon are terminal symbols that are defined in an earlier part of the syntax specification.Grammar 3.30 can be represented as a SableCC grammar as shown in Grammar 3.32. Whenthere is more than one alternative, SableCC requires a name for each alternative.

A name isgiven to an alternative in the grammar by prefixing the alternative with an identifier betweencurly brackets. Also, if the same grammar symbol appears twice in the same alternative of aproduction, SableCC requires a name for at least one of the two elements. Element names arespecified by prefixing the element with an identifier between square brackets followed by acolon.GRAMMAR 3.32: SableCC version of Grammar 3.30.Tokenswhile = 'while';begin = 'begin';end = 'end';do = 'do';if = 'if';then = 'then';else = 'else';semi = ';';assign = '=';whitespace = (' ' | '\t' | '\n')+;id = ['a'..'z'](['a'..'z'] | ['0'..'9'])*;Ignored Tokenswhitespace;Productionsprog = stmlist;stm ={assign} [left]:id assign [right]:id |{while} while id do stm |{begin} begin stmlist end |{if_then} if id then stm |{if_then_else} if id then [true_stm]:stm else [false_stm]:stm;stmlist = {stmt} stm |{stmtlist} stmlist semi stm;SableCC reports shift-reduce and reduce-reduce conflicts.

A shift-reduce conflict is a choicebetween shifting and reducing; a reduce-reduce conflict is a choice of reducing by twodifferent rules.SableCC will report that the Grammar 3.32 has a shift-reduce conflict. The conflict can beexamined by reading the detailed error message SableCC produces, as shown in Figure 3.33.68shift/reduce conflict in state [stack: TIf TId TThen PStm *] on TElse in {[ PStm = TIf TId TThen PStm * TElse PStm ] (shift),[ PStm = TIf TId TThen PStm * ] followed by TElse (reduce)}Figure 3.33: SableCC shift-reduce error message for Grammar 3.32.SableCC prefixes productions with an uppercase ‘P' and tokens with an uppercase ‘T', andreplaces the first letter with an uppercase when it makes the objects for the tokens andproductions.

Свежие статьи
Популярно сейчас