Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 41

Rice 1869

Описание файла

Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 41 страницы из PDF

To make this work, in practice, the scanner must return thesame syntactic category, or word, for both num and name.Similar arguments can be made for combining x and ÷ into a single terminal muldiv, and for combining + and - into a single terminal addsub.Each of these replacements removes a terminal symbol and a production.These three changes produce the reduced expression grammar shown inFigure 3.33a. This grammar produces a smaller CC, removing rows from thetable. Because it has fewer terminal symbols, it has fewer columns as well.The resulting Action and Goto tables are shown in Figure 3.33b.

TheAction table contains 132 entries and the Goto table contains 66 entries,for a total of 198 entries. This compares favorably with the tables for theoriginal grammar, with their 384 entries. Changing the grammar produced a48 percent reduction in table size. The tables still contain opportunities forfurther reductions. For example, rows 0, 6, and 7 in the Action table areidentical, as are rows 4, 11, 15, and 17. Similarly, the Goto table has many3.6 Advanced Topics 1531234567GoalExpr→ Expr→ Expr addsub TermTerm→ Term muldiv Factor| Term| FactorFactor → ( Expr )| val(a) The Reduced Expression GrammarAction Table0123456789101112131415161718192021eofaddsubmuldivaccr3r5s6r3r5s7r5r7r7r7s 15r3r5s 17r5r2r4r7r2r4r7s7r4r6r6r6Goto Table()valExprTermFactors4s5123s 11s 128910s4s4s5s5133149101910s 16r3r5s 11s 12s 11s 12s 11s 15r2r4r6s 17r4r618r7s 1220s 21r2r4r6(b) Action and Goto Tables for the Reduced Expression Grammarn FIGURE 3.33 The Reduced Expression Grammar and its Tables.rows that only contain the error entry.

If table size is a serious concern, rowsand columns can be combined after shrinking the grammar.Other considerations may limit the compiler writer’s ability to combine productions. For example, the x operator might have multiple uses that makecombining it with ÷ impractical.

Similarly, the parser might use separate154 CHAPTER 3 Parsersproductions to let the parser handle two syntactically similar constructs indifferent ways.Directly Encoding the TableAs a final improvement, the parser generator can abandon the tabledriven skeleton parser in favor of a hard-coded implementation. Each statebecomes a small case statement or a collection of if–then–else statementsthat test the type of the next symbol and either shift, reduce, accept, orreport an error.

The entire contents of the Action and Goto tables can beencoded in this way. (A similar transformation for scanners is discussed inSection 2.5.2.)The resulting parser avoids directly representing all of the “don’t care” statesin the Action and Goto tables, shown as blanks in the figures. This spacesavings may be offset by larger code size, since each state now includesmore code. The new parser, however, has no parse table, performs no tablelookups, and lacks the outer loop found in the skeleton parser. While itsstructure makes it almost unreadable by humans, it should execute morequickly than the corresponding table-driven parser. With appropriate codelayout techniques, the resulting parser can exhibit strong locality in both theinstruction cache and the paging system.

For example, we should place allthe routines for the expression grammar together on a single page, wherethey cannot conflict with one another.Using Other Construction AlgorithmsSeveral other algorithms to construct lr-style parsers exist. Among thesetechniques are the slr(1) construction, for simple lr(1), and the lalr(1)construction, for lookahead lr(1). Both of these constructions producesmaller tables than the canonical lr(1) algorithm.The slr(1) algorithm accepts a smaller class of grammars than the canonical lr(1) construction. These grammars are restricted so that the lookaheadsymbols in the lr(1) items are not needed. The algorithm uses follow setsto distinguish between cases in which the parser should shift and those inwhich it should reduce.

This mechanism is powerful enough to resolve manygrammars of practical interest. By using follow sets, the algorithm eliminates the need for lookahead symbols. This produces a smaller canonicalcollection and a table with fewer rows.The lalr(1) algorithm capitalizes on the observation that some items in theset representing a state are critical and that the remaining ones can be derivedfrom the critical items. The lalr(1) table construction only represents the3.7 Summary and Perspective 155critical items; again, this produces a canonical collection that is equivalentto the one produced by the slr(1) construction. The details differ, but thetable sizes are the same.The canonical lr(1) construction presented earlier in the chapter is the mostgeneral of these table-construction algorithms. It produces the largest tables,but accepts the largest class of grammars.

With appropriate table reductiontechniques, the lr(1) tables can approximate the size of those produced bythe more limited techniques. However, in a mildly counterintuitive result,any language that has an lr(1) grammar also has an lalr(1) grammar andan slr(1) grammar. The grammars for these more restrictive forms willbe shaped in a way that allows their respective construction algorithms toresolve the situations in which the parser should shift and those in which itshould reduce.3.7 SUMMARY AND PERSPECTIVEAlmost every compiler contains a parser. For many years, parsing was asubject of intense interest.

This led to the development of many differenttechniques for building efficient parsers. The lr(1) family of grammarsincludes all of the context-free grammars that can be parsed in a deterministic fashion. The tools produce efficient parsers with provably strongerror-detection properties. This combination of features, coupled with thewidespread availability of parser generators for lr(1), lalr(1), and slr(1)grammars, has decreased interest in other automatic parsing techniques suchas operator precedence parsers.Top-down, recursive-descent parsers have their own set of advantages.

Theyare, arguably, the easiest hand-coded parsers to construct. They provideexcellent opportunities to detect and repair syntax errors. They are efficient;in fact, a well-constructed top-down, recursive-descent parser can be fasterthan a table-driven lr(1) parser. (The direct encoding scheme for lr(1) mayovercome this speed advantage.) In a top-down, recursive-descent parser, thecompiler writer can more easily finesse ambiguities in the source languagethat might trouble an lr(1) parser—such as a language in which keywordnames can appear as identifiers.

A compiler writer who wants to construct ahand-coded parser, for whatever reason, is well advised to use the top-down,recursive-descent method.In choosing between lr(1) and ll(1) grammars, the choice becomes one ofavailable tools. In practice, few, if any, programming-language constructsfall in the gap between lr(1) grammars and ll(1) grammars. Thus, starting with an available parser generator is always better than implementing aparser generator from scratch.156 CHAPTER 3 ParsersMore general parsing algorithms are available. In practice, however, therestrictions placed on context-free grammars by the lr(1) and ll(1) classesdo not cause problems for most programming languages.nCHAPTER NOTESThe earliest compilers used hand-coded parsers [27, 227, 314]. The syntactic richness of Algol 60 challenged early compiler writers.

They tried avariety of schemes to parse the language; Randell and Russell give a fascinating overview of the methods used in a variety of Algol 60 compilers [293,Chapter 1].Irons was one of the first to separate the notion of syntax from translation [202]. Lucas appears to have introduced the notion of recursive-descentparsing [255]. Conway applies similar ideas to an efficient single-passcompiler for cobol [96].The ideas behind ll and lr parsing appeared in the 1960s. Lewis and Stearnsintroduced ll(k) grammars [245]; Rosenkrantz and Stearns described theirproperties in more depth [305].

Foster developed an algorithm to transform agrammar into ll(1) form [151]. Wood formalized the notion of left-factoringa grammar and explored the theoretical issues involved in transforming agrammar to ll(1) form [353, 354, 355].Knuth laid out the theory behind lr(1) parsing [228]. DeRemer and others developed techniques, the slr and lalr table-construction algorithms,that made the use of lr parser generators practical on the limited-memorycomputers of the day [121, 122]. Waite and Goos describe a techniquefor automatically eliminating useless productions during the lr(1) tableconstruction algorithm [339].

Penello suggested direct encoding of the tablesinto executable code [282]. Aho and Ullman [8] is a definitive referenceon both ll and lr parsing. Bill Waite provided the example grammar inexercise 3.7.Several algorithms for parsing arbitrary context-free grammars appearedin the 1960s and early 1970s. Algorithms by Cocke and Schwartz [91],Younger [358], Kasami [212], and Earley [135] all had similar computational complexity.

Earley’s algorithm deserves particular note because ofits similarity to the lr(1) table-construction algorithm. Earley’s algorithmderives the set of possible parse states at parse time, rather than at runtime,where the lr(1) techniques precompute these in a parser generator. From ahigh-level view, the lr(1) algorithms might appear as a natural optimizationof Earley’s algorithm.Exercises 157nEXERCISES1. Write a context-free grammar for the syntax of regular expressions.Section 3.22.

Write a context-free grammar for the Backus-Naur form (bnf)notation for context-free grammars.3. When asked about the definition of an unambiguous context-freegrammar on an exam, two students gave different answers. The firstdefined it as “a grammar where each sentence has a unique syntax treeby leftmost derivation.” The second defined it as “a grammar whereeach sentence has a unique syntax tree by any derivation.” Which oneis correct?4.

The following grammar is not suitable for a top-down predictiveparser. Identify the problem and correct it by rewriting the grammar.Show that your new grammar satisfies the ll(1) condition.L →|RaQ baR →||abacabaQ →|bbcbcR bc5. Consider the following grammar:A → BaB → dab| CbC → cB| AcDoes this grammar satisfy the ll(1) condition? Justify your answer.

Ifit does not, rewrite it as an ll(1) grammar for the same language.6. Grammars that can be parsed top-down, in a linear scan from left toright, with a k word lookahead are called ll(k) grammars. In the text,the ll(1) condition is described in terms of first sets. How wouldyou define the first sets necessary to describe an ll(k) condition?7. Suppose an elevator is controlled by two commands: ↑ to move theelevator up one floor and ↓ to move the elevator down one floor.Assume that the building is arbitrarily tall and that the elevator startsat floor x.Write an ll(1) grammar that generates arbitrary command sequencesthat (1) never cause the elevator to go below floor x and (2) alwaysreturn the elevator to floor x at the end of the sequence.

For example,↑↑↓↓ and ↑↓↑↓ are valid command sequences, but ↑↓↓↑ and ↑↓↓are not. For convenience, you may consider a null sequence as valid.Prove that your grammar is ll(1).Section 3.3158 CHAPTER 3 ParsersSection 3.48. Top-down and bottom-up parsers build syntax trees in differentorders. Write a pair of programs, TopDown and BottomUp, that take asyntax tree and print out the nodes in order of construction.

Свежие статьи
Популярно сейчас