K. Cooper, L. Torczon, Engineering a Compiler (2nd edition, 2011), page 41
If this space reduction adds an extra indirection to every table access, the cost of those memory operations must trade off directly against the savings in memory. The table generator could also use other techniques to represent sparse matrices; again, the implementor must consider the tradeoff of memory size against any increase in access costs.

3.6 Advanced Topics 151

FIGURE 3.31 Action Table for the Classic Expression Grammar. [Table contents not reproduced here.]

Shrinking the Grammar

In many cases, the compiler writer can recode the grammar to reduce the number of productions it contains.
This usually leads to smaller tables. For example, in the classic expression grammar, the distinction between a number and an identifier is irrelevant to the productions for Goal, Expr, Term, and Factor. Replacing the two productions Factor → num and Factor → name with a single production Factor → val shrinks the grammar by a production. In the Action table, each terminal symbol has its own column. Folding num and name into a single symbol, val, removes a column from the Action table.

152 CHAPTER 3 Parsers

FIGURE 3.32 Goto Table for the Classic Expression Grammar. [Table contents not reproduced here.]
To make this work, in practice, the scanner must return the same syntactic category, or word, for both num and name. Similar arguments can be made for combining × and ÷ into a single terminal muldiv, and for combining + and − into a single terminal addsub. Each of these replacements removes a terminal symbol and a production. These three changes produce the reduced expression grammar shown in Figure 3.33a. This grammar produces a smaller CC, removing rows from the table.
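As a concrete sketch of the scanner-side folding, consider the fragment below. It is a hypothetical illustration, not code from the book: the function and category names are invented, and a real scanner would classify characters rather than receive whole lexemes.

```python
# Hypothetical sketch: the scanner maps several distinct lexemes to
# one syntactic category, so the parser's Action table needs fewer
# columns. All names here are illustrative.
FOLD = {"+": "addsub", "-": "addsub", "×": "muldiv", "÷": "muldiv"}

def classify(lexeme):
    """Return the syntactic category the parser sees for a lexeme."""
    if lexeme in FOLD:
        return FOLD[lexeme]
    if lexeme in ("(", ")"):
        return lexeme
    # Both numbers and identifiers fold into the single category val;
    # the lexeme itself is kept elsewhere for later translation.
    return "val"

assert classify("12") == classify("count") == "val"
assert classify("+") == classify("-") == "addsub"
```

The parser keys only on the category; the distinction between a number and a name survives in the lexeme, where later compiler phases can still recover it.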
Because it has fewer terminal symbols, it has fewer columns as well. The resulting Action and Goto tables are shown in Figure 3.33b. The Action table contains 132 entries and the Goto table contains 66 entries, for a total of 198 entries. This compares favorably with the tables for the original grammar, with their 384 entries. Changing the grammar produced a 48 percent reduction in table size. The tables still contain opportunities for further reductions. For example, rows 0, 6, and 7 in the Action table are identical, as are rows 4, 11, 15, and 17.
Similarly, the Goto table has many rows that only contain the error entry.

(a) The Reduced Expression Grammar

1  Goal   → Expr
2  Expr   → Expr addsub Term
3         |  Term
4  Term   → Term muldiv Factor
5         |  Factor
6  Factor → ( Expr )
7         |  val

(b) Action and Goto Tables for the Reduced Expression Grammar [Table contents not reproduced here.]

FIGURE 3.33 The Reduced Expression Grammar and its Tables.
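One standard way to exploit duplicate rows, such as the identical Action-table rows noted above, is to store each distinct row once and route every state through a small dispatch vector. The sketch below is hypothetical; the row contents are made up for illustration and are not the real table entries.

```python
# Minimal sketch of row sharing: identical Action-table rows are
# stored once, and each state reaches its row through one extra
# indirection. The entries below are illustrative only.
unique_rows = [
    {"val": "s4", "(": "s5"},        # row shared by states 0, 6, 7
    {"addsub": "r3", "eof": "r3"},   # row shared by states 4, 11, 15, 17
]
row_of_state = {0: 0, 6: 0, 7: 0, 4: 1, 11: 1, 15: 1, 17: 1}

def action(state, symbol):
    """Look up Action[state, symbol]; None stands for the error entry."""
    return unique_rows[row_of_state[state]].get(symbol)

assert action(0, "val") == action(6, "val") == "s4"
assert action(15, "eof") == "r3"
```

This is exactly the memory-versus-speed tradeoff mentioned earlier: the shared rows shrink the table, at the price of one added lookup per access.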
If table size is a serious concern, rows and columns can be combined after shrinking the grammar.

Other considerations may limit the compiler writer's ability to combine productions. For example, the × operator might have multiple uses that make combining it with ÷ impractical. Similarly, the compiler writer might use separate productions to let the parser handle two syntactically similar constructs in different ways.

Directly Encoding the Table

As a final improvement, the parser generator can abandon the table-driven skeleton parser in favor of a hard-coded implementation.
Each state becomes a small case statement or a collection of if–then–else statements that test the type of the next symbol and either shift, reduce, accept, or report an error. The entire contents of the Action and Goto tables can be encoded in this way. (A similar transformation for scanners is discussed in Section 2.5.2.)

The resulting parser avoids directly representing all of the “don’t care” states in the Action and Goto tables, shown as blanks in the figures.
This space savings may be offset by larger code size, since each state now includes more code. The new parser, however, has no parse table, performs no table lookups, and lacks the outer loop found in the skeleton parser. While its structure makes it almost unreadable by humans, it should execute more quickly than the corresponding table-driven parser.
With appropriate code layout techniques, the resulting parser can exhibit strong locality in both the instruction cache and the paging system. For example, we should place all the routines for the expression grammar together on a single page, where they cannot conflict with one another.

Using Other Construction Algorithms

Several other algorithms to construct lr-style parsers exist. Among these techniques are the slr(1) construction, for simple lr(1), and the lalr(1) construction, for lookahead lr(1). Both of these constructions produce smaller tables than the canonical lr(1) algorithm.

The slr(1) algorithm accepts a smaller class of grammars than the canonical lr(1) construction. These grammars are restricted so that the lookahead symbols in the lr(1) items are not needed. The algorithm uses follow sets to distinguish between cases in which the parser should shift and those in which it should reduce.
This mechanism is powerful enough to resolve many grammars of practical interest. By using follow sets, the algorithm eliminates the need for lookahead symbols. This produces a smaller canonical collection and a table with fewer rows.

The lalr(1) algorithm capitalizes on the observation that some items in the set representing a state are critical and that the remaining ones can be derived from the critical items. The lalr(1) table construction only represents the critical items; again, this produces a canonical collection that is equivalent to the one produced by the slr(1) construction.
The details differ, but the table sizes are the same.

The canonical lr(1) construction presented earlier in the chapter is the most general of these table-construction algorithms. It produces the largest tables, but accepts the largest class of grammars. With appropriate table reduction techniques, the lr(1) tables can approximate the size of those produced by the more limited techniques.
However, in a mildly counterintuitive result, any language that has an lr(1) grammar also has an lalr(1) grammar and an slr(1) grammar. The grammars for these more restrictive forms will be shaped in a way that allows their respective construction algorithms to resolve the situations in which the parser should shift and those in which it should reduce.

3.7 SUMMARY AND PERSPECTIVE

Almost every compiler contains a parser. For many years, parsing was a subject of intense interest.
This led to the development of many different techniques for building efficient parsers. The lr(1) family of grammars includes all of the context-free grammars that can be parsed in a deterministic fashion. The tools produce efficient parsers with provably strong error-detection properties. This combination of features, coupled with the widespread availability of parser generators for lr(1), lalr(1), and slr(1) grammars, has decreased interest in other automatic parsing techniques such as operator precedence parsers.

Top-down, recursive-descent parsers have their own set of advantages.
They are, arguably, the easiest hand-coded parsers to construct. They provide excellent opportunities to detect and repair syntax errors. They are efficient; in fact, a well-constructed top-down, recursive-descent parser can be faster than a table-driven lr(1) parser. (The direct encoding scheme for lr(1) may overcome this speed advantage.) In a top-down, recursive-descent parser, the compiler writer can more easily finesse ambiguities in the source language that might trouble an lr(1) parser, such as a language in which keyword names can appear as identifiers. A compiler writer who wants to construct a hand-coded parser, for whatever reason, is well advised to use the top-down, recursive-descent method.

In choosing between lr(1) and ll(1) grammars, the choice becomes one of available tools.
In practice, few, if any, programming-language constructs fall in the gap between lr(1) grammars and ll(1) grammars. Thus, starting with an available parser generator is always better than implementing a parser generator from scratch.

More general parsing algorithms are available. In practice, however, the restrictions placed on context-free grammars by the lr(1) and ll(1) classes do not cause problems for most programming languages.

CHAPTER NOTES

The earliest compilers used hand-coded parsers [27, 227, 314]. The syntactic richness of Algol 60 challenged early compiler writers.
They tried a variety of schemes to parse the language; Randell and Russell give a fascinating overview of the methods used in a variety of Algol 60 compilers [293, Chapter 1]. Irons was one of the first to separate the notion of syntax from translation [202].