Cooper_Engineering_a_Compiler(Second Edition) (Rice), page 40
This produces the AST shown on the left. Similarly, the right-recursive grammar produces the AST shown on the right.

For a list, neither of these orders is obviously incorrect, although the right-recursive AST may seem more natural. Consider, however, the result if we replace the list constructor with arithmetic operations, as in the grammars

    Expr → Expr + Operand          Expr → Operand + Expr
         | Expr - Operand               | Operand - Expr
         | Operand                      | Operand

For the string x1 + x2 + x3 + x4 + x5 the left-recursive grammar implies a left-to-right evaluation order, while the right-recursive grammar implies a right-to-left evaluation order.
With some number systems, such as floating-point arithmetic, these two evaluation orders can produce different results.

Since the mantissa of a floating-point number is small relative to the range of the exponent, addition can become an identity operation with two numbers that are far apart in magnitude. If, for example, x4 is much smaller than x5, the processor may compute x4 + x5 = x5. With well-chosen values, this effect can cascade and yield different answers from left-to-right and right-to-left evaluations.

Similarly, if any of the terms in the expression is a function call, then the order of evaluation may be important. If the function call changes the value of a variable in the expression, then changing the evaluation order might change the result.

In a string with subtractions, such as x1 - x2 + x3, changing the evaluation order can produce incorrect results.
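Both effects can be checked with a small sketch. The helpers `eval_left` and `eval_right` below are hypothetical names, not taken from the book; they evaluate a flat list of operands and operators in the order implied by a left-recursive and a right-recursive grammar, respectively. The sample values are assumptions chosen so that, in IEEE 754 double precision, adding 1.0 to 1e16 leaves 1e16 unchanged.

```python
import operator

def eval_left(terms, ops):
    """Evaluate ((x1 op1 x2) op2 x3) ..., as a left-recursive grammar implies."""
    result = terms[0]
    for op, term in zip(ops, terms[1:]):
        result = op(result, term)
    return result

def eval_right(terms, ops):
    """Evaluate x1 op1 (x2 op2 (x3 ...)), as a right-recursive grammar implies."""
    result = terms[-1]
    for op, term in zip(reversed(ops), reversed(terms[:-1])):
        result = op(term, result)
    return result

add, sub = operator.add, operator.sub

# Floating-point addition: each 1.0 is absorbed when added to 1e16 on its own,
# but the four small terms survive if they are summed first.
floats = [1e16, 1.0, 1.0, 1.0, 1.0]
float_left = eval_left(floats, [add] * 4)
float_right = eval_right(floats, [add] * 4)

# Subtraction: x1 - x2 + x3 with x1 = 10, x2 = 4, x3 = 3.
ints = [10, 4, 3]
sub_left = eval_left(ints, [sub, add])    # (10 - 4) + 3
sub_right = eval_right(ints, [sub, add])  # 10 - (4 + 3)

print(float_left == float_right)  # the two orders disagree
print(sub_left, sub_right)
```

With these values the left-to-right sum stays at 1e16 while the right-to-left sum reaches 1e16 + 4, and the subtraction yields 9 in one order and 3 in the other.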
Left associativity evaluates, in a postorder tree walk, to (x1 - x2) + x3, the expected result. Right associativity, on the other hand, implies an evaluation order of x1 - (x2 + x3). The compiler must, of course, preserve the evaluation order dictated by the language definition. The compiler writer can either write the expression grammar so that it produces the desired order or take care to generate the intermediate representation to reflect the correct order and associativity, as described in Section 4.5.2.

SECTION REVIEW

Building a compiler involves more than just transcribing the grammar from some language definition. In writing down the grammar, many choices arise that have an impact on both the function and the utility of the resulting compiler. This section dealt with a variety of issues, ranging from how to perform error recovery through the tradeoff between left recursion and right recursion.

Review Questions

1. The programming language C uses square brackets to indicate an array subscript and parentheses to indicate a procedure or function argument list. How does this simplify the construction of a parser for C?

2. The grammar for unary absolute value introduced a new terminal symbol as the unary operator. Consider adding a unary minus to the classic expression grammar. Does the fact that the same terminal symbol occurs as either a unary minus or a binary minus introduce complications? Justify your answer.

3.6 ADVANCED TOPICS

To build a satisfactory parser, the compiler writer must understand the basics of engineering a grammar and a parser. Given a working parser, there are often ways of improving its performance. This section looks at two specific issues in parser construction.
First, we examine transformations on the grammar that reduce the length of a derivation to produce a faster parse. These ideas apply to both top-down and bottom-up parsers. Second, we discuss transformations on the grammar and the Action and Goto tables that reduce table size. These techniques apply only to LR parsers.

    0  Goal   → Expr
    1  Expr   → Expr + Term
    2         | Expr - Term
    3         | Term
    4  Term   → Term × Factor
    5         | Term ÷ Factor
    6         | Factor
    7  Factor → ( Expr )
    8         | num
    9         | name

    (a) The Classic Expression Grammar
    (b) Parse Tree for a + 2 × b [tree diagram]

FIGURE 3.29 The Classic Expression Grammar, Revisited.

3.6.1 Optimizing a Grammar

While syntax analysis no longer consumes a major share of compile time, the compiler should not waste undue time in parsing.
The actual form of a grammar has a direct effect on the amount of work required to parse it. Both top-down and bottom-up parsers construct derivations. A top-down parser performs an expansion for every production in the derivation. A bottom-up parser performs a reduction for every production in the derivation. A grammar that produces shorter derivations takes less time to parse.

The compiler writer can often rewrite the grammar to reduce the parse tree height.
This reduces the number of expansions in a top-down parser and the number of reductions in a bottom-up parser. Optimizing the grammar cannot change the parser’s asymptotic behavior; after all, the parse tree must have a leaf node for each symbol in the input stream. Still, reducing the constants in heavily used portions of the grammar, such as the expression grammar, can make enough difference to justify the effort.

Consider, again, the classic expression grammar from Section 3.2.4.
(The LR(1) tables for the grammar appear in Figures 3.31 and 3.32.) To enforce the desired precedence among operators, we added two nonterminals, Term and Factor, and reshaped the grammar into the form shown in Figure 3.29a. This grammar produces rather large parse trees, even for simple expressions. For the expression a + 2 × b, for example, the parse tree has 14 nodes, as shown in Figure 3.29b.

     4  Term → Term × ( Expr )
     5       | Term × name
     6       | Term × num
     7       | Term ÷ ( Expr )
     8       | Term ÷ name
     9       | Term ÷ num
    10       | ( Expr )
    11       | name
    12       | num

    (a) New Productions for Term
    (b) Parse Tree for a + 2 × b [tree diagram]

FIGURE 3.30 Replacement Productions for Term.
Five of these nodes are leaves that we cannot eliminate. (Changing the grammar cannot shorten the input program.)

Any interior node that has only one child is a candidate for optimization. The sequence of nodes Expr to Term to Factor to <name,a> uses four nodes for a single word in the input stream. We can eliminate at least one layer, the layer of Factor nodes, by folding the alternative expansions for Factor into Term, as shown in Figure 3.30a. This multiplies by three the number of alternatives for Term, but shrinks the parse tree by one layer, as shown in Figure 3.30b.

In an LR(1) parser, this change eliminates three of nine reduce actions, and leaves the five shifts intact.
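The node counts quoted above can be verified with a short sketch. The nested-tuple encoding and the helper `count` are assumptions made for illustration; the two trees are the parse trees for a + 2 × b under the classic grammar and under the grammar with Factor folded into Term.

```python
def count(tree):
    """Return (total_nodes, interior_nodes) for a parse tree.

    A leaf is a string; an interior node is a tuple (label, child, ...).
    """
    if isinstance(tree, str):
        return 1, 0
    _label, *children = tree
    total, interior = 1, 1
    for child in children:
        t, i = count(child)
        total += t
        interior += i
    return total, interior

# Parse tree under the classic expression grammar (Figure 3.29).
classic = ("Goal",
           ("Expr",
            ("Expr", ("Term", ("Factor", "<name,a>"))),
            "+",
            ("Term",
             ("Term", ("Factor", "<name,2>")),
             "×",
             ("Factor", "<name,b>"))))

# Parse tree after folding Factor into Term (Figure 3.30).
folded = ("Goal",
          ("Expr",
           ("Expr", ("Term", "<name,a>")),
           "+",
           ("Term",
            ("Term", "<name,2>"),
            "×",
            "<name,b>")))

classic_total, classic_interior = count(classic)
folded_total, folded_interior = count(folded)
print(classic_total, classic_interior)
print(folded_total, folded_interior)
```

Each interior node corresponds to one reduction in a bottom-up parse, so the drop from nine interior nodes to six matches the three eliminated reduce actions, while the five leaves (the shifts) are unchanged.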
In a top-down recursive-descent parser for an equivalent predictive grammar, it would eliminate 3 of 14 procedure calls.

In general, any production that has a single symbol on its right-hand side can be folded away. These productions are sometimes called useless productions.
Sometimes, useless productions serve a purpose: making the grammar more compact and, perhaps, more readable, or forcing the derivation to assume a particular shape. (Recall that the simplest of our expression grammars accepts a + 2 × b but does not encode any notion of precedence into the parse tree.) As we shall see in Chapter 4, the compiler writer may include a useless production simply to create a point in the derivation where a particular action can be performed.

Folding away useless productions has its costs. In an LR(1) parser, it can make the tables larger.
In our example, eliminating Factor removes one column from the Goto table, but the extra productions for Term increase the size of CC from 32 sets to 46 sets. Thus, the tables have one fewer column, but an extra 14 rows. The resulting parser performs fewer reductions (and runs faster), but has larger tables.

In a hand-coded, recursive-descent parser, the larger grammar may increase the number of alternatives that must be compared before expanding some left-hand side. The compiler writer can sometimes compensate for the increased cost by combining cases. For example, the code for both nontrivial expansions of Expr′ in Figure 3.10 is identical. The compiler writer could combine them with a test that matches word against either + or -.
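A minimal sketch of the combined test follows. Figure 3.10 is not reproduced here, so the class `Parser`, its method names, and the simplification of Term to a single operand are all assumptions made for illustration; only the idea of handling both nontrivial expansions of Expr′ with one membership test comes from the text.

```python
class Parser:
    """Recognizer for Expr -> Term Expr', Expr' -> (+|-) Term Expr' | epsilon."""

    def __init__(self, words):
        self.words = list(words) + [None]  # None marks end of input (eof)
        self.pos = 0

    @property
    def word(self):
        return self.words[self.pos]

    def next_word(self):
        self.pos += 1

    def expr(self):
        return self.term() and self.expr_prime()

    def expr_prime(self):
        # Combined case: a single test covers both
        # Expr' -> + Term Expr' and Expr' -> - Term Expr'.
        if self.word in ("+", "-"):
            self.next_word()
            return self.term() and self.expr_prime()
        # Expr' -> epsilon is legal only when the input is exhausted.
        return self.word is None

    def term(self):
        # Assumption: Term is simplified to one name or number.
        if self.word is not None and self.word.isalnum():
            self.next_word()
            return True
        return False


def parses(words):
    """Return True if the word list is a valid expression."""
    return Parser(words).expr()


print(parses(["a", "+", "2", "-", "b"]))
print(parses(["a", "+"]))
```

The single `in ("+", "-")` test replaces two otherwise identical branches, which is the saving the text describes.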
Alternatively, the compiler writer could assign both + and - to the same syntactic category, have the parser inspect the syntactic category, and use the lexeme to differentiate between the two when needed.

3.6.2 Reducing the Size of LR(1) Tables

Unfortunately, the LR(1) tables generated for relatively small grammars can be large.
Figures 3.31 and 3.32 show the canonical LR(1) tables for the classic expression grammar. Many techniques exist for shrinking such tables, including the three approaches to reducing table size described in this section.

Combining Rows or Columns

If the table generator can find two rows, or two columns, that are identical, it can combine them. In Figure 3.31, the rows for states 0 and 7 through 10 are identical, as are rows 4, 14, 21, 22, 24, and 25. The table generator can implement each of these sets once, and remap the states accordingly. This would remove nine rows from the table, reducing its size by 28 percent.
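Row combining can be sketched as follows. The function `combine_rows` and the toy table are hypothetical; the entries are not those of Figure 3.31. The sketch stores each distinct row once and builds a state-to-row map, and it also reproduces the arithmetic behind the 28 percent figure.

```python
def combine_rows(table):
    """Store each distinct row once; return (unique_rows, row_of_state)."""
    unique_rows = []
    index_of = {}       # row contents -> index in unique_rows
    row_of_state = []   # parser state -> index in unique_rows
    for row in table:
        key = tuple(row)
        if key not in index_of:
            index_of[key] = len(unique_rows)
            unique_rows.append(list(row))
        row_of_state.append(index_of[key])
    return unique_rows, row_of_state

# Toy Action table (made-up entries); "" stands for an error entry.
table = [
    ["s4", "",    "s5"],  # state 0
    ["",   "acc", ""],    # state 1
    ["s4", "",    "s5"],  # state 2, identical to state 0
    ["r2", "r2",  ""],    # state 3
    ["s4", "",    "s5"],  # state 4, identical to state 0
]

rows, row_of_state = combine_rows(table)

def action(state, column):
    """Table lookup with the extra indirection through row_of_state."""
    return rows[row_of_state[state]][column]

# Arithmetic from the text: a group of 5 identical rows and a group of 6
# collapse to one row each, out of 32 rows in total.
saved = (5 - 1) + (6 - 1)
percent = round(100 * saved / 32)
print(len(rows), row_of_state, saved, percent)
```

Every lookup now costs one additional index operation, so the memory saved has to be weighed against that extra access.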
To use this table, the skeleton parser needs a mapping from a parser state to a row index in the Action table. The table generator can combine identical columns in the analogous way. A separate inspection of the Goto table will yield a different set of state combinations; in particular, all of the rows containing only zeros should condense to a single row.

In some cases, the table generator can prove that two rows or two columns differ only in cases where one of the two has an “error” entry (denoted by a blank in our figures).
In Figure 3.31, the columns for eof and for num differ only where one or the other has a blank. Combining such columns produces the same behavior on correct inputs. It does change the parser’s behavior on erroneous inputs and may impede the parser’s ability to provide accurate and helpful error messages.

Combining rows and columns produces a direct reduction in table size. If this space reduction adds an extra indirection to every table access, the cost of those memory operations must trade off directly against the savings in memory. The table generator could also use other techniques to represent sparse matrices; again, the implementor must consider the tradeoff of memory size against any increase in access costs.

FIGURE 3.31 Action Table for the Classic Expression Grammar (states 0 to 31; columns eof, +, −, ×, ÷, (, ), num, name).

Shrinking the Grammar

In many cases, the compiler writer can recode the grammar to reduce the number of productions it contains.
This usually leads to smaller tables. For example, in the classic expression grammar, the distinction between a number and an identifier is irrelevant to the productions for Goal, Expr, Term, and Factor. Replacing the two productions Factor → num and Factor → name with a single production Factor → val shrinks the grammar by a production. In the Action table, each terminal symbol has its own column. Folding num and name into a single symbol, val, removes a column from the Action table.

FIGURE 3.32 Goto Table for the Classic Expression Grammar (states 0 to 31; columns Expr, Term, Factor).
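The effect of folding num and name into val can be sketched from the scanner's side. The function `categorize` and the column lists are hypothetical; the sketch assumes the scanner returns a (category, lexeme) pair, so the parser indexes the Action table by category while the lexeme still records whether the word was a number or an identifier.

```python
OPERATORS = {"+", "-", "×", "÷", "(", ")"}

def categorize(lexeme):
    """Return a (syntactic category, lexeme) pair, using val for num and name."""
    if lexeme in OPERATORS:
        return lexeme, lexeme
    if lexeme.isdigit() or lexeme.isidentifier():
        return "val", lexeme  # the lexeme keeps the num/name distinction
    raise ValueError(f"unrecognized word: {lexeme!r}")

# Action-table columns before and after the folding.
columns_before = ["eof", "+", "-", "×", "÷", "(", ")", "num", "name"]
columns_after = ["eof", "+", "-", "×", "÷", "(", ")", "val"]
column_index = {category: i for i, category in enumerate(columns_after)}

tokens = [categorize(w) for w in ["a", "+", "2", "×", "b"]]
print(tokens)
print(len(columns_before) - len(columns_after))
```

Both "a" and "2" select the same column, so the table needs one column fewer than before.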