A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition), page 15
This is what you see on the stack in the error message in Figure 3.33. So on the stack we have tokens for if, id, then, and a production that matches a stm, and now we have an else token. Clearly this reveals that the conflict is caused by the familiar dangling else. In order to resolve this conflict we need to rewrite the grammar, removing the ambiguity as in Grammar 3.34.

GRAMMAR 3.34: SableCC productions of Grammar 3.32 with conflicts resolved.

Productions
prog = stmlist;
stm = {stm_without_trailing_substm} stm_without_trailing_substm |
      {while} while id do stm |
      {if_then} if id then stm |
      {if_then_else} if id then stm_no_short_if
                     else [false_stm]:stm;
stm_no_short_if =
      {stm_without_trailing_substm} stm_without_trailing_substm |
      {while_no_short_if} while id do stm_no_short_if |
      {if_then_else_no_short_if} if id then [true_stm]:stm_no_short_if
                                 else [false_stm]:stm_no_short_if;
stm_without_trailing_substm =
      {assign} [left]:id assign [right]:id |
      {begin} begin stmlist end;
stmlist = {stmt} stm | {stmtlist} stmlist semi stm;

PRECEDENCE DIRECTIVES

No ambiguous grammar is LR(k) for any k; the LR(k) parsing table of an ambiguous grammar will always have conflicts.
However, ambiguous grammars can still be useful if we can find ways to resolve the conflicts.

For example, Grammar 3.5 is highly ambiguous. In using this grammar to describe a programming language, we intend it to be parsed so that * and / bind more tightly than + and −, and that each operator associates to the left. We can express this by rewriting into the unambiguous Grammar 3.8.

But we can avoid introducing the T and F symbols and their associated "trivial" reductions E → T and T → F. Instead, let us start by building the LR(1) parsing table for Grammar 3.5, as shown in Table 3.35. We find many conflicts. For example, in state 13 with lookahead + we find a conflict between shift into state 8 and reduce by rule 3. Two of the items in state 13 are

E → E * E .
E → E . + E

Table 3.35: LR parsing table for Grammar 3.5.

In this state the top of the stack is … E * E.
Shifting will lead to a stack … E * E + and eventually … E * E + E with a reduction of E + E to E. Reducing now will lead to the stack … E and then the + will be shifted. The parse trees obtained by shifting and reducing differ: shifting groups the expression as E * (E + E), while reducing groups it as (E * E) + E.

If we wish * to bind tighter than +, we should reduce instead of shift. So we fill the (13, +) entry in the table with r3 and discard the s8 action. Conversely, in state 9 on lookahead *, we should shift instead of reduce, so we resolve the conflict by filling the (9, *) entry with s12.

In the case of state 9 with lookahead +, shifting will make the operator right-associative; reducing will make it left-associative.
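The effect of these choices can be seen in a minimal operator-precedence sketch (illustrative Python, not the book's LR tables): on each lookahead operator we reduce while the operator already on the stack binds at least as tightly, and using >= rather than > is exactly what makes the operators left-associative.

```python
# Illustrative sketch: resolve shift/reduce choices by operator precedence.
# Reducing when the stacked operator has precedence >= the lookahead operator
# gives left associativity and makes * bind tighter than + and -.

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def parse(tokens):
    """Parse a flat token list like ['a', '+', 'b', '*', 'c'] into nested tuples."""
    operands = [tokens[0]]          # parsed subexpressions
    operators = []                  # pending binary operators
    i = 1
    while i < len(tokens):
        op, rhs = tokens[i], tokens[i + 1]
        # "Reduce instead of shift" while the stacked operator binds
        # at least as tightly as the lookahead operator.
        while operators and PREC[operators[-1]] >= PREC[op]:
            right = operands.pop()
            left = operands.pop()
            operands.append((operators.pop(), left, right))
        operators.append(op)        # shift the operator
        operands.append(rhs)
        i += 2
    while operators:                # reduce whatever remains
        right = operands.pop()
        left = operands.pop()
        operands.append((operators.pop(), left, right))
    return operands[0]

print(parse(['a', '+', 'b', '*', 'c']))   # ('+', 'a', ('*', 'b', 'c'))
print(parse(['a', '-', 'b', '-', 'c']))   # ('-', ('-', 'a', 'b'), 'c')
```

Changing >= to > in the inner loop would delay the reduction and make the operators right-associative instead.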
Since we want left associativity, we fill (9, +) with r5.

Consider the expression a − b − c. In most programming languages, this associates to the left, as if written (a − b) − c. But suppose we believe that this expression is inherently confusing, and we want to force the programmer to put in explicit parentheses, either (a − b) − c or a − (b − c). Then we say that the minus operator is nonassociative, and we would fill the (11, −) entry with an error entry.

The result of all these decisions is a parsing table with all conflicts resolved (Table 3.36).

Table 3.36: Conflicts of Table 3.35 resolved.

Yacc has precedence directives to indicate the resolution of this class of shift-reduce conflicts. (Unfortunately, SableCC does not have precedence directives.) A series of declarations such as

precedence nonassoc EQ, NEQ;
precedence left PLUS, MINUS;
precedence left TIMES, DIV;
precedence right EXP;

indicates that + and - are left-associative and bind equally tightly; that * and / are left-associative and bind more tightly than +; that ^ is right-associative and binds most tightly; and that = and ≠ are nonassociative and bind more weakly than +.

In examining a shift-reduce conflict such as the one between reducing by E → E * E and shifting the token +, there is the choice of shifting a token and reducing by a rule.
Should the rule or the token be given higher priority? The precedence declarations (precedence left, etc.) give priorities to the tokens; the priority of a rule is given by the last token occurring on the right-hand side of that rule. Thus the choice here is between a rule with priority * and a token with priority +; the rule has higher priority, so the conflict is resolved in favor of reducing.

When the rule and token have equal priority, then a left precedence favors reducing, right favors shifting, and nonassoc yields an error action.

Instead of using the default "rule has precedence of its last token", we can assign a specific precedence to a rule using the %prec directive.
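The tie-breaking rule just described can be written out as a small decision function (a hypothetical helper, not Yacc's actual implementation):

```python
# Sketch of the conflict-resolution rule described above: compare the
# priority of the reducible rule (that of its last token) with the priority
# of the lookahead token; on a tie, the declared associativity decides.

def resolve(rule_prec, token_prec, assoc):
    """Return the action ('shift', 'reduce', or 'error') for a
    shift-reduce conflict. assoc is 'left', 'right', or 'nonassoc',
    the associativity of the shared precedence level on a tie."""
    if rule_prec > token_prec:
        return 'reduce'            # the rule binds tighter: reduce
    if rule_prec < token_prec:
        return 'shift'             # the token binds tighter: shift
    # Equal priority: associativity decides.
    if assoc == 'left':
        return 'reduce'
    if assoc == 'right':
        return 'shift'
    return 'error'                 # nonassoc: neither action is allowed

# The conflict in the text: rule E -> E * E (priority of *, level 2)
# versus lookahead + (level 1); the rule wins, so we reduce.
print(resolve(rule_prec=2, token_prec=1, assoc='left'))   # reduce
```

For a rule such as unary MINUS exp, this default "priority of the last token" would give the rule the low priority of MINUS, which is the wrong answer; that is where a per-rule override such as %prec is needed.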
The %prec directive is commonly used to solve the "unary minus" problem. In most programming languages a unary minus binds tighter than any binary operator, so −6 * 8 is parsed as (−6) * 8, not −(6 * 8). Grammar 3.37 shows an example.

GRAMMAR 3.37: Yacc grammar with precedence directives.

%{ declarations of yylex and yyerror %}
%token INT PLUS MINUS TIMES UMINUS
%start exp
%left PLUS MINUS
%left TIMES
%left UMINUS
%%
exp : INT
    | exp PLUS exp
    | exp MINUS exp
    | exp TIMES exp
    | MINUS exp  %prec UMINUS

The token UMINUS is never returned by the lexer; it's just a placeholder in the chain of precedence declarations.
The directive %prec UMINUS gives the rule exp ::= MINUS exp the highest precedence, so reducing by this rule takes precedence over shifting any operator, even a minus sign.

Precedence rules are helpful in resolving conflicts, but they should not be abused. If you have trouble explaining the effect of a clever use of precedence rules, perhaps instead you should rewrite the grammar to be unambiguous.

SYNTAX VERSUS SEMANTICS

Consider a programming language with arithmetic expressions such as x + y and boolean expressions such as x + y = z or a&(b = c). Arithmetic operators bind tighter than the boolean operators; there are arithmetic variables and boolean variables; and a boolean expression cannot be added to an arithmetic expression. Grammar 3.38 gives a syntax for this language.

GRAMMAR 3.38: Yacc grammar with precedence directives.

%token ID ASSIGN PLUS MINUS AND EQUAL
%start stm
%left OR
%left AND
%left PLUS
%%
stm : ID ASSIGN ae
    | ID ASSIGN be
be  : be OR be
    | be AND be
    | ae EQUAL ae
    | ID
ae  : ae PLUS ae
    | ID

The grammar has a reduce-reduce conflict.
How should we rewrite the grammar to eliminate this conflict?

Here the problem is that when the parser sees an identifier such as a, it has no way of knowing whether this is an arithmetic variable or a boolean variable - syntactically they look identical. The solution is to defer this analysis until the "semantic" phase of the compiler; it's not a problem that can be handled naturally with context-free grammars.
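A sketch of such a deferred check (hypothetical names throughout, assuming the parser builds a single kind of expression node and a symbol table records each variable's declared type):

```python
# Illustrative semantic phase: the parser accepts any mix of operators on a
# single expression nonterminal; a later pass computes types and rejects
# ill-typed combinations that the grammar alone cannot rule out.

SYMBOLS = {'a': 'int', 'b': 'bool'}   # assumed symbol table from declarations

def typecheck(e):
    """e is ('id', name), ('num', n), or (op, left, right) with op in + = &."""
    kind = e[0]
    if kind == 'id':
        return SYMBOLS[e[1]]
    if kind == 'num':
        return 'int'
    op, left, right = e
    lt, rt = typecheck(left), typecheck(right)
    if op == '+' and lt == rt == 'int':
        return 'int'
    if op == '=' and lt == rt == 'int':
        return 'bool'
    if op == '&' and lt == rt == 'bool':
        return 'bool'
    raise TypeError(f"operator {op} applied to {lt} and {rt}")

# a + 5 is fine, but (a + 5) & b is caught in the semantic phase:
print(typecheck(('+', ('id', 'a'), ('num', 5))))              # int
try:
    typecheck(('&', ('+', ('id', 'a'), ('num', 5)), ('id', 'b')))
except TypeError as err:
    print('semantic error:', err)
```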
A more appropriate grammar is

S → id := E
E → id
E → E & E
E → E = E
E → E + E

Now the expression a + 5 & b is syntactically legal, and a later phase of the compiler will have to reject it and print a semantic error message.

3.5 ERROR RECOVERY

LR(k) parsing tables contain shift, reduce, accept, and error actions. On page 58 we claimed that when an LR parser encounters an error action it stops parsing and reports failure. This behavior would be unkind to the programmer, who would like to have all the errors in her program reported, not just the first error.

RECOVERY USING THE ERROR SYMBOL

Local error recovery mechanisms work by adjusting the parse stack and the input at the point where the error was detected in a way that will allow parsing to resume. One local recovery mechanism - found in many versions of the Yacc parser generator - uses a special error symbol to control the recovery process.
Wherever the special error symbol appears in a grammar rule, a sequence of erroneous input tokens can be matched.

For example, in a Yacc grammar we might have productions such as

exp → ID
exp → exp + exp
exp → ( exps )
exps → exp
exps → exps ; exp

Informally, we can specify that if a syntax error is encountered in the middle of an expression, the parser should skip to the next semicolon or right parenthesis (these are called synchronizing tokens) and resume parsing. We do this by adding error-recovery productions such as

exp → ( error )
exps → error ; exp

What does the parser generator do with the error symbol? In parser generation, error is considered a terminal symbol, and shift actions are entered in the parsing table for it as if it were an ordinary token.

When the LR parser reaches an error state, it takes the following actions:
1. Pop the stack (if necessary) until a state is reached in which the action for the error token is shift.
2. Shift the error token.
3. Discard input symbols (if necessary) until a lookahead is reached that has a nonerror action in the current state.
4. Resume normal parsing.

In the two error productions illustrated above, we have taken care to follow the error symbol with an appropriate synchronizing token - in this case, a right parenthesis or semicolon.
Thus,the "nonerror action" taken in step 3 will always shift. If instead we used the production exp→ error, the "nonerror action" would be reduce, and (in an SLR or LALR parser) it ispossible that the original (erroneous) lookahead symbol would cause another error after thereduce action, without having advanced the input. Therefore, grammar rules that contain errornot followed by a token should be used only when there is no good alternative.Caution One can attach semantic actions to Yacc grammar rules; whenever a rule is reduced,its semantic action is executed.
Chapter 4 explains the use of semantic actions. Popping states from the stack can lead to seemingly "impossible" semantic actions, especially if the actions contain side effects. Consider this grammar fragment:

statements: statements exp SEMICOLON
          | statements error SEMICOLON
          | /* empty */

exp : increment exp decrement
    | ID

increment: LPAREN    {: nest=nest+1; :}
decrement: RPAREN    {: nest=nest-1; :}

"Obviously" it is true that whenever a semicolon is reached, the value of nest is zero, because it is incremented and decremented in a balanced way according to the grammar of expressions. But if a syntax error is found after some left parentheses have been parsed, then states will be popped from the stack without "completing" them, leading to a nonzero value of nest.
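The effect can be simulated directly (a hand-written sketch, not generated parser code): semantic actions run only when a rule is reduced, so states popped during error recovery never execute the balancing decrement.

```python
# Simulation of the nest-counter pitfall: the increment action runs when '('
# is shifted and reduced, but error recovery pops states without reducing
# the enclosing exp rules, so the matching decrements never run.

def parse_with_recovery(tokens):
    nest = 0
    stack = []                      # one parser state per open '('
    for tok in tokens:
        if tok == '(':
            stack.append('lparen')
            nest += 1               # action of increment: LPAREN
        elif tok == ')':
            stack.pop()
            nest -= 1               # action of decrement: RPAREN
        elif tok == 'ERROR':
            # Recovery: pop states until error can be shifted, then skip
            # to the synchronizing semicolon. No reductions happen, so
            # no decrement actions run for the popped parentheses.
            stack.clear()
        # ordinary tokens: shift; nothing to do in this sketch
    return nest

print(parse_with_recovery(['(', '(', 'x', ')', ')', ';']))   # 0 (balanced)
print(parse_with_recovery(['(', '(', 'x', 'ERROR', ';']))    # 2 ("impossible")
```

A common remedy is to avoid side effects in mid-rule actions, or to reset such counters explicitly in the error production's own action.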