
Cooper_Engineering_a_Compiler(Second Edition) (Rice), page 38


3.4 Bottom-Up Parsing

It creates new sets for Stmt, for if, and for assign.

cc6 = { [Stmt → if expr then Stmt •, eof],
        [Stmt → if expr then Stmt • else Stmt, eof] }

cc7 = { [Stmt → if • expr then Stmt, {eof, else}],
        [Stmt → if • expr then Stmt else Stmt, {eof, else}] }

cc8 = { [Stmt → assign •, {eof, else}] }

The fifth iteration examines cc6, cc7, and cc8. While most of the combinations produce the empty set, two combinations lead to new sets. The transition on else from cc6 leads to cc9, and the transition on expr from cc7 creates cc10.

cc9 = { [Stmt → if expr then Stmt else • Stmt, eof],
        [Stmt → • if expr then Stmt, eof],
        [Stmt → • if expr then Stmt else Stmt, eof],
        [Stmt → • assign, eof] }

cc10 = { [Stmt → if expr • then Stmt, {eof, else}],
         [Stmt → if expr • then Stmt else Stmt, {eof, else}] }

When the sixth iteration examines the sets produced in the fifth iteration, it creates two new sets, cc11 from cc9 on Stmt and cc12 from cc10 on then. It also creates duplicate sets for cc2 and cc3 from cc9.

cc11 = { [Stmt → if expr then Stmt else Stmt •, eof] }

cc12 = { [Stmt → if expr then • Stmt, {eof, else}],
         [Stmt → if expr then • Stmt else Stmt, {eof, else}],
         [Stmt → • if expr then Stmt, {eof, else}],
         [Stmt → • if expr then Stmt else Stmt, {eof, else}],
         [Stmt → • assign, {eof, else}] }

Iteration seven creates cc13 from cc12 on Stmt. It recreates cc7 and cc8.

cc13 = { [Stmt → if expr then Stmt •, {eof, else}],
         [Stmt → if expr then Stmt • else Stmt, {eof, else}] }

Iteration eight finds one new set, cc14 from cc13 on the transition for else.

cc14 = { [Stmt → if expr then Stmt else • Stmt, {eof, else}],
         [Stmt → • if expr then Stmt, {eof, else}],
         [Stmt → • if expr then Stmt else Stmt, {eof, else}],
         [Stmt → • assign, {eof, else}] }

Iteration nine generates cc15 from cc14 on the transition for Stmt, along with duplicates of cc7 and cc8.

cc15 = { [Stmt → if expr then Stmt else Stmt •, {eof, else}] }

The final iteration looks at cc15. Since the • lies at the end of every item in cc15, it can only generate empty sets. At this point, no additional sets of items can be added to the canonical collection, so the algorithm has reached a fixed point. It halts.

The ambiguity in the grammar becomes apparent during the table-filling algorithm. The items in states cc0 through cc12 generate no conflicts.

State cc13 contains four items:

1. [Stmt → if expr then Stmt •, else]
2. [Stmt → if expr then Stmt •, eof]
3. [Stmt → if expr then Stmt • else Stmt, else]
4. [Stmt → if expr then Stmt • else Stmt, eof]

Item 1 generates a reduce entry for cc13 and the lookahead else. Item 3 generates a shift entry for the same location in the table. Clearly, the table entry cannot hold both actions. This shift-reduce conflict indicates that the grammar is ambiguous. Items 2 and 4 do not add a second conflict. Item 2 generates a reduce entry for the lookahead eof, while item 4, like item 3, generates the shift on else, because a shift entry is determined by the symbol that follows the •, not by the item's lookahead.
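The construction and the conflict traced above can be reproduced with a short program. The following Python sketch is not the book's algorithm verbatim: it uses a worklist instead of repeated passes over the collection, writes FIRST(Stmt) out by hand, and assumes a goal production Goal → Stmt numbered 1, so that the two if productions are numbered 2 and 3. All function and variable names are illustrative.

```python
from collections import deque

# Grammar inferred from the items in the text. The Goal production and the
# numbering (chosen so the if productions are 2 and 3) are assumptions.
GRAMMAR = {
    1: ("Goal", ("Stmt",)),
    2: ("Stmt", ("if", "expr", "then", "Stmt")),
    3: ("Stmt", ("if", "expr", "then", "Stmt", "else", "Stmt")),
    4: ("Stmt", ("assign",)),
}
NONTERMINALS = {"Goal", "Stmt"}
# Exploration order, chosen so the sets come out numbered as in the text.
SYMBOLS = ["Stmt", "if", "expr", "then", "else", "assign"]
# FIRST sets written out by hand; this shortcut works because no production is empty.
FIRST = {"Goal": {"if", "assign"}, "Stmt": {"if", "assign"}}

# An item is (production number, dot position, lookahead symbol).


def closure(items):
    """For each [A -> beta . C delta, a], add [C -> . gamma, b] for b in FIRST(delta a)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        if not rest:
            lookaheads = {look}
        elif rest[0] in NONTERMINALS:
            lookaheads = FIRST[rest[0]]
        else:
            lookaheads = {rest[0]}
        for number, (lhs, _) in GRAMMAR.items():
            if lhs != rhs[dot]:
                continue
            for b in lookaheads:
                if (number, 0, b) not in result:
                    result.add((number, 0, b))
                    work.append((number, 0, b))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1, a) for p, d, a in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    """Worklist version of the fixed-point construction; returns (states, transitions)."""
    start = closure({(1, 0, "eof")})
    states, index, transitions = [start], {start: 0}, {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in index:  # a duplicate set is recognized, not re-added
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            transitions[(index[state], symbol)] = index[target]
    return states, transitions


def find_conflicts(states, transitions):
    """Fill Action cells as sets and return the cells that need more than one entry."""
    action = {}
    for i, state in enumerate(states):
        for prod, dot, look in state:
            rhs = GRAMMAR[prod][1]
            if dot == len(rhs):
                entry = "accept" if prod == 1 else f"reduce {prod}"
                action.setdefault((i, look), set()).add(entry)
            elif rhs[dot] not in NONTERMINALS:
                target = transitions[(i, rhs[dot])]
                action.setdefault((i, rhs[dot]), set()).add(f"shift {target}")
    return {cell: entries for cell, entries in action.items() if len(entries) > 1}


if __name__ == "__main__":
    states, transitions = canonical_collection()
    print(len(states), "sets in the canonical collection")
    for (state, symbol), entries in find_conflicts(states, transitions).items():
        print(f"conflict in cc{state} on {symbol}: {sorted(entries)}")
```

Run as a script, the sketch should report 16 sets (cc0 through cc15) and a shift-reduce conflict between `reduce 2` and `shift 14` in cc13 on else. Each item here carries a single lookahead, so an item written in the text with the lookahead set {eof, else} appears as two items.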

When the table-filling algorithm encounters such a conflict, the construction has failed. The table generator should report the problem, a fundamental ambiguity between the productions in the specific LR(1) items, to the compiler writer.

In this case, the conflict arises because production 2 in the grammar is a prefix of production 3. The table generator could be designed to resolve this conflict in favor of shifting; that forces the parser to recognize the longer production and binds the else to the innermost if.

A typical error message from a parser generator includes the LR(1) items that generate the conflict, which is another reason to study the table construction.

An ambiguous grammar can also produce a reduce-reduce conflict. Such a conflict can occur if the grammar contains two productions A → γ δ and B → γ δ, with the same right-hand side γ δ. If a state contains the items [A → γ δ •, a] and [B → γ δ •, a], then it will generate two conflicting reduce actions for the lookahead a, one for each production. Again, this conflict reflects a fundamental ambiguity in the underlying grammar; the compiler writer must reshape the grammar to eliminate it (see Section 3.5.3).

Since parser generators that automate this process are widely available, the method of choice for determining whether a grammar has the LR(1) property is to invoke an LR(1) parser generator on it. If the process succeeds, the grammar has the LR(1) property.

Exercise 12 shows an LR(1) grammar that has no equivalent LL(1) grammar.

As a final example, the LR tables for the classic expression grammar appear in Figures 3.31 and 3.32 on pages 151 and 152.

SECTION REVIEW

LR(1) parsers are widely used in compilers built in both industry and academia. These parsers accept a large class of languages. They use time proportional to the size of the derivation that they construct. Tools that generate an LR(1) parser are widely available in a broad variety of implementation languages.

The LR(1) table-construction algorithm is an elegant application of theory to practice. It systematically builds up a model of the handle-recognizing DFA and then translates that model into a pair of tables that drive the skeleton parser. The table construction is a complex undertaking that requires painstaking attention to detail. It is precisely the kind of task that should be automated: parser generators are better at following these long chains of computations than are humans. That notwithstanding, a skilled compiler writer should understand the table-construction algorithms because they provide insight into how the parsers work, what kinds of errors the parser generator can encounter, how those errors arise, and how they can be remedied.

Review Questions

1. Show the steps that the skeleton LR(1) parser, with the tables for the parentheses grammar, would take on the input string “( ( ) ( ) ) ( ) .”
2. Build the LR(1) tables for the SheepNoise grammar, given in Section 3.2.2 on page 86, and show the skeleton parser’s actions on the input “baa baa baa.”

3.5 PRACTICAL ISSUES

Even with automatic parser generators, the compiler writer must manage several issues to produce a robust, efficient parser for a real programming language. This section addresses several issues that arise in practice.

3.5.1 Error Recovery

Programmers often compile code that contains syntax errors. In fact, compilers are widely accepted as the fastest way to discover such errors. In this application, the compiler must find as many syntax errors as possible in a single attempt at parsing the code.

This requires attention to the parser’s behavior in error states.

All of the parsers shown in this chapter have the same behavior when they encounter a syntax error: they report the problem and halt. This behavior prevents the compiler from wasting time trying to translate an incorrect program. However, it ensures that the compiler finds at most one syntax error per compilation. Such a compiler would make finding all the syntax errors in a file of program text a potentially long and painful process.

A parser should find as many syntax errors as possible in each compilation. This requires a mechanism that lets the parser recover from an error by moving to a state where it can continue parsing. A common way of achieving this is to select one or more words that the parser can use to synchronize the input with its internal state. When the parser encounters an error, it discards input symbols until it finds a synchronizing word and then resets its internal state to one consistent with the synchronizing word.

In an Algol-like language, with semicolons as statement separators, the semicolon is often used as a synchronizing word. When an error occurs, the parser calls the scanner repeatedly until it finds a semicolon. It then changes state to one that would have resulted from successful recognition of a complete statement, rather than an error.

In a recursive-descent parser, the code can simply discard words until it finds a semicolon. At that point, it can return control to the point where the routine that parses statements reports success.
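The recursive-descent recovery just described can be illustrated with a minimal Python sketch. It assumes a toy language that is not from the text: statements of the form `name = number`, separated by semicolons. The names `tokenize`, `parse_program`, and `SyntaxErrorFound` are hypothetical. A Python exception carries control from the point of the error back to the statement-level loop, which then skips to the synchronizing word.

```python
class SyntaxErrorFound(Exception):
    """Raised inside a statement; caught by the statement-level loop."""


def tokenize(text):
    """Split on whitespace and classify each word (toy scanner, assumed)."""
    tokens = []
    for word in text.split():
        if word in ("=", ";"):
            tokens.append((word, word))
        elif word.isdigit():
            tokens.append(("number", word))
        else:
            tokens.append(("name", word))
    return tokens


def parse_program(tokens):
    """Parse `stmt ; stmt ; ...` and return every error message found."""
    errors = []
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] == kind:
            pos += 1
        else:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxErrorFound(f"token {pos}: expected {kind}, found {found!r}")

    def statement():
        expect("name")
        expect("=")
        expect("number")

    while pos < len(tokens):
        try:
            statement()
            if pos < len(tokens):
                expect(";")
        except SyntaxErrorFound as err:
            errors.append(str(err))
            # Discard words until the synchronizing word, then step past it,
            # as if a complete statement had just been recognized.
            while pos < len(tokens) and tokens[pos][0] != ";":
                pos += 1
            pos += 1
    return errors
```

For example, `parse_program(tokenize("a = 1 ; b = = 2 ; c 3 ; d = 4"))` reports one error in the second statement and one in the third, then parses the fourth statement normally instead of halting at the first problem.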

Returning control in this way may involve manipulating the runtime stack or using a nonlocal jump like C’s setjmp and longjmp.

In an LR(1) parser, this kind of resynchronization is more complex. The parser discards input until it finds a semicolon. Next, it scans backward down the parse stack until it finds a state s such that Goto[s, Statement] is a valid, nonerror entry. The first such state on the stack represents the statement that contains the error. The error recovery routine then discards entries on the stack above that state, pushes the state Goto[s, Statement] onto the stack, and resumes normal parsing.

In a table-driven parser, either LL(1) or LR(1), the compiler writer needs a way of telling the parser generator where to synchronize. This can be done using error productions: productions whose right-hand side includes a reserved word that indicates an error synchronization point and one or more synchronizing tokens. With such a construct, the parser generator can construct error-recovery routines that implement the desired behavior.

Of course, the error-recovery routines should take steps to ensure that the compiler does not try to generate and optimize code for a syntactically invalid program.
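The LR(1) resynchronization steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a routine from the text: the parse stack is taken to hold states only, the Goto table is a dictionary keyed by (state, nonterminal), and the function name `recover` and its parameters are hypothetical. The sketch leaves the semicolon as the next input word, on the assumption that the parser shifts it as a separator after the recovered statement.

```python
def recover(stack, words, pos, goto_table, sync=";", target="Statement"):
    """Resynchronize an LR(1) parser after a syntax error.

    stack      -- list of parser states, top of stack last
    words      -- list of input words
    pos        -- index of the word where the error was detected
    goto_table -- dict mapping (state, nonterminal) to a state
    Returns the input position to resume at, or None if recovery fails.
    The stack is modified in place only when recovery succeeds.
    """
    # 1. Discard input until the synchronizing word.
    while pos < len(words) and words[pos] != sync:
        pos += 1
    if pos == len(words):
        return None

    # 2. Scan down the stack for the first state s with a valid Goto[s, target].
    depth = len(stack) - 1
    while depth >= 0 and (stack[depth], target) not in goto_table:
        depth -= 1
    if depth < 0:
        return None

    # 3. Discard the entries above s and push Goto[s, target].
    state = stack[depth]
    del stack[depth + 1:]
    stack.append(goto_table[(state, target)])
    return pos
```

With a made-up table `{(3, "Statement"): 5}`, a stack `[0, 3, 7, 9]`, and an error at index 2 of `["x", "=", "+", ";", "y"]`, the call returns 3 and leaves the stack as `[0, 3, 5]`, after which normal parsing would resume at the semicolon.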
