Главная » Все файлы » Просмотр файлов из архивов » PDF-файлы » Cooper_Engineering_a_Compiler(Second Edition)

Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 24

Rice 1869

Описание файла

Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 24 страницы из PDF

To keep it from decrementing the pointer beyond thestart of that context, NextChar and RollBack cooperate. The pointer Fencealways indicates the start of the valid context. NextChar sets Fence eachtime it fills a buffer. RollBack checks Fence each time it tries to decrementthe Input pointer.After a long series of NextChar operations, say, more than n of them, RollBack can always back up at least n characters.

However, a sequence of callsto NextChar and RollBack that work forward and backward in the buffercan create a situation where the distance between Input and Fence is lessthan n. Larger values of n decrease the likelihood of this situation arising.Expected backup distances should be a consideration in selecting the buffersize, n.Generating LexemesThe code shown for the table-driven and direct-coded scanners accumulatedthe input characters into a string lexeme. If the appropriate output for eachsyntactic category is a textual copy of the lexeme, then those schemes areefficient. In some common cases, however, the parser, which consumes thescanner’s output, needs the information in another form.For example, in many circumstances, the natural representation for a register number is an integer, rather than a character string consisting of an ‘r’and a sequence of digits.

If the scanner builds a character representation,then somewhere in the interface, that string must be converted to an integer. A typical way to accomplish that conversion uses a library routine,such as atoi in the standard C library, or a string-based i/o routine, such as72 CHAPTER 2 Scannerssscanf. A more efficient way to solve this problem would be to accumulatethe integer’s value one digit at a time.In the continuing example, the scanner could initialize a variable, RegNum,to zero in its initial state. Each time that it recognized a digit, it couldmultiply RegNum by 10 and add the new digit.

When it reached an accepting state, RegNum would contain the needed value. To modify the scannerin Figure 2.16, we can delete all statements that refer to lexeme, addRegNum ← 0; to sinit , and replace the occurrences of goto s2 in statess1 and s2 with:begin;RegNum ← RegNum × 10 + (char - ‘0’);goto s2 ;end;where both char and ‘0’ are treated as their ordinal values in the asciicollating sequence. Accumulating the value this way likely has loweroverhead than building the string and converting it in the accepting state.For other words, the lexeme is implicit and, therefore, redundant. Withsingleton words, such as a punctuation mark or an operator, the syntacticcategory is equivalent to the lexeme.

Similarly, many scanners recognizecomments and white space and discard them. Again, the set of states thatrecognize the comment need not accumulate the lexeme. While the individual savings are small, the aggregate effect is to create a faster, more compactscanner.This issue arises because many scanner generators let the compiler writerspecify actions to be performed in an accepting state, but do not allowactions on each transition.

The resulting scanners must accumulate acharacter copy of the lexeme for each word, whether or not that copy isneeded. If compile time matters (and it should), then attention to such minoralgorithmic details leads to a faster compiler.2.5.4 Handling KeywordsWe have consistently assumed that keywords in the input language shouldbe recognized by including explicit res for them in the description thatgenerates the dfa and the recognizer. Many authors have proposed an alternative strategy: having the dfa classify them as identifiers and testing eachidentifier to determine whether or not it is a keyword.This strategy made sense in the context of a hand-implemented scanner.The additional complexity added by checking explicitly for keywords causes2.5 Implementing Scanners 73a significant expansion in the number of dfa states. This added implementation burden matters in a hand-coded program.

With a reasonable hash table(see Appendix B.4), the expected cost of each lookup should be constant.In fact, this scheme has been used as a classic application for perfect hashing. In perfect hashing, the implementor ensures, for a fixed set of keys, thatthe hash function generates a compact set of integers with no collisions.

Thislowers the cost of lookup on each keyword. If the table implementation takesinto account the perfect hash function, a single probe serves to distinguishkeywords from identifiers. If it retries on a miss, however, the behavior canbe much worse for nonkeywords than for keywords.If the compiler writer uses a scanner generator to construct the recognizer,then the added complexity of recognizing keywords in the dfa is handled bythe tools.

The extra states that this adds consume memory, but not compiletime. Using the dfa mechanism to recognize keywords avoids a table lookupon each identifier. It also avoids the overhead of implementing a keywordtable and its support functions. In most cases, folding keyword recognitioninto the dfa makes more sense than using a separate lookup table.SECTION REVIEWAutomatic construction of a working scanner from a minimal DFA isstraightforward.

The scanner generator can adopt a table-drivenapproach, wherein it uses a generic skeleton scanner and languagespecific tables, or it can generate a direct-coded scanner that threadstogether a code fragment for each DFA state. In general, the direct-codedapproach produces a faster scanner because it has lower overhead percharacter.Despite the fact that all DFA-based scanners have small constant costsper characters, many compiler writers choose to hand code a scanner.This approach lends itself to careful implementation of the interfacesbetween the scanner and the I/O system and between the scanner andthe parser.aReview Questions1.

Given the DFA shown to the left, complete the following:a. Sketch the character classifier that you would use in a table-drivenimplementation of this DFA.b. Build the transition table, based on the transition diagram andyour character classifier.c. Write an equivalent direct-coded scanner.s1bs4cs7s0bs2cs5as8cs3as6bs974 CHAPTER 2 Scanners2. An alternative implementation might use a recognizer for(a|b|c) (a|b|c) (a|b|c), followed by a lookup in a table that containsthe three words abc, bca, and cab.a. Sketch the DFA for this language.b.

Show the direct-coded scanner, including the call needed toperform keyword lookup.c. Contrast the cost of this approach with those in question 1 above.3. What impact would the addition of transition-by-transition actionshave on the DFA-minimization process? (Assume that we have a linguistic mechanism of attaching code fragments to the edges in thetransition graph.)2.6 ADVANCED TOPICS2.6.1 DFA to Regular ExpressionThe final step in the cycle of constructions, shown in Figure 2.3, is toconstruct an re from a dfa.

The combination of Thompson’s constructionand the subset construction provide a constructive proof that dfas are atleast as powerful as res. This section presents Kleene’s construction, whichbuilds an re to describe the set of strings accepted by an arbitrary dfa. Thisalgorithm establishes that res are at least as powerful as dfas. Together, theyshow that res and dfas are equivalent.Consider the transition diagram of a dfa as a graph with labelled edges.The problem of deriving an re that describes the language accepted by thedfa corresponds to a path problem over the dfa’s transition diagram. Theset of strings in L(dfa) consists of the set of edge labels for every pathfrom d0 to di , ∀ di ∈ D A .

For any dfa with a cyclic transition graph, the setof such paths is infinite. Fortunately, res have the Kleene closure operatorto handle this case and summarize the complete set of subpaths created bya cycle.Figure 2.18 shows one algorithm to compute this path expression. It assumesthat the dfa has states numbered from 0 to |D| − 1, with d0 as the start state.It generates an expression that represents the labels along all paths betweentwo nodes, for each pair of nodes in the transition diagram.

As a final step,it combines the expressions for each path that leaves d0 and reaches someaccepting state, di ∈ D A . In this way, it systematically constructs the pathexpressions for all paths.The algorithm computes a set of expressions, denoted Rkij , for all the relevantvalues of i, j, and k. Rkij is an expression that describes all paths throughthe transition graph from state i to state j, without going through a state2.6 Advanced Topics 75for i = 0 to |D| − 1for j = 0 to |D| − 11R−ij = { a | δ(d i , a) = d j }if (i = j) then1−1R−ij = Rij | { }for k = 0 to |D|−1for i = 0 to |D |−1for j = 0 to |D|−1Rkij = Rkik−1 (Rkkk−1 )∗ Rkkj−1 | Rkij−1L = |s ∈ DjA|D|−1R0jn FIGURE 2.18 Deriving a Regular Expression from a DFA.numbered higher than k.

Here, through means both entering and leaving, sothat R21, 16 can be nonempty if an edge runs directly from 1 to 16.Initially, the algorithm places all of the direct paths from i to j in Rij−1 , with{} added to Rij−1 if i = j. Over successive iterations, it builds up longer pathsto produce Rijk by adding to Rkij−1 the paths that pass through k on their wayfrom i to j. Given Rkij−1 , the set of paths added by going from k − 1 to k isexactly the set of paths that run from i to k using no state higher than k − 1,concatenated with the paths from k to itself that pass through no state higherthan k − 1, followed by the paths from k to j that pass through no state higherthan k − 1. That is, each iteration of the loop on k adds the paths that passthrough k to each set Rkij−1 to produce Rijk .When the k loop terminates, the various Rkij expressions account for all pathsthrough the graph.

Свежие статьи
Популярно сейчас