Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 26

Rice 1869

Описание файла

Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 26 страницы из PDF

The subset construction derives fromRabin and Scott [292]. The dfa minimization algorithm in Section 2.4.4is due to Hopcroft [193]. It has found application to many different problems, including detecting when two program variables always have the samevalue [22].The idea of generating code rather than tables, to produce a direct-codedscanner, appears to originate in work by Waite [340] and Heuring [189].They report a factor of five improvement over table-driven implementations.Ngassam et al.

describe experiments that characterize the speedups possiblein hand-coded scanners [274]. Several authors have examined tradeoffs inscanner implementation. Jones [208] advocates direct coding but argues fora structured approach to control flow rather than the spaghetti code shownin Section 2.5.2. Brouwer et al. compare the speed of 12 different scanner implementations; they discovered a factor of 70 difference between thefastest and slowest implementations [59].The alternative dfa minimization technique presented in Section 2.6.2was described by Brzozowski in 1962 [60]. Several authors have compared dfa minimization techniques and their performance [328, 344].

Manyauthors have looked at the construction and minimization of acyclic dfas[112, 343, 345].80 CHAPTER 2 ScannersnSection 2.2EXERCISES1. Describe informally the languages accepted by the following fas:aa.s0a,bs2babs10b.s1010s0s21s0abs10,1aa1bc.s3as2bs3bbs4abs5as62. Construct an fa accepting each of the following languages:a. {w ∈ {a, b}∗ | w starts with ‘a’ and contains ‘baba’ as a substring}b.

{w ∈ {0, 1}∗ | w contains ‘111’ as a substring and does not contain‘00’ as a substring}c. {w ∈ {a, b, c}∗ | in w the number of ‘a’s modulo 2 is equal to thenumber of ‘b’s modulo 3}3. Create fas to recognize (a) words that represent complex numbers and(b) words that represent decimal numbers written in scientificnotation.Section 2.3HintNot all the specifications describe regularlanguages.4. Different programming languages use different notations to representintegers.

Construct a regular expression for each one of the following:a. Nonnegative integers in c represented in bases 10 and 16.b. Nonnegative integers in vhdl that may include underscores(an underscore cannot occur as the first or last character).c. Currency, in dollars, represented as a positive decimal numberrounded to the nearest one-hundredth. Such numbers begin withthe character $, have commas separating each group of three digitsto the left of the decimal point, and end with two digits to the rightof the decimal point, for example, $8,937.43 and $7,777,777.77.5.

Write a regular expression for each of the following languages:a. Given an alphabet 6 = {0, 1}, L is the set of all strings ofalternating pairs of 0s and pairs of 1s.a,bExercises 81b. Given an alphabet 6 = {0, 1}, L is the set of all strings of 0s and 1sthat contain an even number of 0s or an even number of 1s.c. Given the lowercase English alphabet, L is the set of all strings inwhich the letters appear in ascending lexicographical order.d. Given an alphabet 6 = {a, b, c, d}, L is the set of strings xyzwy,where x and w are strings of one or more characters in 6, y is anysingle character in 6, and z is the character z, taken from outsidethe alphabet.

(Each string xyzwy contains two words xy and wybuilt from letters in 6. The words end in the same letter, y. Theyare separated by z.)e. Given an alphabet 6 = {+, −, ×, ÷, (, ), id}, L is the set ofalgebraic expressions using addition, subtraction, multiplication,division, and parentheses over ids.6.

Write a regular expression to describe each of the followingprogramming language constructs:a. Any sequence of tabs and blanks (sometimes called white space)b. Comments in the programming language cc. String constants (without escape characters)d. Floating-point numbers7. Consider the three regular expressions:(ab | ac)∗(0 | 1)∗ 1100 1∗(01 | 10 | 00)∗ 11a. Use Thompson’s construction to construct an nfa for each re.b.

Convert the nfas to dfas.c. Minimize the dfas.8. One way of proving that two res are equivalent is to construct theirminimized dfas and then compare them. If they differ only by statenames, then the res are equivalent. Use this technique to check thefollowing pairs of res and state whether or not they areequivalent.a. (0 | 1)∗ and (0∗ | 10∗ )∗b. (ba)+ (a∗ b∗ | a∗ ) and (ba)∗ ba+ (b∗ | )9. In some cases, two states connected by an -move can be combined.a. Under what set of conditions can two states connected by an-move be combined?b.

Give an algorithm for eliminating -moves.Section 2.482 CHAPTER 2 Scannersc. How does your algorithm relate to the -closure function used toimplement the subset construction?10. Show that the set of regular languages is closed under intersection.11. The dfa minimization algorithm given in Figure 2.9 is formulated toenumerate all the elements of P and all of the characters in 6 on eachiteration of the while loop.a.

Recast the algorithm so that it uses a worklist to hold the sets thatmust still be examined.b. Recast the Split function so that it partitions the set around all ofthe characters in 6.c. How does the expected case complexity of your modifiedalgorithms compare to the expected case complexity of the originalalgorithm?Section 2.512. Construct a dfa for each of the following c language constructs, andthen build the corresponding table for a table-driven implementationfor each of them:a. Integer constantsb. Identifiersc. Comments13.

For each of the dfas in the previous exercise, build a direct-codedscanner.14. This chapter describes several styles of dfa implementations. Anotheralternative would use mutually recursive functions to implement ascanner. Discuss the advantages and disadvantages of such animplementation.15. To reduce the size of the transition table, the scanner generator can usea character classification scheme.

Generating the classifier table,however, seems expensive. The obvious algorithm would requireO(|6|2 · | states|) time. Derive an asymptotically faster algorithm forfinding identical columns in the transition table.16. Figure 2.15 shows a scheme that avoids quadratic roll back behaviorin a scanner built by simulating a dfa.

Unfortunately, that schemerequires that the scanner know in advance the length of the inputstream and that it maintain a bit-matrix, Failed, of size|states| × |input|. Devise a scheme that avoids the need to know thesize of the input stream in advance. Can you use the same scheme toreduce the size of the Failed table in cases where the worst case inputdoes not occur?Chapter3ParsersnCHAPTER OVERVIEWThe parser’s task is to determine if the input program, represented by thestream of classified words produced by the scanner, is a valid sentence in theprogramming language. To do so, the parser attempts to build a derivationfor the input program, using a grammar for the programming language.This chapter introduces context-free grammars, a notation used to specifythe syntax of programming languages. It develops several techniques forfinding a derivation, given a grammar and an input program.Keywords: Parsing, Grammar, ll(1), lr(1), Recursive Descent3.1 INTRODUCTIONParsing is the second stage of the compiler’s front end.

The parser workswith the program as transformed by the scanner; it sees a stream of wordswhere each word is annotated with a syntactic category (analogous to its partof speech). The parser derives a syntactic structure for the program, fittingthe words into a grammatical model of the source programming language.If the parser determines that the input stream is a valid program, it builds aconcrete model of the program for use by the later phases of compilation. Ifthe input stream is not a valid program, the parser reports the problem andappropriate diagnostic information to the user.As a problem, parsing has many similarities to scanning.

The formal problem has been studied extensively as part of formal language theory; thatwork forms the theoretical basis for the practical parsing techniques used inmost compilers. Speed matters; all of the techniques that we will study taketime proportional to the size of the program and its representation.

Lowlevel detail affects performance; the same implementation tradeoffs ariseEngineering a Compiler. DOI: 10.1016/B978-0-12-088478-0.00003-7c 2012, Elsevier Inc. All rights reserved.Copyright 8384 CHAPTER 3 Parsersin parsing as in scanning. The techniques in this chapter are amenable toimplementation as table-driven parsers, direct-coded parsers, and handcoded parsers. Unlike scanners, where hand-coding is common, toolgenerated parsers are more common than hand-coded parsers.Conceptual RoadmapThe primary task of the parser is to determine whether or not the input program is a syntactically valid sentence in the source language. Before we canbuild parsers that answer this question, we need both a formal mechanismfor specifying the syntax of the source language and a systematic method ofdetermining membership in this formally specified language.

By restrictingthe form of the source language to a set of languages called context-free languages, we can ensure that the parser can efficiently answer the membershipquestion. Section 3.2 introduces context-free grammars (cfgs) as a notationfor specifying syntax.Many algorithms have been proposed to answer the membership questionfor cfgs. This chapter examines two different approaches to the problem.Section 3.3 introduces top-down parsing in the form of recursive-descentparsers and ll(1) parsers.

Свежие статьи
Популярно сейчас