Cooper_Engineering_a_Compiler(Second Edition) (1157546), страница 21

Файл №1157546 Cooper_Engineering_a_Compiler(Second Edition) (Rice) 21 страницаCooper_Engineering_a_Compiler(Second Edition) (1157546) страница 212019-09-182019-09-18СтудИзба

Rice

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 21)

Since mostreal programs contain more than one word, we need to transform either thelanguage or the recognizer.At the language level, we can insist that each word end with some easily recognizable delimiter, like a blank or a tab. This idea is deceptivelyattractive. Taken literally, it requires delimiters surrounding all operators, as+, -, (, ), and the comma.At the recognizer level, we can change the implementation of the dfa and itsnotion of acceptance.

To find the longest word that matches one of the res,the dfa should run until it reaches the point where the current state, s, has nooutgoing transition on the next character. At that point, the implementationmust decide which re it has matched. Two cases arise; the first is simple. Ifs is an accepting state, then the dfa has found a word in the language andshould report the word and its syntactic category.If s is not an accepting state, matters are more complex.

Two cases occur. Ifthe dfa passed through one or more accepting states on its way to s, the recognizer should back up to the most recent such state. This strategy matchesthe longest valid prefix in the input string. If it never reached an acceptingstate, then no prefix of the input string is a valid word and the recognizershould report an error.

The scanners in Section 2.5.1 implement both thesenotions.As a final complication, an accepting state in the dfa may represent severalaccepting states in the original nfa. For example, if the lexical specification includes res for keywords as well as an re for identifiers, then akeyword such as new might match two res. The recognizer must decidewhich syntactic category to return: identifier or the singleton category forthe keyword new.Most scanner-generator tools allow the compiler writer to specify a priorityamong patterns. When the recognizer matches multiple patterns, it returnsthe syntactic category of the highest-priority pattern. This mechanismresolves the problem in a simple way. The lex scanner generator, distributedwith many Unix systems, assigns priorities based on position in the list ofres. The first re has highest priority, while the last re has lowest priority.As a practical matter, the compiler writer must also specify res for partsof the input stream that do not form words in the program text.

In mostprogramming languages, blank space is ignored, but every program containsit. To handle blank space, the compiler writer typically includes an re thatmatches blanks, tabs, and end-of-line characters; the action on accepting2.5 Implementing Scanners 59blank space is to invoke the scanner, recursively, and return its result. Ifcomments are discarded, they are handled in a similar fashion.SECTION REVIEWGiven a regular expression, we can derive a minimal DFA to recognizethe language specified by the RE using the following steps: (1) applyThompson’s construction to build an NFA for the RE; (2) use the subsetconstruction to derive a DFA that simulates the behavior of the RE; and(3) use Hopcroft’s algorithm to identify equivalent states in the DFA andconstruct a minimal DFA.

This trio of constructions produces an efficientrecognizer for any language that can be specified with an RE.Both the subset construction and the DFA minimization algorithm arefixed-point computations. They are characterized by repeated application of a monotone function to some set; the properties of the domainplay an important role in reasoning about the termination and complexity of these algorithms. We will see more fixed-point computations inlater chapters.Review Questions1.

Consider the RE who | what | where. Use Thompson’s construction tobuild an NFA from the RE. Use the subset construction to build a DFAfrom the NFA. Minimize the DFA.2. Minimize the following DFA:ts1hs2es3rs4s6es7rs8es9es5s0h2.5 IMPLEMENTING SCANNERSScanner construction is a problem where the theory of formal languages hasproduced tools that can automate implementation. For most languages, thecompiler writer can produce an acceptably fast scanner directly from a setof regular expressions.

The compiler writer creates an re for each syntacticcategory and gives the res as input to a scanner generator. The generatorconstructs an nfa for each re, joins them with -transitions, creates a corresponding dfa, and minimizes the dfa. At that point, the scanner generatormust convert the dfa into executable code.60 CHAPTER 2 ScannersLexicalScannerPatterns GeneratorTablesFAInterpretern FIGURE 2.13 Generating a Table-Driven Scanner.This section discusses three implementation strategies for converting a dfainto executable code: a table-driven scanner, a direct-coded scanner, and ahand-coded scanner. All of these scanners operate in the same manner, bysimulating the dfa.

They repeatedly read the next character in the input andsimulate the dfa transition caused by that character. This process stops whenthe dfa recognizes a word. As described in the previous section, that occurswhen the current state, s, has no outbound transition on the current inputcharacter.If s is an accepting state, the scanner recognizes the word and returns a lexeme and its syntactic category to the calling procedure. If s is a nonacceptingstate, the scanner must determine whether or not it passed through an accepting state on the way to s. If the scanner did encounter an accepting state, itshould roll back its internal state and its input stream to that point and reportsuccess. If it did not, it should report the failure.These three implementation strategies, table driven, direct coded, and handcoded, differ in the details of their runtime costs.

However, they all havethe same asymptotic complexity—constant cost per character, plus the costof roll back. The differences in the efficiency of well-implemented scannerschange the constant costs per character but not the asymptotic complexity ofscanning.The next three subsections discuss implementation differences betweentable-driven, direct-coded, and hand-coded scanners. The strategies differin how they model the dfa’s transition structure and how they simulateits operation. Those differences, in turn, produce different runtime costs.The final subsection examines two different strategies for handling reservedkeywords.2.5.1 Table-Driven ScannersThe table-driven approach uses a skeleton scanner for control and a setof generated tables that encode language-specific knowledge.

As shown inFigure 2.13, the compiler writer provides a set of lexical patterns, specified2.5 Implementing Scanners 61r0, 1, 2, . . ., 9EOFOtherRegisterDigitOtherOtherNextWord()state ← s0 ;lexeme ← ‘‘ ’’;clear stack;The Classifier Table, CharCatpush(bad);while (state 6= se ) doNextChar(char);lexeme ← lexeme + char;if state ∈ SAthen clear stack;push(state);cat ← CharCat[char];state ← δ[state,cat];end;while(state ∈/ SA andstate 6= bad) dostate ← pop();truncate lexeme;RegisterDigitOthers1seseseses2s2seseseseses0s1s2seThe Transition Table, δs0s1s2seinvalid invalid register invalidThe Token Type Table, TypeRollBack();end;if state ∈ SAthen return Type[state];else return invalid;s0rs10…90…9s2The Underlying DFAn FIGURE 2.14 A Table-Driven Scanner for Register Names.as regular expressions.

The scanner generator then produces tables that drivethe skeleton scanner.Figure 2.14 shows a table-driven scanner for the re r [0. . . 9]+ , which wasour first attempt at an re for iloc register names. The left side of thefigure shows the skeleton scanner, while the right side shows the tables forr [0. . . 9]+ and the underlying dfa. Notice the similarity between the codehere and the recognizer shown in Figure 2.2 on page 32.The skeleton scanner divides into four sections: initializations, a scanningloop that models the dfa’s behavior, a roll back loop in case the dfa overshoots the end of the token, and a final section that interprets and reports theresults. The scanning loop repeats the two basic actions of a scanner: reada character and simulate the dfa’s action.

It halts when the dfa enters the62 CHAPTER 2 Scannerserror state, se . Two tables, CharCat and δ, encode all knowledge about thedfa. The roll back loop uses a stack of states to revert the scanner to its mostrecent accepting state.The skeleton scanner uses the variable state to hold the current state ofthe simulated dfa. It updates state using a two-step, table-lookup process.First, it classifies char into one of a small set of categories using the CharCat table.

The scanner for r [0. . . 9]+ has three categories: Register, Digit, orOther. Next, it uses the current state and the character category as indicesinto the transition table, δ.For small examples, such as r[0 . . . 9]+ , theclassifier table is larger than the completetransition table. In a realistically sized example,that relationship should be reversed.This two-step translation, character to category, then state and category tonew state, lets the scanner use a compressed transition table. The tradeoffbetween direct access into a larger table and indirect access into the compressed table is straightforward.A complete table would eliminate the mapping through CharCat, but would increase the memory footprint of the table.The uncompressed transition table grows as the product of the number ofstates in the dfa and the number of characters in 6; it can grow to the pointwhere it will not stay in cache.With a small, compact character set, such as ascii, CharCat can be represented as a simple table lookup.

The relevant portions of CharCat shouldstay in the cache. In that case, table compression adds one cache referenceper input character. As the character set grows (e.g. Unicode), more compleximplementations of CharCat may be needed. The precise tradeoff betweenthe per-character costs of both compressed and uncompressed tables willdepend on properties of both the language and the computer that runs thescanner.To provide a character-by-character interface to the input stream, the skeleton scanner uses a macro, NextChar, which sets its sole parameter to containthe next character in the input stream.

A corresponding macro, RollBack,moves the input stream back by one character. (Section 2.5.3 looks atNextChar and RollBack.)If the scanner reads too far, state will not contain an accepting state atthe end of the first while loop. In that case, the second while loop uses thestate trace from the stack to roll the state, lexeme, and input stream backto the most recent accepting state.

Характеристики

Тип файла

PDF-файл

Размер

8,27 Mb

Материал

Rice

Тип материала

Другое

Предмет

Конструирование компиляторов

Высшее учебное заведение

МГУ им. Ломоносова

Список файлов учебной работы

rice.rar

Rice

COMP 512 Advanced Compiler Construction_files

2007Shield.jpg

Другое

08Taxonomy.pptx

978-0-262-31709-2-ch014.pdf

peephole.pptx

scheduling.pptx

Купер и Торчсон - перевод

Рисунки

Рис1-1.jpg

12 Instruction Scheduling(Engl).doc

12 Instruction Scheduling(Rusl).doc

12a Instruction Scheduling (Terminology)E.doc

12a Instruction Scheduling (Terminology)R.doc

12b Instruction Scheduling (Side-bars)E.doc

12b Instruction Scheduling (Side-bars)R.doc

Cooper_Engineering_a_Compiler(Second Edition).pdf

COMP 512 Advanced Compiler Construction.htm

Rice University Houston COMP 412- Topics in Compiler Construction.doc

Rice University Houston COMP 512 Advanced Compiler Construction.doc

Прочти меня!!!.txt

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.