Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 17
Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 17 страницы из PDF
Thus, if wecan control the growth in the number of states, we might prefer this version of the re because it is clear and obvious. However, when our processorsuddenly has 256 or 384 registers, enumeration may become tedious, too.2.3.3 Closure Properties of REsRegular expressions and the languages that they generate have been the subject of extensive study.
They have many interesting and useful properties.Some of these properties play a critical role in the constructions that buildrecognizers from res.Regular languagesAny language that can be specified by a regularexpression is called a regular language.40 CHAPTER 2 ScannersPROGRAMMING LANGUAGES VERSUS NATURAL LANGUAGESLexical analysis highlights one of the subtle ways in which programminglanguages differ from natural languages, such as English or Chinese. Innatural languages, the relationship between a word’s representation—itsspelling or its pictogram—and its meaning is not obvious.
In English, are isa verb while art is a noun, even though they differ only in the final character.Furthermore, not all combinations of characters are legitimate words. Forexample, arz differs minimally from are and art, but does not occur as aword in normal English usage.A scanner for English could use FA-based techniques to recognize potentialwords, since all English words are drawn from a restricted alphabet.
Afterthat, however, it must look up the prospective word in a dictionary todetermine if it is, in fact, a word. If the word has a unique part of speech,dictionary lookup will also resolve that issue. However, many English wordscan be classified with several parts of speech. Examples include buoy andstress; both can be either a noun or a verb. For these words, the part ofspeech depends on the surrounding context. In some cases, understandingthe grammatical context suffices to classify the word. In other cases, itrequires an understanding of meaning, for both the word and its context.In contrast, the words in a programming language are almost alwaysspecified lexically.
Thus, any string in [1. . . 9][0. . . 9]∗ is a positive integer.The RE [a. . . z]([a. . . z]|[0. . . 9])∗ defines a subset of the Algol identifiers;arz, are and art are all identifiers, with no lookup needed to establish thefact. To be sure, some identifiers may be reserved as keywords. However,these exceptions can be specified lexically, as well.
No context is required.This property results from a deliberate decision in programming language design. The choice to make spelling imply a unique part of speechsimplifies scanning, simplifies parsing, and, apparently, gives up little inthe expressiveness of the language. Some languages have allowed wordswith dual parts of speech—for example, PL/I has no reserved keywords.The fact that more recent languages abandoned the idea suggests thatthe complications outweighed the extra linguistic flexibility.Regular expressions are closed under many operations—that is, if we applythe operation to an re or a collection of res, the result is an re.
Obviousexamples are concatenation, union, and closure. The concatenation of twores x and y is just xy. Their union is x | y. The Kleene closure of x is just x∗ .From the definition of an re, all of these expressions are also res.These closure properties play a critical role in the use of res to build scanners. Assume that we have an re for each syntactic category in the sourcelanguage, a0 , a1 , a2 , .
. . , an . Then, to construct an re for all the valid wordsin the language, we can join them with alternation as a0 | a1 | a2 | . . . | an .Since res are closed under union, the result is an re. Anything that we can2.3 Regular Expressions 41do to an re for a single syntactic category will be equally applicable to there for all the valid words in the language.Closure under union implies that any finite language is a regular language.We can construct an re for any finite collection of words by listing themin a large alternation.
Because the set of res is closed under union, thatalternation is an re and the corresponding language is regular.Closure under concatenation allows us to build complex res from simpler ones by concatenating them. This property seems both obvious andunimportant. However, it lets us piece together res in systematic ways. Closure ensures that ab is an re as long as both a and b are res. Thus, anytechniques that can be applied to either a or b can be applied to ab; thisincludes constructions that automatically generate a recognizer from res.Regular expressions are also closed under both Kleene closure and thefinite closures. This property lets us specify particular kinds of large, or eveninfinite, sets with finite patterns. Kleene closure lets us specify infinite setswith concise finite patterns; examples include the integers and unboundedlength identifiers.
Finite closures let us specify large but finite sets with equalease.The next section shows a sequence of constructions that build an fa to recognize the language specified by an re. Section 2.6 shows an algorithmthat goes the other way, from an fa to an re. Together, these constructionsestablish the equivalence of res and fas. The fact that res are closed underalternation, concatenation, and closure is critical to these constructions.The equivalence between res and fas also suggests other closure properties.For example, given a complete fa, we can construct an fa that recognizes allwords w that are not in L(fa), called the complement of L(fa).
To build thisnew fa for the complement, we can swap the designation of accepting andnonaccepting states in the original fa. This result suggests that res are closedunder complement. Indeed, many systems that use res include a complementoperator, such as the ˆ operator in lex.SECTION REVIEWRegular expressions are a concise and powerful notation for specifyingthe microsyntax of programming languages.
REs build on three basicoperations over finite alphabets: alternation, concatenation, and Kleeneclosure. Other convenient operators, such as finite closures, positiveclosure, and complement, derive from the three basic operations. Regularexpressions and finite automata are related; any RE can be realized in anFA and the language accepted by any FA can be described with RE. Thenext section formalizes that relationship.Complete FAan FA that explicitly includes all error transitions42 CHAPTER 2 ScannersReview Questions1.
Recall the RE for a six-character identifier, written using a finite closure.([A. . . Z] | [a. . . z]) ([A. . . Z] | [a. . . z] | [0. . . 9])5Rewrite it in terms of the three basic RE operations: alternation,concatenation, and closure.2. In PL/I, the programmer can insert a quotation mark into a string bywriting two quotation marks in a row.
Thus, the stringThe quotation mark, ", should be typeset in italicswould be written in a PL/I program as"The quotation mark, "", should be typeset in italics."Design an RE and an FA to recognize PL/I strings. Assume that stringsbegin and end with quotation marks and contain only symbols drawnfrom an alphabet, designated as 6. Quotation marks are the onlyspecial case.2.4 FROM REGULAR EXPRESSION TO SCANNERThe goal of our work with finite automata is to automate the derivationof executable scanners from a collection of res.
This section develops theconstructions that transform an re into an fa that is suitable for direct implementation and an algorithm that derives an re for the language accepted byan fa. Figure 2.3 shows the relationship between all of these constructions.To present these constructions, we must distinguish between deterministicfas, or dfas, and nondeterministic fas, or nfas, in Section 2.4.1. Next,Kleene’s ConstructionCode fora scannerREDFA MinimizationThompson’sConstructionDFASubsetConstructionNFAn FIGURE 2.3 The Cycle of Constructions.2.4 From Regular Expression to Scanner 43we present the construction of a deterministic fa from an re in three steps.Thompson’s construction, in Section 2.4.2, derives an nfa from an re. Thesubset construction, in Section 2.4.3, builds a dfa that simulates an nfa.Hopcroft’s algorithm, in Section 2.4.4, minimizes a dfa.
To establish theequivalence of res and dfas, we also need to show that any dfa is equivalent to an re; Kleene’s construction derives an re from a dfa. Because itdoes not figure directly into scanner construction, we defer that algorithmuntil Section 126.96.36.199.4.1 Nondeterministic Finite AutomataRecall from the definition of an re that we designated the empty string, , asan re. None of the fas that we built by hand included , but some of the resdid.
What role does play in an fa? We can use transitions on to combinefas and form fas for more complex res. For example, assume that we havefas for the res m and n, called fam and fan , respectively.s0mns0s1s1We can build an fa for mn by adding a transition on from the acceptingstate of fam to the initial state of fan , renumbering the states, and using fan ’saccepting state as the accepting state for the new fa.s0ms1s2ns3With an -transition, the definition of acceptance must change slightly toallow one or more -transitions between any two characters in the inputstring. For example, in s1 , the fa takes the transition s1 → s2 without consuming any input character.
This is a minor change, but it seems intuitive.Inspection shows that we can combine s1 and s2 to eliminate the -transition.s0mns1s2Merging two fas with an -transition can complicate our model of how faswork. Consider the fas for the languages a∗ and ab.as0s0as1bs2-transitiona transition on the empty string, , that doesnot advance the input44 CHAPTER 2 ScannersWe can combine them with an -transition to form an fa for a∗ ab.as0s1as2bs3The transition, in effect, gives the fa two distinct transitions out of s0aon the letter a. It can take the transition s0 → s0 , or the two transitionsas0 → s1 and s1 → s2 .
Which transition is correct? Consider the strings aabaand ab. The dfa should accept both strings. For aab, it should move s0 → s0 ,abas0 → s1 , s1 → s2 , and s2 → s3 . For ab, it should move s0 → s1 , s1 → s2 , andbs2 → s3 .Nondeterministic FAan FA that allows transitions on the empty string,, and states that have multiple transitions onthe same characterDeterministic FAA DFA is an FA where the transition function issingle-valued. DFAs do not allow -transitions.Configuration of an NFAthe set of concurrently active states of an NFAAs these two strings show, the correct transition out of s0 on a depends onthe characters that follow the a.