K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 18
Текст из файла (страница 18)
Closure ensures that ab is an re as long as both a and b are res. Thus, anytechniques that can be applied to either a or b can be applied to ab; thisincludes constructions that automatically generate a recognizer from res.Regular expressions are also closed under both Kleene closure and thefinite closures. This property lets us specify particular kinds of large, or eveninfinite, sets with finite patterns. Kleene closure lets us specify infinite setswith concise finite patterns; examples include the integers and unboundedlength identifiers.
Finite closures let us specify large but finite sets with equalease.The next section shows a sequence of constructions that build an fa to recognize the language specified by an re. Section 2.6 shows an algorithmthat goes the other way, from an fa to an re. Together, these constructionsestablish the equivalence of res and fas. The fact that res are closed underalternation, concatenation, and closure is critical to these constructions.The equivalence between res and fas also suggests other closure properties.For example, given a complete fa, we can construct an fa that recognizes allwords w that are not in L(fa), called the complement of L(fa).
To build thisnew fa for the complement, we can swap the designation of accepting andnonaccepting states in the original fa. This result suggests that res are closedunder complement. Indeed, many systems that use res include a complementoperator, such as the ˆ operator in lex.SECTION REVIEWRegular expressions are a concise and powerful notation for specifyingthe microsyntax of programming languages.
REs build on three basicoperations over finite alphabets: alternation, concatenation, and Kleeneclosure. Other convenient operators, such as finite closures, positiveclosure, and complement, derive from the three basic operations. Regularexpressions and finite automata are related; any RE can be realized in anFA and the language accepted by any FA can be described with RE. Thenext section formalizes that relationship.Complete FAan FA that explicitly includes all error transitions42 CHAPTER 2 ScannersReview Questions1.
Recall the RE for a six-character identifier, written using a finite closure.([A. . . Z] | [a. . . z]) ([A. . . Z] | [a. . . z] | [0. . . 9])5Rewrite it in terms of the three basic RE operations: alternation,concatenation, and closure.2. In PL/I, the programmer can insert a quotation mark into a string bywriting two quotation marks in a row. Thus, the stringThe quotation mark, ", should be typeset in italicswould be written in a PL/I program as"The quotation mark, "", should be typeset in italics."Design an RE and an FA to recognize PL/I strings.
Assume that stringsbegin and end with quotation marks and contain only symbols drawnfrom an alphabet, designated as 6. Quotation marks are the onlyspecial case.2.4 FROM REGULAR EXPRESSION TO SCANNERThe goal of our work with finite automata is to automate the derivationof executable scanners from a collection of res. This section develops theconstructions that transform an re into an fa that is suitable for direct implementation and an algorithm that derives an re for the language accepted byan fa.
Figure 2.3 shows the relationship between all of these constructions.To present these constructions, we must distinguish between deterministicfas, or dfas, and nondeterministic fas, or nfas, in Section 2.4.1. Next,Kleene’s ConstructionCode fora scannerREDFA MinimizationThompson’sConstructionDFASubsetConstructionNFAn FIGURE 2.3 The Cycle of Constructions.2.4 From Regular Expression to Scanner 43we present the construction of a deterministic fa from an re in three steps.Thompson’s construction, in Section 2.4.2, derives an nfa from an re. Thesubset construction, in Section 2.4.3, builds a dfa that simulates an nfa.Hopcroft’s algorithm, in Section 2.4.4, minimizes a dfa.
To establish theequivalence of res and dfas, we also need to show that any dfa is equivalent to an re; Kleene’s construction derives an re from a dfa. Because itdoes not figure directly into scanner construction, we defer that algorithmuntil Section 2.6.1.2.4.1 Nondeterministic Finite AutomataRecall from the definition of an re that we designated the empty string, , asan re. None of the fas that we built by hand included , but some of the resdid. What role does play in an fa? We can use transitions on to combinefas and form fas for more complex res. For example, assume that we havefas for the res m and n, called fam and fan , respectively.s0mns0s1s1We can build an fa for mn by adding a transition on from the acceptingstate of fam to the initial state of fan , renumbering the states, and using fan ’saccepting state as the accepting state for the new fa.s0ms1s2ns3With an -transition, the definition of acceptance must change slightly toallow one or more -transitions between any two characters in the inputstring.
For example, in s1 , the fa takes the transition s1 → s2 without consuming any input character. This is a minor change, but it seems intuitive.Inspection shows that we can combine s1 and s2 to eliminate the -transition.s0mns1s2Merging two fas with an -transition can complicate our model of how faswork. Consider the fas for the languages a∗ and ab.as0s0as1bs2-transitiona transition on the empty string, , that doesnot advance the input44 CHAPTER 2 ScannersWe can combine them with an -transition to form an fa for a∗ ab.as0s1as2bs3The transition, in effect, gives the fa two distinct transitions out of s0aon the letter a.
It can take the transition s0 → s0 , or the two transitionsas0 → s1 and s1 → s2 . Which transition is correct? Consider the strings aabaand ab. The dfa should accept both strings. For aab, it should move s0 → s0 ,abas0 → s1 , s1 → s2 , and s2 → s3 . For ab, it should move s0 → s1 , s1 → s2 , andbs2 → s3 .Nondeterministic FAan FA that allows transitions on the empty string,, and states that have multiple transitions onthe same characterDeterministic FAA DFA is an FA where the transition function issingle-valued.
DFAs do not allow -transitions.Configuration of an NFAthe set of concurrently active states of an NFAAs these two strings show, the correct transition out of s0 on a depends onthe characters that follow the a. At each step, an fa examines the currentcharacter. Its state encodes the left context, that is, the characters that it hasalready processed. Because the fa must make a transition before examiningthe next character, a state such as s0 violates our notion of the behavior of asequential algorithm.
An fa that includes states such as s0 that have multipletransitions on a single character is called a nondeterministic finite automaton(nfa). By contrast, an fa with unique character transitions in each state iscalled a deterministic finite automaton (dfa).To make sense of an nfa, we need a set of rules that describe its behavior.Historically, two distinct models have been given for the behavior ofan nfa.1.
Each time the nfa must make a nondeterministic choice, it follows thetransition that leads to an accepting state for the input string, if such atransition exists. This model, using an omniscient nfa, is appealingbecause it maintains (on the surface) the well-defined acceptingmechanism of the DFA. In essence, the nfa guesses the correcttransition at each point.2. Each time the nfa must make a nondeterministic choice, the nfa clonesitself to pursue each possible transition.
Thus, for a given inputcharacter, the nfa is in a specific set of states, taken across all of itsclones. In this model, the nfa pursues all paths concurrently.At any point, we call the specific set of states in which the nfa is activeits configuration. When the nfa reaches a configuration in which it hasexhausted the input and one or more of the clones has reached anaccepting state, the nfa accepts the string.In either model, the nfa (S, 6, δ, s0 , S A ) accepts an input string x1 x2 x3 . . . xkif and only if there exists at least one path through the transition diagram thatstarts in s0 and ends in some sk ∈ S A such that the edge labels along the path2.4 From Regular Expression to Scanner 45match the input string. (Edges labelled with are omitted.) In other words,the ith edge label must be xi . This definition is consistent with either modelof the nfa’s behavior.Equivalence of NFAs and DFAsnfas and dfas are equivalent in their expressive power.
Any dfa is a specialcase of an nfa. Thus, an nfa is at least as powerful as a dfa. Any nfacan be simulated by a dfa—a fact established by the subset construction inSection 2.4.3. The intuition behind this idea is simple; the construction is alittle more complex.Consider the state of an nfa when it has reached some point in the inputstring. Under the second model of nfa behavior, the nfa has some finiteset of operating clones.