Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 18

Rice 1869

Описание файла

Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 18 страницы из PDF

At each step, an fa examines the currentcharacter. Its state encodes the left context, that is, the characters that it hasalready processed. Because the fa must make a transition before examiningthe next character, a state such as s0 violates our notion of the behavior of asequential algorithm. An fa that includes states such as s0 that have multipletransitions on a single character is called a nondeterministic finite automaton(nfa). By contrast, an fa with unique character transitions in each state iscalled a deterministic finite automaton (dfa).To make sense of an nfa, we need a set of rules that describe its behavior.Historically, two distinct models have been given for the behavior ofan nfa.1.

Each time the nfa must make a nondeterministic choice, it follows thetransition that leads to an accepting state for the input string, if such atransition exists. This model, using an omniscient nfa, is appealingbecause it maintains (on the surface) the well-defined acceptingmechanism of the DFA. In essence, the nfa guesses the correcttransition at each point.2. Each time the nfa must make a nondeterministic choice, the nfa clonesitself to pursue each possible transition. Thus, for a given inputcharacter, the nfa is in a specific set of states, taken across all of itsclones.

In this model, the nfa pursues all paths concurrently.At any point, we call the specific set of states in which the nfa is activeits configuration. When the nfa reaches a configuration in which it hasexhausted the input and one or more of the clones has reached anaccepting state, the nfa accepts the string.In either model, the nfa (S, 6, δ, s0 , S A ) accepts an input string x1 x2 x3 . . . xkif and only if there exists at least one path through the transition diagram thatstarts in s0 and ends in some sk ∈ S A such that the edge labels along the path2.4 From Regular Expression to Scanner 45match the input string.

(Edges labelled with are omitted.) In other words,the ith edge label must be xi . This definition is consistent with either modelof the nfa’s behavior.Equivalence of NFAs and DFAsnfas and dfas are equivalent in their expressive power. Any dfa is a specialcase of an nfa. Thus, an nfa is at least as powerful as a dfa. Any nfacan be simulated by a dfa—a fact established by the subset construction inSection 2.4.3.

The intuition behind this idea is simple; the construction is alittle more complex.Consider the state of an nfa when it has reached some point in the inputstring. Under the second model of nfa behavior, the nfa has some finiteset of operating clones. The number of these configurations can be bounded;for each state, the configuration either includes one or more clones in thatstate or it does not. Thus, an nfa with n states produces at most |6|nconfigurations.To simulate the behavior of the nfa, we need a dfa with a state for eachconfiguration of the nfa. As a result, the dfa may have exponentially morestates than the nfa.

While SDFA , the set of states in the dfa, might be large,it is finite. Furthermore, the dfa still makes one transition per input symbol.Thus, the dfa that simulates the nfa still runs in time proportional to thelength of the input string. The simulation of an nfa on a dfa has a potentialspace problem, but not a time problem.Since nfas and dfas are equivalent, we can construct a dfa for a∗ ab:as0as1bs2It relies on the observation that a∗ ab specifies the same set of words as aa∗ b.2.4.2 Regular Expression to NFA:Thompson’s ConstructionThe first step in moving from an re to an implemented scanner must derivean nfa from the re.

Thompson’s construction accomplishes this goal in astraightforward way. It has a template for building the nfa that correspondsto a single-letter re, and a transformation on nfas that models the effect ofeach basic re operator: concatenation, alternation, and closure. Figure 2.4Powerset of Nthe set of all subsets of N, denoted 2N46 CHAPTER 2 Scannerssiasisjsksl(b) NFA for “b”(a) NFA for “a”absksjbaskbsjsmslsnsl(d) NFA for “a | b”(c) NFA for “ab”spsisiasjsq(e) NFA for “a* ”n FIGURE 2.4 Trivial NFAs for Regular Expression Operators.shows the trivial nfas for the res a and b, as well as the transformationsto form nfas for the res ab, a|b, and a∗ from the nfas for a and b.

Thetransformations apply to arbitrary nfas.The construction begins by building trivial nfas for each character in theinput re. Next, it applies the transformations for alternation, concatenation, and closure to the collection of trivial nfas in the order dictated byprecedence and parentheses. For the re a(b|c)∗ , the construction would firstbuild nfas for a, b, and c.

Because parentheses have highest precedence,it next builds the nfa for the expression enclosed in parentheses, b|c. Closure has higher precedence than concatenation, so it next builds the closure,(b|c)∗ . Finally, it concatenates the nfa for a to the nfa for (b|c)∗ .The nfas derived from Thompson’s construction have several specific properties that simplify an implementation. Each nfa has one start state and oneaccepting state. No transition, other than the initial transition, enters thestart state. No transition leaves the accepting state.

An -transition alwaysconnects two states that were, earlier in the process, the start state and theaccepting state of nfas for some component res. Finally, each state has atmost two entering and two exiting -moves, and at most one entering andone exiting move on a symbol in the alphabet. Together, these propertiessimplify the representation and manipulation of the nfas. For example, theconstruction only needs to deal with a single accepting state, rather thaniterating over a set of accepting states in the nfa.2.4 From Regular Expression to Scanner 47s0as2s1bs4s3cs5(a) NFAs for “a”, “b”, and “c”s2bs3s6s7s4cs5(b) NFA for “b | c”s2s8bs3s6s7s4cs9s5(c) NFA for “( b | c)ⴱ”s2s0as1s8bs3s6s7s4cs9s5(d) NFA for “a(b | c)ⴱ”n FIGURE 2.5 Applying Thompson’s Construction to a(b|c)∗ .Figure 2.5 shows the nfa that Thompson’s construction builds for a(b|c)∗ .It has many more states than the dfa that a human would likely produce,shown at left.

The nfa also contains many -moves that are obviouslyunneeded. Later stages in the construction will eliminate them.2.4.3 NFA to DFA: The Subset ConstructionThompson’s construction produces an nfa to recognize the language specified by an re. Because dfa execution is much easier to simulate than nfaexecution, the next step in the cycle of constructions converts the nfa builtb,cs0as148 CHAPTER 2 ScannersREPRESENTING THE PRECEDENCE OF OPERATORSThompson’s construction must apply its three transformations in an orderthat is consistent with the precedence of the operators in the regularexpression. To represent that order, an implementation of Thompson’sconstruction can build a tree that represents the regular expression and itsinternal precedence.

The RE a(b|c)∗ produces the following tree:+a*|bcwhere + represents concatenation, | represents alternation, and * represents closure. The parentheses are folded into the structure of the treeand, thus, have no explicit representation.The construction applies the individual transformations in a postorder walkover the tree. Since transformations correspond to operations, the postorder walk builds the following sequence of NFAs: a, b, c, b|c, (b|c)∗ , and,finally, a(b|c)∗ . Chapters 3 and 4 show how to build expression Thompson’s construction into a dfa that recognizes the same language.The resulting dfas have a simple execution model and several efficientimplementations. The algorithm that constructs a dfa from an nfa is calledthe subset construction.The subset construction takes as input an nfa, (N , 6, δ N , n0 , N A ). It producesa dfa, (D, 6, δ D , d0 , D A ).

The nfa and the dfa use the same alphabet, 6.The dfa’s start state, d0 , and its accepting states, D A , will emerge from theconstruction. The complex part of the construction is the derivation of theset of dfa states D from the nfa states N , and the derivation of the dfatransition function δ D .Valid configurationconfiguration of an NFA that can bereached by some input stringThe algorithm, shown in Figure 2.6, constructs a set Q whose elements, qiare each a subset of N , that is, each qi ∈ 2 N .

When the algorithm halts, eachqi ∈ Q corresponds to a state, di ∈ D, in the dfa. The construction builds theelements of Q by following the transitions that the nfa can make on a giveninput. Thus, each qi represents a valid configuration of the nfa.The algorithm begins with an initial set, q0 , that contains n0 and any statesin the nfa that can be reached from n0 along paths that contain only2.4 From Regular Expression to Scanner 49q0 ← -closure({n0 });Q ← q0 ;WorkList ← {q0 };while (WorkList 6= ∅ ) doremove q from WorkList;for each character c ∈ 6 dot ← -closure(Delta(q, c));T[q, c] ← t;if t ∈/ Q thenadd t to Q and to WorkList;end;end;n FIGURE 2.6 The Subset Construction.-transitions.

Свежие статьи
Популярно сейчас