K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 19
Текст из файла (страница 19)
The number of these configurations can be bounded;for each state, the configuration either includes one or more clones in thatstate or it does not. Thus, an nfa with n states produces at most |6|nconfigurations.To simulate the behavior of the nfa, we need a dfa with a state for eachconfiguration of the nfa.
As a result, the dfa may have exponentially morestates than the nfa. While SDFA , the set of states in the dfa, might be large,it is finite. Furthermore, the dfa still makes one transition per input symbol.Thus, the dfa that simulates the nfa still runs in time proportional to thelength of the input string. The simulation of an nfa on a dfa has a potentialspace problem, but not a time problem.Since nfas and dfas are equivalent, we can construct a dfa for a∗ ab:as0as1bs2It relies on the observation that a∗ ab specifies the same set of words as aa∗ b.2.4.2 Regular Expression to NFA:Thompson’s ConstructionThe first step in moving from an re to an implemented scanner must derivean nfa from the re. Thompson’s construction accomplishes this goal in astraightforward way. It has a template for building the nfa that correspondsto a single-letter re, and a transformation on nfas that models the effect ofeach basic re operator: concatenation, alternation, and closure.
Figure 2.4Powerset of Nthe set of all subsets of N, denoted 2N46 CHAPTER 2 Scannerssiasisjsksl(b) NFA for “b”(a) NFA for “a”absksjbaskbsjsmslsnsl(d) NFA for “a | b”(c) NFA for “ab”spsisiasjsq(e) NFA for “a* ”n FIGURE 2.4 Trivial NFAs for Regular Expression Operators.shows the trivial nfas for the res a and b, as well as the transformationsto form nfas for the res ab, a|b, and a∗ from the nfas for a and b.
Thetransformations apply to arbitrary nfas.The construction begins by building trivial nfas for each character in theinput re. Next, it applies the transformations for alternation, concatenation, and closure to the collection of trivial nfas in the order dictated byprecedence and parentheses. For the re a(b|c)∗ , the construction would firstbuild nfas for a, b, and c. Because parentheses have highest precedence,it next builds the nfa for the expression enclosed in parentheses, b|c. Closure has higher precedence than concatenation, so it next builds the closure,(b|c)∗ . Finally, it concatenates the nfa for a to the nfa for (b|c)∗ .The nfas derived from Thompson’s construction have several specific properties that simplify an implementation.
Each nfa has one start state and oneaccepting state. No transition, other than the initial transition, enters thestart state. No transition leaves the accepting state. An -transition alwaysconnects two states that were, earlier in the process, the start state and theaccepting state of nfas for some component res. Finally, each state has atmost two entering and two exiting -moves, and at most one entering andone exiting move on a symbol in the alphabet. Together, these propertiessimplify the representation and manipulation of the nfas. For example, theconstruction only needs to deal with a single accepting state, rather thaniterating over a set of accepting states in the nfa.2.4 From Regular Expression to Scanner 47s0as2s1bs4s3cs5(a) NFAs for “a”, “b”, and “c”s2bs3s6s7s4cs5(b) NFA for “b | c”s2s8bs3s6s7s4cs9s5(c) NFA for “( b | c)ⴱ”s2s0as1s8bs3s6s7s4cs9s5(d) NFA for “a(b | c)ⴱ”n FIGURE 2.5 Applying Thompson’s Construction to a(b|c)∗ .Figure 2.5 shows the nfa that Thompson’s construction builds for a(b|c)∗ .It has many more states than the dfa that a human would likely produce,shown at left.
The nfa also contains many -moves that are obviouslyunneeded. Later stages in the construction will eliminate them.2.4.3 NFA to DFA: The Subset ConstructionThompson’s construction produces an nfa to recognize the language specified by an re. Because dfa execution is much easier to simulate than nfaexecution, the next step in the cycle of constructions converts the nfa builtb,cs0as148 CHAPTER 2 ScannersREPRESENTING THE PRECEDENCE OF OPERATORSThompson’s construction must apply its three transformations in an orderthat is consistent with the precedence of the operators in the regularexpression. To represent that order, an implementation of Thompson’sconstruction can build a tree that represents the regular expression and itsinternal precedence. The RE a(b|c)∗ produces the following tree:+a*|bcwhere + represents concatenation, | represents alternation, and * represents closure. The parentheses are folded into the structure of the treeand, thus, have no explicit representation.The construction applies the individual transformations in a postorder walkover the tree.
Since transformations correspond to operations, the postorder walk builds the following sequence of NFAs: a, b, c, b|c, (b|c)∗ , and,finally, a(b|c)∗ . Chapters 3 and 4 show how to build expression trees.by Thompson’s construction into a dfa that recognizes the same language.The resulting dfas have a simple execution model and several efficientimplementations. The algorithm that constructs a dfa from an nfa is calledthe subset construction.The subset construction takes as input an nfa, (N , 6, δ N , n0 , N A ). It producesa dfa, (D, 6, δ D , d0 , D A ).
The nfa and the dfa use the same alphabet, 6.The dfa’s start state, d0 , and its accepting states, D A , will emerge from theconstruction. The complex part of the construction is the derivation of theset of dfa states D from the nfa states N , and the derivation of the dfatransition function δ D .Valid configurationconfiguration of an NFA that can bereached by some input stringThe algorithm, shown in Figure 2.6, constructs a set Q whose elements, qiare each a subset of N , that is, each qi ∈ 2 N . When the algorithm halts, eachqi ∈ Q corresponds to a state, di ∈ D, in the dfa.
The construction builds theelements of Q by following the transitions that the nfa can make on a giveninput. Thus, each qi represents a valid configuration of the nfa.The algorithm begins with an initial set, q0 , that contains n0 and any statesin the nfa that can be reached from n0 along paths that contain only2.4 From Regular Expression to Scanner 49q0 ← -closure({n0 });Q ← q0 ;WorkList ← {q0 };while (WorkList 6= ∅ ) doremove q from WorkList;for each character c ∈ 6 dot ← -closure(Delta(q, c));T[q, c] ← t;if t ∈/ Q thenadd t to Q and to WorkList;end;end;n FIGURE 2.6 The Subset Construction.-transitions. Those states are equivalent since they can be reached withoutconsuming input.To construct q0 from n0 , the algorithm computes -closure(n0 ).
It takes,as input, a set S of nfa states. It returns a set of nfa states constructedfrom S as follows: -closure examines each state si ∈ S and adds to S anystate reachable by following one or more -transitions from si . If S is theset of states reachable from n0 by following paths labelled with abc, then-closure(S) is the set of states reachable from n0 by following pathslabelled abc ∗ . Initially, Q has only one member, q0 and the WorkListcontains q0 .The algorithm proceeds by removing a set q from the worklist.
Each q represents a valid configuration of the original nfa. The algorithm constructs,for each character c in the alphabet 6, the configuration that the nfa wouldreach if it read c while in configuration q. This computation uses a functionDelta(q, c) that applies the nfa’s transition function to each element of q.It returns ∪s∈qi δ N (s,c).The while loop repeatedly removes a configuration q from the worklist anduses Delta to compute its potential transitions. It augments this computedconfiguration with any states reachable by following -transitions, and addsany new configurations generated in this way to both Q and the worklist.When it discovers a new configuration t reachable from q on character c, thealgorithm records that transition in the table T.
The inner loop, which iteratesover the alphabet for each configuration, performs an exhaustive search.Notice that Q grows monotonically. The while loop adds sets to Q but neverremoves them. Since the number of configurations of the nfa is bounded and50 CHAPTER 2 Scannerseach configuration only appears once on the worklist, the while loop musthalt. When it halts, Q contains all of the valid configurations of the nfa andT holds all of the transitions between them.Q can become large—as large as |2 N | distinct states. The amount of nonde-terminism found in the nfa determines how much state expansion occurs.Recall, however, that the result is a dfa that makes exactly one transition perinput character, independent of the number of states in the dfa.
Thus, anyexpansion introduced by the subset construction does not affect the runningtime of the dfa.From Q to DWhen the subset construction halts, it has constructed a model of the desireddfa, one that simulates the original nfa. Building the dfa from Q and T isstraightforward. Each qi ∈ Q needs a state di ∈ D to represent it. If qi contains an accepting state of the nfa, then di is an accepting state of the dfa.We can construct the transition function, δ D , directly from T by observingthe mapping from qi to di .
Finally, the state constructed from q0 becomesd0 , the initial state of the dfa.ExampleConsider the nfa built for a(b|c)∗ in Section 2.4.2 and shown in Figure 2.7a,with its states renumbered. The table in Figure 2.7b sketches the steps thatthe subset construction follows. The first column shows the name of theset in Q being processed in a given iteration of the while loop. The secondcolumn shows the name of the corresponding state in the new dfa.
The thirdcolumn shows the set of nfa states contained in the current set from Q. Thefinal three columns show results of computing the -closure of Delta onthe state for each character in 6.The algorithm takes the following steps:1. The initialization sets q0 to -closure({n0 }), which is just n0 . The firstiteration computes -closure(Delta(q0 ,a)), which contains six nfastates, and -closure(Delta(q0 ,b)) and -closure(Delta(q0 ,c)),which are empty.2. The second iteration of the while loop examines q1 . It produces twoconfigurations and names them q2 and q3 .3. The third iteration of the while loop examines q2 .
It constructs twoconfigurations, which are identical to q2 and q3 .4. The fourth iteration of the while loop examines q3 . Like the thirditeration, it reconstructs q2 and q3 .Figure 2.7c shows the resulting dfa; the states correspond to the dfa statesfrom the table and the transitions are given by the Delta operations that2.4 From Regular Expression to Scanner 51bn4n0an2n1n5n3n8cn6n9n7(a) NFA for “a(b | c)* ” (With States Renumbered)-closure(Delta(q,*))SetNameDFAStatesNFAStatesaq0d0q1d1q2d2q3d3n0n1 , n2 , n3 ,n4 , n6 , n9n5 , n8 , n9 ,n3 , n4 , n6n7 , n8 , n9 ,n3 , n4 , n6n1 , n2 , n3 ,n4 , n6 , n9bc– none –– none –n 7 , n8 , n9 ,n3 , n4 , n6– none –n5 , n8 , n9 ,n3 , n4 , n6– none –q2q3– none –q2q3(b) Iterations of the Subset Constructiond0ad2bcd1cbbd3c(a) Resulting DFAn FIGURE 2.7 Applying the Subset Construction to the NFA from Figure 2.5.generate those states. Since the sets q1 , q2 and q3 all contain n9 (theaccepting state of the nfa), all three become accepting states in the dfa.Fixed-Point ComputationsThe subset construction is an example of a fixed-point computation, a particular style of computation that arises regularly in computer science.