These computations are characterized by the iterated application of a monotone function to some collection of sets drawn from a domain whose structure is known. These computations terminate when they reach a state where further iteration produces the same answer, a "fixed point" in the space of successive iterates. Fixed-point computations play an important and recurring role in compiler construction.

Monotone function: a function f on domain D is monotone if, ∀ x, y ∈ D, x ≤ y ⇒ f(x) ≤ f(y).

Termination arguments for fixed-point algorithms usually depend on known properties of the domain. For the subset construction, the domain D is 2^(2^N), since Q = {q0, q1, q2, ..., qk} where each qi ∈ 2^N. Since N is finite, 2^N and 2^(2^N) are also finite. The while loop adds elements to Q; it cannot remove an element from Q. We can view the while loop as a monotone increasing function f, which means that for a set x, f(x) ≥ x. (The comparison operator ≥ is ⊇.) Since Q can have at most |2^N| distinct elements, the while loop can iterate at most |2^N| times. It may, of course, reach a fixed point and halt more quickly than that.
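To illustrate the general pattern, the short Python sketch below iterates a function until further application produces the same answer. The helper name fixed_point and the toy grow function are invented for this example; they are not code from the text.

    # A minimal sketch of fixed-point iteration: apply f repeatedly until the
    # result stops changing. On a finite domain, a monotone increasing f
    # (f(x) ⊇ x) can only grow the value, so the loop must terminate.
    def fixed_point(f, x):
        while True:
            y = f(x)
            if y == x:            # further iteration gives the same answer
                return x
            x = y

    # Toy example: close a set of integers under "add n + 1 for every n < 5".
    # Each application only adds elements, so grow is monotone increasing.
    grow = lambda s: s | {n + 1 for n in s if n < 5}
    print(fixed_point(grow, frozenset({0})))   # frozenset({0, 1, 2, 3, 4, 5})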
Computing ε-closure Offline

An implementation of the subset construction could compute ε-closure() by following paths in the transition graph of the NFA as needed. Figure 2.8 shows another approach: an offline algorithm that computes ε-closure({n}) for each state n in the transition graph. The algorithm is another example of a fixed-point computation.

    for each state n ∈ N do
        E(n) ← {n};
    end;
    WorkList ← N;

    while (WorkList ≠ ∅) do
        remove n from WorkList;
        t ← {n} ∪ ⋃ { E(p) | n →^ε p ∈ δN };
        if t ≠ E(n) then begin
            E(n) ← t;
            WorkList ← WorkList ∪ {m | m →^ε n ∈ δN};
        end;
    end;

    FIGURE 2.8  An Offline Algorithm for ε-closure.

For the purposes of this algorithm, consider the transition diagram of the NFA as a graph, with nodes and edges. The algorithm begins by creating a set E for each node in the graph. For a node n, E(n) will hold the current approximation to ε-closure(n). Initially, the algorithm sets E(n) to {n}, for each node n, and places each node on the worklist.

Each iteration of the while loop removes a node n from the worklist, finds all of the ε-transitions that leave n, and adds their targets to E(n). If that computation changes E(n), it places n's predecessors along ε-transitions on the worklist.
(If n is in the ε-closure of its predecessor, adding nodes to E(n) must also add them to the predecessor's set.) This process halts when the worklist becomes empty.

Using a bit-vector set for the worklist can ensure that the algorithm does not have duplicate copies of a node's name on the worklist. See Appendix B.2.
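As a concrete, informal rendering of Figure 2.8, here is a Python sketch. The function name eps_closure and the dictionary eps_succ, which maps each state to its ε-successors, are representational choices made for this example, not part of the book's code.

    def eps_closure(states, eps_succ):
        """Worklist computation of E(n) = ε-closure(n) for every NFA state n.

        states   : iterable of NFA states
        eps_succ : dict mapping a state n to the states reachable from n by a
                   single ε-transition (an assumed encoding of δN)
        """
        E = {n: {n} for n in states}        # E(n) starts as {n}

        # Predecessors along ε-transitions: when E(n) grows, only these nodes
        # can be affected, so only they need to go back on the worklist.
        preds = {n: set() for n in states}
        for n in states:
            for p in eps_succ.get(n, ()):
                preds[p].add(n)

        worklist = set(states)              # a set, so no duplicate entries
        while worklist:
            n = worklist.pop()
            t = {n}
            for p in eps_succ.get(n, ()):   # t ← {n} ∪ ⋃ E(p)
                t |= E[p]
            if t != E[n]:                   # E(n) changed, so revisit the
                E[n] = t                    # ε-predecessors of n
                worklist |= preds[n]
        return E

Representing the worklist as a set mirrors the bit-vector suggestion above: each node appears on the worklist at most once.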
The termination argument for this algorithm is more complex than that for the algorithm in Figure 2.6. The algorithm halts when the worklist is empty. Initially, the worklist contains every node in the graph. Each iteration removes a node from the worklist; it may also add one or more nodes to the worklist.

The algorithm adds a node to the worklist only if the E set of one of its successors changes. The E(n) sets increase monotonically. For a node x, its successor y along an ε-transition can place x on the worklist at most |E(y)| ≤ |N| times, in the worst case. If x has multiple successors yi along ε-transitions, each of them can place x on the worklist |E(yi)| ≤ |N| times.
Taken over the entire graph, the worst-case behavior would place nodes on the worklist k · |N| times, where k is the number of ε-transitions in the graph. Thus, the worklist eventually becomes empty and the computation halts.
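To make the bound concrete, here is a small run of the eps_closure sketch above; the three-state cyclic NFA is invented for illustration.

    # Three states joined in an ε-cycle: 0 →ε 1 →ε 2 →ε 0.
    # Here |N| = 3 and k = 3 ε-transitions, so the argument above bounds the
    # number of re-insertions onto the worklist at k · |N| = 9.
    states = [0, 1, 2]
    eps_succ = {0: [1], 1: [2], 2: [0]}
    print(eps_closure(states, eps_succ))
    # every E(n) reaches the fixed point {0, 1, 2}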
2.4.4 DFA to Minimal DFA: Hopcroft's Algorithm

As a final refinement to the RE→DFA conversion, we can add an algorithm to minimize the number of states in the DFA. The DFA that emerges from the subset construction can have a large set of states. While this does not increase the time needed to scan a string, it does increase the size of the recognizer in memory. On modern computers, the speed of memory accesses often governs the speed of computation. A smaller recognizer may fit better into the processor's cache memory.

To minimize the number of states in a DFA (D, Σ, δ, d0, DA), we need a technique to detect when two states are equivalent, that is, when they produce the same behavior on any input string. The algorithm in Figure 2.9 finds equivalence classes of DFA states based on their behavior. From those equivalence classes, we can construct a minimal DFA.

The algorithm constructs a set partition, P = {p1, p2, p3, ..., pm}, of the DFA states.
Set partition: a set partition of S is a collection of nonempty, disjoint subsets of S whose union is exactly S.

    T ← {DA, {D − DA}};
    P ← ∅;

    while (P ≠ T) do
        P ← T;
        T ← ∅;
        for each set p ∈ P do
            T ← T ∪ Split(p);
        end;
    end;

    Split(S) {
        for each c ∈ Σ do
            if c splits S into s1 and s2
                then return {s1, s2};
        end;
        return S;
    }

    FIGURE 2.9  DFA Minimization Algorithm.

The particular partition, P, that it constructs groups together DFA states by their behavior. Two DFA states, di, dj ∈ ps, have the same behavior in response to all input characters. That is, if di →^c dx, dj →^c dy, and di, dj ∈ ps, then dx and dy must be in the same set pt.
This property holds for every set ps ∈ P, for every pair of states di, dj ∈ ps, and for every input character, c. Thus, the states in ps have the same behavior with respect to input characters and the remaining sets in P.

To minimize a DFA, each set ps ∈ P should be as large as possible, within the constraint of behavioral equivalence. To construct such a partition, the algorithm begins with an initial rough partition that obeys all the properties except behavioral equivalence. It then iteratively refines that partition to enforce behavioral equivalence. The initial partition contains two sets, p0 = DA and p1 = {D − DA}. This separation ensures that no set in the final partition contains both accepting and nonaccepting states, since the algorithm never combines two partitions.

The algorithm refines the initial partition by repeatedly examining each ps ∈ P to look for states in ps that have different behavior for some input string.
Clearly, it cannot trace the behavior of the DFA on every string. It can, however, simulate the behavior of a given state in response to a single input character. It uses a simple condition for refining the partition: a symbol c ∈ Σ must produce the same behavior for every state di ∈ ps. If it does not, the algorithm splits ps around c.

This splitting action is the key to understanding the algorithm.
For di and dj to remain together in ps, they must take equivalent transitions on each character c ∈ Σ. That is, ∀ c ∈ Σ, di →^c dx and dj →^c dy, where dx, dy ∈ pt. Any state dk ∈ ps where dk →^c dz, dz ∉ pt, cannot remain in the same partition as di and dj. Similarly, if di and dj have transitions on c and dk does not, it cannot remain in the same partition as di and dj.
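This condition can be phrased directly as code. The Python sketch below is an assumed rendering rather than the book's implementation: delta is taken to be a dictionary mapping (state, character) pairs to successor states, and block_id maps each state to the number of the partition block that currently contains it.

    def split(s, c, delta, block_id):
        """Split the block s around character c, if c distinguishes its members.

        s        : a set of DFA states, one block of the current partition
        c        : an input character
        delta    : dict mapping (state, c) -> successor state; a missing key
                   means the state has no transition on c
        block_id : dict mapping each state to the number of its current block

        Returns [s] if c does not split s; otherwise two sets, following the
        "one consistent set, lump the rest together" strategy.
        """
        def behavior(d):
            succ = delta.get((d, c))
            return None if succ is None else block_id[succ]

        ref = behavior(next(iter(s)))          # behavior of a reference state
        s1 = {d for d in s if behavior(d) == ref}
        s2 = s - s1                            # states that behave differently
        return [s1, s2] if s2 else [s]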
Figure 2.10 makes this concrete. The states in p1 = {di, dj, dk} are equivalent if and only if their transitions, ∀ c ∈ Σ, take them to states that are, themselves, in an equivalence class. As shown, each state has a transition on a: di →^a dx, dj →^a dy, and dk →^a dz. If dx, dy, and dz are all in the same set in the current partition, as shown on the left, then di, dj, and dk should remain together and a does not split p1.

    FIGURE 2.10  Splitting a Partition around a.  (a) a does not split p1; (b) a splits p1; (c) partitions after the split on a.

On the other hand, if dx, dy, and dz are in two or more different sets, then a splits p1. As shown in the center drawing of Figure 2.10, dx ∈ p2 while dy and dz ∈ p3, so the algorithm must split p1 and construct two new sets p4 = {di} and p5 = {dj, dk} to reflect the potential for different outcomes with strings that begin with the symbol a. The result is shown on the right side of Figure 2.10. The same split would result if state di had no transition on a.
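Using the split sketch above, the center configuration of Figure 2.10 can be reproduced as follows; the state names and block numbers are chosen to match the figure.

    # Center drawing of Figure 2.10: dx lies in p2, while dy and dz lie in p3.
    delta = {('di', 'a'): 'dx', ('dj', 'a'): 'dy', ('dk', 'a'): 'dz'}
    block_id = {'dx': 2, 'dy': 3, 'dz': 3}
    print(split({'di', 'dj', 'dk'}, 'a', delta, block_id))
    # the block splits into {di} and {dj, dk}, i.e. p4 and p5 (in some order)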
To refine a partition P, the algorithm examines each p ∈ P and each c ∈ Σ. If c splits p, the algorithm constructs two new sets from p and adds them to T. (It could split p into more than two sets, all having internally consistent behavior on c. However, creating one consistent state and lumping the rest of p into another state will suffice. If the latter state is inconsistent in its behavior on c, the algorithm will split it in a later iteration.) The algorithm repeats this process until it finds a partition where it can split no sets.

To construct the new DFA from the final partition P, we can create a single state to represent each set p ∈ P and add the appropriate transitions between these new representative states.
For the state representing pl, we add a transition to the state representing pm on c if some dj ∈ pl has a transition on c to some dk ∈ pm. From the construction, we know that if dj has such a transition, so does every other state in pl; if this were not the case, the algorithm would have split pl around c.
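Putting the pieces together, the sketch below follows the refinement scheme of Figure 2.9 and then rebuilds the DFA from the final partition, reusing the split function sketched earlier. The names minimize_dfa and split_block, and the (states, alphabet, delta, accepting, start) representation, are assumptions made for this example; it is the straightforward refinement loop, not the asymptotically faster worklist formulation usually associated with Hopcroft's algorithm.

    def split_block(p, alphabet, delta, block_id):
        """Figure 2.9's Split: return two sets on the first character that
        splits p; otherwise return p unchanged."""
        for c in alphabet:
            pieces = split(p, c, delta, block_id)
            if len(pieces) == 2:
                return pieces
        return [p]

    def minimize_dfa(states, alphabet, delta, accepting, start):
        """Refine a partition of the DFA states, then build the minimal DFA.

        Each new state is a frozenset of original states, one per final block.
        """
        # Initial partition: accepting states versus all other states.
        T = [set(b) for b in (set(accepting), set(states) - set(accepting)) if b]
        P = []
        while P != T:                        # iterate until nothing splits
            P = T
            block_id = {d: i for i, b in enumerate(P) for d in b}
            T = []
            for p in P:
                T.extend(split_block(p, alphabet, delta, block_id))

        # One representative state per block; since every state in a block
        # behaves alike, any member's transitions define the block's transitions.
        block_of = {d: frozenset(b) for b in P for d in b}
        new_states = set(block_of.values())
        new_delta = {(block_of[d], c): block_of[succ]
                     for (d, c), succ in delta.items()}
        new_accepting = {block_of[d] for d in accepting}
        return new_states, new_delta, new_accepting, block_of[start]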