1610906232-7ba7dbaea13262b50a6029d682cb7f1b (824370), страница 23
Текст из файла (страница 23)
To check whether two deterministicautomata are identical is not a difficult isomorphism problem, because statescan be identified starting from the initial states, with the help of the labels onthe transitions.In contrast, the only way we know how to tell whether two nondeterministicautomata, or two regular expressions, are equivalent is by converting them intotwo deterministic finite automata, and then testing them for equivalence. Thealgorithm is, of course, exponential.We summarize our discussion of the algorithmic problems related to regularlanguages and their representations as follows:Theorem 2.6.1: (a) There is an exponential algorithm which, given a nondeterministic finite automaton, constructs an equivalent deterministic finite automaton.(b) There is a polynomial algorithm which, given a r·egular expression, constructs an equivalent nondeterministic finite automaton.(c) There is an exponential algorithm which, given a nondeterministic finiteautomaton, constructs an equivalent regular expression.(d) There is a polynomial algorithm which, given a deterministic finite automaton, constructs an equivalent deterministic finite automaton with the smallest possible number of states.(e) There is a polynomial algorithm which, given two deterministic finite automata, decides whether they are equivalent.(I) There is an exponential algorithm which, given two nondeterministic finiteautomata, decides whether they are equivalent; similarly for the equivalenceof two regular expressions.We know that the exponential complexity in (a) and (c) above is necessary, because, as Example 2.2.5 and Problem 2.3.8 indicate, the output of thealgorithm (in (a), the deterministic automaton; in (c), the equivalent regularexpression) may have to be exponential.
There are, however, three importantquestions that remain unresolved in Theorem 2.6.1:(1) Is there a polynomial algorithm for determining whether two given nondeterministic finite automata are eqUivalent, or is the exponential complexityin (f) inherent?(2) Can we find in polynomial time the nondeterministic automaton with thefewest states that is equivalent to a given nondeterministic automaton? Wecan certainly do so in exponential time: Try all possible nondeterministicautomata with fewer states than the given one, testing equivalence in eachcase using the exponential algorithm in (f).1052.6: Algorithms for Finite Automata(3) More intriguingly, suppose that we are given a nondeterministic finite automaton and we wish to find the equivalent deterministic finite automatonwith the fewest states.
This can be accomplished by combining the algorithms for (a) and (d) above. However, the number of steps may be exponential in the size of the given nondeterministic automaton, even thoughthe end result may be small -simply because the intermediate result, theunoptimized deterministic automaton produced by the subset construction,may have exponentially more states than necessary. Is there an algorithmthat produces directly the minimum-state equivalent deterministic automaton in time which is bounded by a polynomial in the input and the finaloutput?As we shall see in Chapter 7 on NP-completeness, we strongly suspect thatall three of these questions have negative answers although at present nobodyknows how to prove it.Finite Automata as AlgorithmsThere is something very basic that can be said about deterministic finite automata in connection with algorithms: A deterministic finite automaton M isan efficient algorithm for deciding whether a given string is in L( M).
For example, the deterministic finite automaton in Figure 2-23 can be rendered as thefollowing algorithm:aaFigure 2-23ql:q2:q3:Let a := get-next-symbol;if a = end-of-file then reject;else if a = a then goto ql;else if a = b then goto q2;Let a := get-next-symbol;if a = end-of-file then reject;else if a = a then goto q2;else if a = b then goto q3;Let a := get-next-symbol;106Chapter 2: FINITE AUTOMATAif a = end-of-file then accept;else if a = a then goto q3;else if a = b then goto ql;To render a deterministic finite automaton M = (K,~, 8, s, F) as an algorithm, for each state in K we have I~ I + 2 instructions, of which the first obtainsthe next input symbol, and each of the others is responsible for performing thecorrect action for a particular value of the input symbol -or for the case in whichwe have reached the end of the input string, an event that we call "end-of-file."We can express formally the discussion above as follows:Theorem 2.6.2: If L is a regular language, then there is an algorithm which,given W E ~*, tests whether it is in L in O(lwl) time.But how about nondeterministic finite automata? They are definitely apowerful notational simplification, and they are the most natural and directway of rendering regular expressions as automata (recall the constructions in theproof of Theorem 2.3.1), but they do not obviously correspond to algorithms.Of course, we can always transform a given nondeterministic finite automatonto the equivalent deterministic one by the subset construction in the proof ofTheorem 2.2.1, but the construction itself (and the ensuing automaton) maybe exponential.
The question arises, can we "run" a nundeterministic finiteautomaton directly, very much the same way we run deterministic ones? Wenext point out that, with a modest loss in speed, we can.Recall the idea behind the subset construction: After having read part ofthe input, a nondeterministic automaton can be in anyone of a set of states.The subset construction computes all these possible sets of states. But when weare only interested in running a single string through the automaton, perhaps abetter idea is this: We can calculate the sets of states "on the fly," as neededand suggested by the input string.Concretely, suppose that M = (K,~, 6., s, F) is a nondeterministic finiteautomaton, and consider the following algorithm:50 := E(s), n := 0;repeat the followingset n := n + 1, and let a be the nth input symbol;if a f end-of-file then5 n := U{E(q): for some p E 5 n - 1 , (p, a, q) E 6.}until a = end-af-fileif 5 n - 1 n F f 0 then accept else rejectHere E(q) stands for the set {p: (q, e)tion.f-M(p, e)}, as in the subset construc-2.6: Algorithms for Finite Automata107q2aqO>Cr----b~,e--_+-jq~I~---a------~a,bbba,bFigure 2-24Example 2.6.1: Let us "run," using this algorithm, the nondeterministic finiteautomaton in Figure 2-24 on the input string aaaba.
The various values for theset 5 n are shown below.50 ={qO, qd,51 ={ qo, qI, q2},52 ={qO,ql,q2},53 ={QO,qI,q2},54 ={qi , Q2, Q3, Q4 } •55 ={Q2,Q3,Q4}'The machine ends up accepting the input aaaba, because 55 contains a finalstate.<)It is easy to prove by induction on the length of the input string thatthis algorithm maintains the set 5 n of all states of M that could be reachedby reading the first n symbols of the input.
In other words, it simulates theequivalent deterministic finite automaton without ever constructing it. And itstime requirements are quite modest, as the following theorem states (for a proofof the time bound, as well as an improvement, see Problem 2.6.2).Theorem 2.6.3: If M = (K,~,~, s, F) is a nondeterministic finite automaton,then there is an algorithm which, given w E ~', tests whether it is in L(M) intime O(IKI2Iwl).Suppose next that we are given a regular expression a over the alphabetand we wish to determine whether a given string w E ~. is in L[a], thelanguage generated by a.
Since a can be easily transformed into an equivalentnondeterministic finite automaton M, of size comparable to that of a (recall the~,Chapter 2: FINITE AUTOMATA108constructions in Theorem 2.3.1), the algorithm above is also useful for answeringsuch questions. tA related computational problem, which also can be solved by methodsbased on finite automata, is that of string matching, a most central problemin computer systems and their applications.
Let us fix a string x E ~', whichwe shall call the pattern. We wish to devise an efficient algorithm which, givenany string w, the text (presumably much longer than x), determines whether xoccurs as a substring of w. Notice that we are not interested in an algorithmthat takes both x and w as inputs and tells us if x occurs in w; we want ouralgorithm, call it Ax) to specialize in discovering the pattern x in all possiblelonger strings. Our strategy is, naturally enough, to design a finite automatonthat accepts the language Lx = {w E ~. : x is a substring of w}.• 0b"0a..0?O~--b-"""-<O)----"-<O)-----I"~~aaba,b0(a)a(b)Figure 2-25In fact, it is trivial to design a nondeterministic finite automaton for accepting Lx.
For example, if ~ = {a, b} and x = ababaab, the correspondingnondeterministic finite automaton is shown in Figure 2-25(a). But in order toturn this into a useful algorithm we would have to resort to the direct simulationof Theorem 2.6.3 -with its running time O(\X\2\W\), polynomial but very slowfor this application- or convert it into an equivalent deterministic automatontFor the reader familiar with the Unix operating system, this algorithm lies at thebasis of the commands grep and egrep.2.6: Algorithms for Finite Automata109-a construction which we know is potentially exponential.
Fortunately, in thecase of nondeterministic automata arising in string-matching applications, thesubset construction is always efficient, and the resulting deterministic automaton Mx has exactly the same number of states as the original nondeterministicone (see Figure 2-25(b)). It is clearly the minimal equivalent automaton. Thisautomaton Mx is therefore an algorithm for testing whether w E Lx in timeO(lwl), for any string w E ~*.Still, this algorithm has a drawback that makes it unsuitable for the manypractical applications of string matching. In real applications, the underlyingalphabet ~ has several dozens, often hundreds, of symbols.
A deterministicfinite automaton rendered as an algorithm must execute for each input symbola long sequence of if statements, one for each symbol of the alphabet (recall thefirst algorithm in this subsection). In other words, the 0 notation in the O(lwl)running time of the algorithm "hides" a potentially large constant: the runningtime is in fact O(I~llwl). For a clever remedy, see Problem 2.6.3.Problems for Section 2.62.6.1. Show that these two regular expressions do not represent the same language:aa(a U b)* U (bb)*a* and (ab U ba U a)*. Do so(a) by subjecting them to a general algorithm; and(b) by finding a string generated by one and not by the other.2.6.2.