Introduction to Algorithms. (2nd edition) Thomas Cormen_ Charles Leiserson_ Ronad Rivest (811417), страница 45
Текст из файла (страница 45)
We next probe slot T[h'(k) + 1], and so on up to slot T[m - 1]. Thenwe wrap around to slots T[0], T[1], ..., until we finally probe slot T[h'(k) - 1]. Because theinitial probe determines the entire probe sequence, there are only m distinct probe sequences.Linear probing is easy to implement, but it suffers from a problem known as primaryclustering. Long runs of occupied slots build up, increasing the average search time. Clustersarise since an empty slot preceded by i full slots gets filled next with probability (i + 1)/m.Long runs of occupied slots tend to get longer, and the average search time increases.Quadratic probingQuadratic probing uses a hash function of the form(11.5)where h' is an auxiliary hash function, c1 and c2 ≠ 0 are auxiliary constants, and i = 0, 1, ..., m- 1.
The initial position probed is T[h'(k)]; later positions probed are offset by amounts thatdepend in a quadratic manner on the probe number i. This method works much better thanlinear probing, but to make full use of the hash table, the values of c1, c2, and m areconstrained. Problem 11-3 shows one way to select these parameters. Also, if two keys havethe same initial probe position, then their probe sequences are the same, since h(k1, 0) = h(k2,0) implies h(k1, i) = h(k2, i). This property leads to a milder form of clustering, calledsecondary clustering.
As in linear probing, the initial probe determines the entire sequence,so only m distinct probe sequences are used.Double hashingDouble hashing is one of the best methods available for open addressing because thepermutations produced have many of the characteristics of randomly chosen permutations.Double hashing uses a hash function of the formh(k, i) = (h1(k) + ih2(k)) mod m,where h1 and h2 are auxiliary hash functions. The initial probe is to position T[h1(k)];successive probe positions are offset from previous positions by the amount h2(k), modulo m.Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in twoways upon the key k, since the initial probe position, the offset, or both, may vary. Figure 11.5gives an example of insertion by double hashing.Figure 11.5: Insertion by double hashing.
Here we have a hash table of size 13 with h1(k) = kmod 13 and h2(k) = 1 + (k mod 11). Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), the key 14 isinserted into empty slot 9, after slots 1 and 5 are examined and found to be occupied.The value h2(k) must be relatively prime to the hash-table size m for the entire hash table to besearched.
(See Exercise 11.4-3.) A convenient way to ensure this condition is to let m be apower of 2 and to design h2 so that it always produces an odd number. Another way is to let mbe prime and to design h2 so that it always returns a positive integer less than m. For example,we could choose m prime and leth1(k) = k mod m,h2(k) = 1 + (k mod m'),where m' is chosen to be slightly less than m (say, m - 1).
For example, if k = 123456, m =701, and m' = 700, we have h1(k) = 80 and h2(k) = 257, so the first probe is to position 80, andthen every 257th slot (modulo m) is examined until the key is found or every slot is examined.Double hashing improves over linear or quadratic probing in that Θ(m2) probe sequences areused, rather than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct probe sequence.As a result, the performance of double hashing appears to be very close to the performance ofthe "ideal" scheme of uniform hashing.Analysis of open-address hashingOur analysis of open addressing, like our analysis of chaining, is expressed in terms of theload factor α = n/m of the hash table, as n and m go to infinity.
Of course, with openaddressing, we have at most one element per slot, and thus n ≤ m, which implies α ≤ 1.We assume that uniform hashing is used. In this idealized scheme, the probe sequence h(k,0), h(k, 1), ..., h(k, m - 1) used to insert or search for each key k is equally likely to be anypermutation of 0, 1, ..., m - 1 . Of course, a given key has a unique fixed probe sequenceassociated with it; what is meant here is that, considering the probability distribution on thespace of keys and the operation of the hash function on the keys, each possible probesequence is equally likely.We now analyze the expected number of probes for hashing with open addressing under theassumption of uniform hashing, beginning with an analysis of the number of probes made inan unsuccessful search.Theorem 11.6Given an open-address hash table with load factor α = n/m < 1, the expected number of probesin an unsuccessful search is at most 1/(1-α), assuming uniform hashing.Proof In an unsuccessful search, every probe but the last accesses an occupied slot that doesnot contain the desired key, and the last slot probed is empty.
Let us define the randomvariable X to be the number of probes made in an unsuccessful search, and let us also definethe event Ai , for i = 1, 2, ..., to be the event that there is an ith probe and it is to an occupiedslot. Then the event {X ≥ i} is the intersection of events A1 ∩ A2 ∩ ··· ∩ Ai-1. We will bound Pr{X ≥ i} by bounding Pr {A1 ∩ A2 ∩ ··· ∩ Ai-1}. By Exercise C.2-6,Pr {A1 ∩ A2 ∩ ··· ∩ Ai-1} = Pr{A1} · Pr{A2 | A1} · Pr{A3 | A1 ∩ A2}Pr{Ai-1 | A1 ∩ A2 ∩ ··· ∩ Ai-2}.Since there are n elements and m slots, Pr {A1} = n/m.
For j > 1, the probability that there is ajth probe and it is to an occupied slot, given that the first j - 1 probes were to occupied slots, is(n - j + 1)/(m - j + 1). This probability follows because we would be finding one of theremaining (n - (j - 1)) elements in one of the (m - (j - 1)) unexamined slots, and by theassumption of uniform hashing, the probability is the ratio of these quantities. Observing thatn < m implies that (n - j)/(m - j) ≤ n/m for all j such that 0 ≤ j < m, we have for all i such that 1≤ i ≤ m,Now we use equation (C.24) to bound the expected number of probes:The above bound of 1+α+α2+α3+··· has an intuitive interpretation. One probe is always made.With probability approximately α, the first probe finds an occupied slot so that a second probeis necessary.
With probability approximately α2, the first two slots are occupied so that a thirdprobe is necessary, and so on.If α is a constant, Theorem 11.6 predicts that an unsuccessful search runs in O(1) time. Forexample, if the hash table is half full, the average number of probes in an unsuccessful searchis at most 1/(1 - .5) = 2. If it is 90 percent full, the average number of probes is at most 1/(1 .9) = 10.Theorem 11.6 gives us the performance of the HASH-INSERT procedure almostimmediately.Corollary 11.7Inserting an element into an open-address hash table with load factor α requires at most 1/(1 α) probes on average, assuming uniform hashing.Proof An element is inserted only if there is room in the table, and thus α < 1. Inserting a keyrequires an unsuccessful search followed by placement of the key in the first empty slotfound. Thus, the expected number of probes is at most 1/(1 - α).Computing the expected number of probes for a successful search requires a little more work.Theorem 11.8Given an open-address hash table with load factor α < 1, the expected number of probes in asuccessful search is at mostassuming uniform hashing and assuming that each key in the table is equally likely to besearched for.Proof A search for a key k follows the same probe sequence as was followed when theelement with key k was inserted.
By Corollary 11.7, if k was the (i + 1)st key inserted into thehash table, the expected number of probes made in a search for k is at most 1/(1 - i/m) = m/(m- i). Averaging over all n keys in the hash table gives us the average number of probes in asuccessful search:whereis the ith harmonic number (as defined in equation (A.7)). Using thetechnique of bounding a summation by an integral, as described in Section A.2, we obtainfor a bound on the expected number of probes in a successful search.If the hash table is half full, the expected number of probes in a successful search is less than1.387.
If the hash table is 90 percent full, the expected number of probes is less than 2.559.Exercises 11.4-1Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of length m = 11using open addressing with the primary hash function h'(k) = k mod m. Illustrate the result ofinserting these keys using linear probing, using quadratic probing with c1 = 1 and c2 = 3, andusing double hashing with h2(k) = 1 + (k mod (m - 1)).Exercises 11.4-2Write pseudocode for HASH-DELETE as outlined in the text, and modify HASH-INSERT tohandle the special value DELETED.Exercises 11.4-3: ⋆Suppose that we use double hashing to resolve collisions; that is, we use the hash functionh(k, i) = (h1(k)+ih2(k)) mod m. Show that if m and h2(k) have greatest common divisor d ≥ 1for some key k, then an unsuccessful search for key k examines (1/d)th of the hash tablebefore returning to slot h1(k).
Thus, when d = 1, so that m and h2(k) are relatively prime, thesearch may examine the entire hash table. (Hint: See Chapter 31.)Exercises 11.4-4Consider an open-address hash table with uniform hashing. Give upper bounds on theexpected number of probes in an unsuccessful search and on the expected number of probes ina successful search when the load factor is 3/4 and when it is 7/8.Exercises 11.4-5: ⋆Consider an open-address hash table with a load factor α. Find the nonzero value α for whichthe expected number of probes in an unsuccessful search equals twice the expected number ofprobes in a successful search.