Introduction to Algorithms (2nd edition), Thomas Cormen, Charles Leiserson, Ronald Rivest
A median, informally, is the "halfway point" of the set. When n is odd, the median is unique, occurring at i = (n + 1)/2. When n is even, there are two medians, occurring at i = n/2 and i = n/2 + 1. Thus, regardless of the parity of n, medians occur at i = ⌊(n + 1)/2⌋ (the lower median) and i = ⌈(n + 1)/2⌉ (the upper median). For simplicity in this text, however, we consistently use the phrase "the median" to refer to the lower median.

This chapter addresses the problem of selecting the ith order statistic from a set of n distinct numbers.
We assume for convenience that the set contains distinct numbers, although virtually everything that we do extends to the situation in which a set contains repeated values. The selection problem can be specified formally as follows:

Input: A set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i - 1 other elements of A.

The selection problem can be solved in O(n lg n) time, since we can sort the numbers using heapsort or merge sort and then simply index the ith element in the output array. There are faster algorithms, however.

In Section 9.1, we examine the problem of selecting the minimum and maximum of a set of elements. More interesting is the general selection problem, which is investigated in the subsequent two sections.
Section 9.2 analyzes a practical algorithm that achieves an O(n) bound on the running time in the average case. Section 9.3 contains an algorithm of more theoretical interest that achieves the O(n) running time in the worst case.

9.1 Minimum and maximum

How many comparisons are necessary to determine the minimum of a set of n elements? We can easily obtain an upper bound of n - 1 comparisons: examine each element of the set in turn and keep track of the smallest element seen so far. In the following procedure, we assume that the set resides in array A, where length[A] = n.

MINIMUM(A)
1  min ← A[1]
2  for i ← 2 to length[A]
3       do if min > A[i]
4             then min ← A[i]
5  return min

Finding the maximum can, of course, be accomplished with n - 1 comparisons as well.

Is this the best we can do? Yes, since we can obtain a lower bound of n - 1 comparisons for the problem of determining the minimum.
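For concreteness, MINIMUM can be transcribed into Python (a sketch, using 0-based indexing rather than the book's 1-based arrays):

```python
def minimum(A):
    """Return the smallest element of A using exactly len(A) - 1 comparisons."""
    smallest = A[0]          # line 1 of MINIMUM
    for x in A[1:]:          # lines 2-4: scan the rest of the array
        if smallest > x:     # one comparison per remaining element
            smallest = x
    return smallest
```

For example, minimum([3, 1, 2]) returns 1 after two comparisons.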
Think of any algorithm that determines the minimum as a tournament among the elements. Each comparison is a match in the tournament in which the smaller of the two elements wins. The key observation is that every element except the winner must lose at least one match. Hence, n - 1 comparisons are necessary to determine the minimum, and the algorithm MINIMUM is optimal with respect to the number of comparisons performed.

Simultaneous minimum and maximum

In some applications, we must find both the minimum and the maximum of a set of n elements. For example, a graphics program may need to scale a set of (x, y) data to fit onto a rectangular display screen or other graphical output device.
To do so, the program must first determine the minimum and maximum of each coordinate.

It is not difficult to devise an algorithm that can find both the minimum and the maximum of n elements using Θ(n) comparisons, which is asymptotically optimal. Simply find the minimum and maximum independently, using n - 1 comparisons for each, for a total of 2n - 2 comparisons.

In fact, at most 3 ⌊n/2⌋ comparisons are sufficient to find both the minimum and the maximum.
The strategy is to maintain the minimum and maximum elements seen thus far. Rather than processing each element of the input by comparing it against the current minimum and maximum, at a cost of 2 comparisons per element, we process elements in pairs. We compare pairs of elements from the input first with each other, and then we compare the smaller to the current minimum and the larger to the current maximum, at a cost of 3 comparisons for every 2 elements.

Setting up initial values for the current minimum and maximum depends on whether n is odd or even. If n is odd, we set both the minimum and maximum to the value of the first element, and then we process the rest of the elements in pairs.
If n is even, we perform 1 comparison on the first 2 elements to determine the initial values of the minimum and maximum, and then process the rest of the elements in pairs as in the case for odd n.

Let us analyze the total number of comparisons. If n is odd, then we perform 3 ⌊n/2⌋ comparisons. If n is even, we perform 1 initial comparison followed by 3(n - 2)/2 comparisons, for a total of 3n/2 - 2.
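The pairing strategy can be sketched in Python (a sketch, not the book's code; it also counts comparisons so that the bound can be checked):

```python
def min_max(A):
    """Find (minimum, maximum) of A using at most 3 * (len(A) // 2) comparisons."""
    n = len(A)
    comparisons = 0
    if n % 2 == 1:                       # odd n: both start at the first element
        lo = hi = A[0]
        start = 1
    else:                                # even n: 1 comparison sets the initial pair
        comparisons += 1
        lo, hi = (A[0], A[1]) if A[0] < A[1] else (A[1], A[0])
        start = 2
    for i in range(start, n, 2):         # process the remaining elements in pairs
        comparisons += 3                 # 1 to order the pair, 2 against lo and hi
        small, big = (A[i], A[i + 1]) if A[i] < A[i + 1] else (A[i + 1], A[i])
        if small < lo:
            lo = small
        if big > hi:
            hi = big
    return lo, hi, comparisons
```

For instance, on a 6-element input the function performs 1 + 2·3 = 7 comparisons, which is at most 3 ⌊6/2⌋ = 9.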
Thus, in either case, the total number of comparisons is at most 3 ⌊n/2⌋.

Exercises 9.1-1
Show that the second smallest of n elements can be found with n + ⌈lg n⌉ - 2 comparisons in the worst case. (Hint: Also find the smallest element.)

Exercises 9.1-2: ⋆
Show that ⌈3n/2⌉ - 2 comparisons are necessary in the worst case to find both the maximum and minimum of n numbers. (Hint: Consider how many numbers are potentially either the maximum or minimum, and investigate how a comparison affects these counts.)

9.2 Selection in expected linear time

The general selection problem appears more difficult than the simple problem of finding a minimum. Yet, surprisingly, the asymptotic running time for both problems is the same: Θ(n). In this section, we present a divide-and-conquer algorithm for the selection problem.
The algorithm RANDOMIZED-SELECT is modeled after the quicksort algorithm of Chapter 7. As in quicksort, the idea is to partition the input array recursively. But unlike quicksort, which recursively processes both sides of the partition, RANDOMIZED-SELECT only works on one side of the partition. This difference shows up in the analysis: whereas quicksort has an expected running time of Θ(n lg n), the expected time of RANDOMIZED-SELECT is Θ(n).

RANDOMIZED-SELECT uses the procedure RANDOMIZED-PARTITION introduced in Section 7.3. Thus, like RANDOMIZED-QUICKSORT, it is a randomized algorithm, since its behavior is determined in part by the output of a random-number generator.
The following code for RANDOMIZED-SELECT returns the ith smallest element of the array A[p .. r].

RANDOMIZED-SELECT(A, p, r, i)
1  if p = r
2      then return A[p]
3  q ← RANDOMIZED-PARTITION(A, p, r)
4  k ← q - p + 1
5  if i = k            ▹ the pivot value is the answer
6      then return A[q]
7  elseif i < k
8      then return RANDOMIZED-SELECT(A, p, q - 1, i)
9  else return RANDOMIZED-SELECT(A, q + 1, r, i - k)

After RANDOMIZED-PARTITION is executed in line 3 of the algorithm, the array A[p .. r] is partitioned into two (possibly empty) subarrays A[p .. q - 1] and A[q + 1 .. r] such that each element of A[p .. q - 1] is less than or equal to A[q], which in turn is less than each element of A[q + 1 .. r]. As in quicksort, we will refer to A[q] as the pivot element. Line 4 of RANDOMIZED-SELECT computes the number k of elements in the subarray A[p .. q], that is, the number of elements in the low side of the partition, plus one for the pivot element. Line 5 then checks whether A[q] is the ith smallest element. If it is, then A[q] is returned. Otherwise, the algorithm determines in which of the two subarrays A[p .. q - 1] and A[q + 1 .. r] the ith smallest element lies. If i < k, then the desired element lies on the low side of the partition, and it is recursively selected from the subarray in line 8.
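A runnable Python sketch of the whole procedure (using 0-based indexing, and a Lomuto-style randomized partition in place of Chapter 7's RANDOMIZED-PARTITION, whose code is not reproduced here) might look like:

```python
import random

def randomized_partition(A, p, r):
    """Partition A[p..r] around a uniformly chosen pivot; return the pivot's index."""
    s = random.randint(p, r)               # pivot chosen uniformly at random
    A[s], A[r] = A[r], A[s]
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:                  # grow the low side
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]        # place the pivot between the two sides
    return i + 1

def randomized_select(A, p, r, i):
    """Return the ith smallest element (i >= 1) of A[p..r]. Mutates A."""
    if p == r:
        return A[p]
    q = randomized_partition(A, p, r)
    k = q - p + 1                          # size of A[p..q]: low side plus the pivot
    if i == k:                             # the pivot value is the answer
        return A[q]
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)
```

For example, randomized_select(list(a), 0, len(a) - 1, 1) returns the minimum of a, and i = (len(a) + 1) // 2 selects the (lower) median.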
If i > k, however, then the desired element lies on the high side of the partition. Since we already know k values that are smaller than the ith smallest element of A[p .. r]—namely, the elements of A[p .. q]—the desired element is the (i - k)th smallest element of A[q + 1 .. r], which is found recursively in line 9. The code appears to allow recursive calls to subarrays with 0 elements, but Exercise 9.2-1 asks you to show that this situation cannot happen.

The worst-case running time for RANDOMIZED-SELECT is Θ(n²), even to find the minimum, because we could be extremely unlucky and always partition around the largest remaining element, and partitioning takes Θ(n) time. The algorithm works well in the average case, though, and because it is randomized, no particular input elicits the worst-case behavior.

The time required by RANDOMIZED-SELECT on an input array A[p .. r] of n elements is a random variable that we denote by T(n), and we obtain an upper bound on E[T(n)] as follows. Procedure RANDOMIZED-PARTITION is equally likely to return any element as the pivot. Therefore, for each k such that 1 ≤ k ≤ n, the subarray A[p .. q] has k elements (all less than or equal to the pivot) with probability 1/n. For k = 1, 2, ..., n, we define indicator random variables Xk where

Xk = I{the subarray A[p .. q] has exactly k elements},

and so we have

E[Xk] = 1/n.                                                        (9.1)

When we call RANDOMIZED-SELECT and choose A[q] as the pivot element, we do not know, a priori, if we will terminate immediately with the correct answer, recurse on the subarray A[p .. q - 1], or recurse on the subarray A[q + 1 .. r].
This decision depends on where the ith smallest element falls relative to A[q]. Assuming that T(n) is monotonically increasing, we can bound the time needed for the recursive call by the time needed for the recursive call on the largest possible input. In other words, we assume, to obtain an upper bound, that the ith element is always on the side of the partition with the greater number of elements. For a given call of RANDOMIZED-SELECT, the indicator random variable Xk has the value 1 for exactly one value of k, and it is 0 for all other k.
When Xk = 1, the two subarrays on which we might recurse have sizes k - 1 and n - k. Hence, we have the recurrence

T(n) ≤ Σ_{k=1}^{n} Xk · (T(max(k - 1, n - k)) + O(n))
     = Σ_{k=1}^{n} Xk · T(max(k - 1, n - k)) + O(n).

Taking expected values, we have

E[T(n)] ≤ E[Σ_{k=1}^{n} Xk · T(max(k - 1, n - k)) + O(n)]
        = Σ_{k=1}^{n} E[Xk · T(max(k - 1, n - k))] + O(n)       (by linearity of expectation)
        = Σ_{k=1}^{n} E[Xk] · E[T(max(k - 1, n - k))] + O(n)    (by equation (C.23))
        = Σ_{k=1}^{n} (1/n) · E[T(max(k - 1, n - k))] + O(n)    (by equation (9.1)).

In order to apply equation (C.23), we rely on Xk and T(max(k - 1, n - k)) being independent random variables. Exercise 9.2-2 asks you to justify this assertion.

Let us consider the expression max(k - 1, n - k). We have

max(k - 1, n - k) = k - 1   if k > ⌈n/2⌉ ,
                    n - k   if k ≤ ⌈n/2⌉ .

If n is even, each term from T(⌈n/2⌉) up to T(n - 1) appears exactly twice in the summation, and if n is odd, all these terms appear twice and T(⌊n/2⌋) appears once. Thus, we have

E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n-1} E[T(k)] + O(n).

We solve the recurrence by substitution. Assume that E[T(n)] ≤ cn for some constant c that satisfies the initial conditions of the recurrence.
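The linear expected behavior can also be observed empirically. The following sketch (an aside, not the book's code) is an iterative 0-based version of the selection procedure instrumented to count element comparisons:

```python
import random

def select_count(A, i):
    """Return (ith smallest of A, number of element comparisons made). Mutates A."""
    comps = 0
    p, r = 0, len(A) - 1
    while True:
        if p == r:
            return A[p], comps
        s = random.randint(p, r)            # uniform pivot, as in RANDOMIZED-PARTITION
        A[s], A[r] = A[r], A[s]
        pivot, j = A[r], p - 1
        for m in range(p, r):
            comps += 1                      # one comparison per non-pivot element
            if A[m] <= pivot:
                j += 1
                A[j], A[m] = A[m], A[j]
        A[j + 1], A[r] = A[r], A[j + 1]
        q = j + 1
        k = q - p + 1
        if i == k:
            return A[q], comps
        elif i < k:
            r = q - 1                       # continue on the low side
        else:
            i -= k
            p = q + 1                       # continue on the high side
```

Averaged over many shuffled inputs, the comparison count when selecting the median grows proportionally to n, in line with the Θ(n) bound, while a single unlucky run can still cost Θ(n²).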