Introduction to Algorithms. (2nd edition) Thomas Cormen_ Charles Leiserson_ Ronad Rivest (811417), страница 36
Текст из файла (страница 36)
We assume that T(n) = O(1) for n less thansome constant; we shall pick this constant later. We also pick a constant a such that thefunction described by the O(n) term above (which describes the non-recursive component ofthe running time of the algorithm) is bounded from above by an for all n > 0. Using thisinductive hypothesis, we haveIn order to complete the proof, we need to show that for sufficiently large n, this lastexpression is at most cn or, equivalently, that cn/4 - c/2 - an ≥ 0.
If we add c/2 to both sidesand factor out n, we get n(c/4 - a) ≥ c/2. As long as we choose the constant c so that c/4 - a >0, i.e., c > 4a, we can divide both sides by c/4 - a, givingThus, if we assume that T(n) = O(1) for n < 2c/(c -4a), we have T(n) = O(n). We conclude thatany order statistic, and in particular the median, can be determined on average in linear time.Exercises 9.2-1Show that in RANDOMIZED-SELECT, no recursive call is ever made to a 0-length array.Exercises 9.2-2Argue that the indicator random variable Xk and the value T(max(k - 1, n - k)) areindependent.Exercises 9.2-3Write an iterative version of RANDOMIZED-SELECT.Exercises 9.2-4Suppose we use RANDOMIZED-SELECT to select the minimum element of the array A =3, 2, 9, 0, 7, 5, 4, 8, 6, 1 .
Describe a sequence of partitions that results in a worst-caseperformance of RANDOMIZED-SELECT.9.3 Selection in worst-case linear timeWe now examine a selection algorithm whose running time is O(n) in the worst case. LikeRANDOMIZED-SELECT, the algorithm SELECT finds the desired element by recursivelypartitioning the input array.
The idea behind the algorithm, however, is to guarantee a goodsplit when the array is partitioned. SELECT uses the deterministic partitioning algorithmPARTITION from quicksort (see Section 7.1), modified to take the element to partitionaround as an input parameter.The SELECT algorithm determines the ith smallest of an input array of n > 1 elements byexecuting the following steps. (If n = 1, then SELECT merely returns its only input value asthe ith smallest.)1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and atmost one group made up of the remaining n mod 5 elements.2.
Find the median of each of the ⌈n/5⌉ groups by first insertion sorting the elements ofeach group (of which there are at most 5) and then picking the median from the sortedlist of group elements.3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians found in step 2.(If there are an even number of medians, then by our convention, x is the lowermedian.)4. Partition the input array around the median-of-medians x using the modified version ofPARTITION. Let k be one more than the number of elements on the low side of thepartition, so that x is the kth smallest element and there are n-k elements on the highside of the partition.5. If i = k, then return x.
Otherwise, use SELECT recursively to find the ith smallestelement on the low side if i < k, or the (i - k)th smallest element on the high side if i >k.To analyze the running time of SELECT, we first determine a lower bound on the number ofelements that are greater than the partitioning element x. Figure 9.1 is helpful in visualizingthis bookkeeping. At least half of the medians found in step 2 are greater than[1] the medianof-medians x.
Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greaterthan x, except for the one group that has fewer than 5 elements if 5 does not divide n exactly,and the one group containing x itself. Discounting these two groups, it follows that thenumber of elements greater than x is at leastFigure 9.1: Analysis of the algorithm SELECT. The n elements are represented by smallcircles, and each group occupies a column. The medians of the groups are whitened, and themedian-of-medians x is labeled. (When finding the median of an even number of elements,we use the lower median.) Arrows are drawn from larger elements to smaller, from which itcan be seen that 3 out of every full group of 5 elements to the right of x are greater than x, and3 out of every group of 5 elements to the left of x are less than x.
The elements greater than xare shown on a shaded background.Similarly, the number of elements that are less than x is at least 3n/10 - 6. Thus, in the worstcase, SELECT is called recursively on at most 7n/10 + 6 elements in step 5.We can now develop a recurrence for the worst-case running time T(n) of the algorithmSELECT. Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n) calls of insertion sort onsets of size O(1).) Step 3 takes time T(⌈n/5⌉), and step 5 takes time at most T(7n/10+ 6),assuming that T is monotonically increasing. We make the assumption, which seemsunmotivated at first, that any input of 140 or fewer elements requires O(1) time; the origin ofthe magic constant 140 will be clear shortly.
We can therefore obtain the recurrenceWe show that the running time is linear by substitution. More specifically, we will show thatT(n) ≤ cn for some suitably large constant c and all n > 0. We begin by assuming that T(n) ≤cn for some suitably large constant c and all n ≤ 140; this assumption holds if c is largeenough. We also pick a constant a such that the function described by the O(n) term above(which describes the non-recursive component of the running time of the algorithm) isbounded above by an for all n > 0.
Substituting this inductive hypothesis into the right-handside of the recurrence yieldsT(n) ≤ c ⌈n/5⌉ + c(7n/10 + 6) + an≤ cn/5 + c + 7cn/10 + 6c + an= 9cn/10 + 7c + an= cn + (-cn/10 + 7c + an) ,which is at most cn if(9.2)Inequality (9.2) is equivalent to the inequality c ≥ 10a(n/(n - 70)) when n > 70. Because weassume that n ≥ 140, we have n/(n - 70) ≤ 2, and so choosing c ≥ 20a will satisfy inequality(9.2).
(Note that there is nothing special about the constant 140; we could replace it by anyinteger strictly greater than 70 and then choose c accordingly.) The worst-case running time ofSELECT is therefore linear.As in a comparison sort (see Section 8.1), SELECT and RANDOMIZED-SELECT determineinformation about the relative order of elements only by comparing elements. Recall fromChapter 8 that sorting requires Ω(n lg n) time in the comparison model, even on average (seeProblem 8-1). The linear-time sorting algorithms in Chapter 8 make assumptions about theinput. In contrast, the linear-time selection algorithms in this chapter do not require anyassumptions about the input. They are not subject to the Ω(n lg n) lower bound because theymanage to solve the selection problem without sorting.Thus, the running time is linear because these algorithms do not sort; the linear-time behavioris not a result of assumptions about the input, as was the case for the sorting algorithms inChapter 8.
Sorting requires Ω(n lg n) time in the comparison model, even on average (seeProblem 8-1), and thus the method of sorting and indexing presented in the introduction tothis chapter is asymptotically inefficient.Exercises 9.3-1In the algorithm SELECT, the input elements are divided into groups of 5.
Will the algorithmwork in linear time if they are divided into groups of 7? Argue that SELECT does not run inlinear time if groups of 3 are used.Exercises 9.3-2Analyze SELECT to show that if n ≥ 140, then at least ⌈n/4⌉ elements are greater than themedian-of-medians x and at least ⌈n/4⌉ elements are less than x.Exercises 9.3-3Show how quicksort can be made to run in O(n lg n) time in the worst case.Exercises 9.3-4: ⋆Suppose that an algorithm uses only comparisons to find the ith smallest element in a set of nelements.
Show that it can also find the i - 1 smaller elements and the n - i larger elementswithout performing any additional comparisons.Exercises 9.3-5Suppose that you have a "black-box" worst-case linear-time median subroutine. Give asimple, linear-time algorithm that solves the selection problem for an arbitrary order statistic.Exercises 9.3-6The kth quantiles of an n-element set are the k - 1 order statistics that divide the sorted set intok equal-sized sets (to within 1). Give an O(n lg k)-time algorithm to list the kth quantiles of aset.Exercises 9.3-7Describe an O(n)-time algorithm that, given a set S of n distinct numbers and a positiveinteger k ≤ n, determines the k numbers in S that are closest to the median of S.Exercises 9.3-8Let X[1 .. n] and Y [1 ..
n] be two arrays, each containing n numbers already in sorted order.Give an O(lg n)-time algorithm to find the median of all 2n elements in arrays X and Y.Exercises 9.3-9Professor Olay is consulting for an oil company, which is planning a large pipeline runningeast to west through an oil field of n wells. From each well, a spur pipeline is to be connecteddirectly to the main pipeline along a shortest path (either north or south), as shown in Figure9.2. Given x- and y-coordinates of the wells, how should the professor pick the optimallocation of the main pipeline (the one that minimizes the total length of the spurs)? Show thatthe optimal location can be determined in linear time.Figure 9.2: Professor Olay needs to determine the position of the east-west oil pipeline thatminimizes the total length of the north-south spurs.Problems 9-1: Largest i numbers in sorted orderGiven a set of n numbers, we wish to find the i largest in sorted order using a comparisonbased algorithm.