Introduction to Algorithms. (2nd edition) Thomas Cormen_ Charles Leiserson_ Ronad Rivest (811417), страница 23
Текст из файла (страница 23)
Since the ⌊n/ ⌊(lg n)/2⌋⌋ groups are formed from mutuallyexclusive, independent coin flips, the probability that every one of these groups fails to be astreak of length ⌊(lg n)/2⌋ is at mostFor this argument, we used inequality (3.11), 1 + x ≤ ex , and the fact, which you might wantto verify, thatfor sufficiently large n.Thus, the probability that the longest streak exceeds ⌊(lg n)/2⌋ is(5.12)We can now calculate a lower bound on the expected length of the longest streak, beginningwith equation (5.11) and proceeding in a manner similar to our analysis of the upper bound:As with the birthday paradox, we can obtain a simpler but approximate analysis usingindicator random variables.
We let Xik = I{Aik} be the indicator random variable associatedwith a streak of heads of length at least k beginning with the ith coin flip. To count the totalnumber of such streaks, we defineTaking expectations and using linearity of expectation, we haveBy plugging in various values for k, we can calculate the expected number of streaks of lengthk. If this number is large (much greater than 1), then many streaks of length k are expected tooccur and the probability that one occurs is high. If this number is small (much less than 1),then very few streaks of length k are expected to occur and the probability that one occurs islow. If k = c lg n, for some positive constant c, we obtainIf c is large, the expected number of streaks of length c lg n is very small, and we concludethat they are unlikely to occur.
On the other hand, if c < 1/2, then we obtain E [X] = Θ(1/n1/2-1)= Θ(n1/2), and we expect that there will be a large number of streaks of length (1/2) lg n.Therefore, one streak of such a length is very likely to occur. From these rough estimatesalone, we can conclude that the length of the longest streak is Θ(lg n).5.4.4 The on-line hiring problemAs a final example, we consider a variant of the hiring problem. Suppose now that we do notwish to interview all the candidates in order to find the best one. We also do not wish to hireand fire as we find better and better applicants.
Instead, we are willing to settle for a candidatewho is close to the best, in exchange for hiring exactly once. We must obey one companyrequirement: after each interview we must either immediately offer the position to theapplicant or must tell them that they will not receive the job. What is the trade-off betweenminimizing the amount of interviewing and maximizing the quality of the candidate hired?We can model this problem in the following way.
After meeting an applicant, we are able togive each one a score; let score(i) denote the score given to the ith applicant, and assume thatno two applicants receive the same score. After we have seen j applicants, we know which ofthe j has the highest score, but we do not know if any of the remaining n - j applicants willhave a higher score. We decide to adopt the strategy of selecting a positive integer k < n,interviewing and then rejecting the first k applicants, and hiring the first applicant thereafterwho has a higher score than all preceding applicants. If it turns out that the best-qualifiedapplicant was among the first k interviewed, then we will hire the nth applicant.
This strategyis formalized in the procedure ON-LINE-MAXIMUM(k, n), which appears below. ProcedureON-LINE-MAXIMUM returns the index of the candidate we wish to hire.ON-LINE-MAXIMUM(k, n)1 bestscore ← -∞2 for i ← to k3do if score(i) > bestscore4then bestscore ← score(i)5 for i ← k + 1 to n6do if score(i) > bestscore7then return i8 return nWe wish to determine, for each possible value of k, the probability that we hire the mostqualified applicant.
We will then choose the best possible k, and implement the strategy withthat value. For the moment, assume that k is fixed. Let M(j) = max≤i≤j {score(i)} denote themaximum score among applicants 1 through j. Let S be the event that we succeed in choosingthe best-qualified applicant, and let Si be the event that we succeed when the best-qualifiedapplicant is the ith one interviewed. Since the various Si are disjoint, we have that. Noting that we never succeed when the best-qualified applicant is one of thefirst k, we have that Pr {Si} = 0 for i = 1, 2,..., k.
Thus, we obtain(5.13)We now compute Pr {Si}. In order to succeed when the best-qualified applicant is the ith one,two things must happen. First, the best-qualified applicant must be in position i, an eventwhich we denote by Bi. Second, the algorithm must not select any of the applicants inpositions k + 1 through i - 1, which happens only if, for each j such that k + 1 ≤ j ≤ i - 1, wefind that score(j) < bestscore in line 6. (Because scores are unique, we can ignore thepossibility of score(j) = bestscore.) In other words, it must be the case that all of the valuesscore(k + 1) through score(i - 1) are less than M(k); if any are greater than M(k) we willinstead return the index of the first one that is greater.
We use Oi to denote the event that noneof the applicants in position k + 1 through i - 1 are chosen. Fortunately, the two events Bi andOi are independent. The event Oi depends only on the relative ordering of the values inpositions 1 through i - 1, whereas Bi depends only on whether the value in position i is greaterthan all the values 1 through i - 1.
The ordering of positions 1 through i - 1 does not affectwhether i is greater than all of them, and the value of i does not affect the ordering ofpositions 1 through i - 1. Thus we can apply equation (C.15) to obtainPr {Si} = Pr {Bi ∩ Oi} = Pr {Bi} Pr {Oi}.The probability Pr {Bi} is clearly 1/n, since the maximum is equally likely to be in any one ofthe n positions.
For event Oi to occur, the maximum value in positions 1 through i - 1 must bein one of the first k positions, and it is equally likely to be in any of these i - 1 positions.Consequently, Pr {Oi} = k/(i - 1) and Pr {Si} = k/(n(i - 1)). Using equation (5.13), we haveWe approximate by integrals to bound this summation from above and below. By theinequalities (A.12), we haveEvaluating these definite integrals gives us the boundswhich provide a rather tight bound for Pr {S}.
Because we wish to maximize our probabilityof success, let us focus on choosing the value of k that maximizes the lower bound on Pr {S}.(Besides, the lower-bound expression is easier to maximize than the upper-bound expression.)Differentiating the expression (k/n)(ln n - ln k) with respect to k, we obtainSetting this derivative equal to 0, we see that the lower bound on the probability is maximizedwhen ln k = ln n - 1 = ln(n/e) or, equivalently, when k = n/e. Thus, if we implement ourstrategy with k = n/e, we will succeed in hiring our best-qualified applicant with probability atleast 1/e.Exercises 5.4-1How many people must there be in a room before the probability that someone has the samebirthday as you do is at least 1/2? How many people must there be before the probability thatat least two people have a birthday on July 4 is greater than 1/2?Exercises 5.4-2Suppose that balls are tossed into b bins.
Each toss is independent, and each ball is equallylikely to end up in any bin. What is the expected number of ball tosses before at least one ofthe bins contains two balls?Exercises 5.4-3:For the analysis of the birthday paradox, is it important that the birthdays be mutuallyindependent, or is pairwise independence sufficient? Justify your answer.Exercises 5.4-4:How many people should be invited to a party in order to make it likely that there are threepeople with the same birthday?Exercises 5.4-5:What is the probability that a k-string over a set of size n is actually a k-permutation? Howdoes this question relate to the birthday paradox?Exercises 5.4-6:Suppose that n balls are tossed into n bins, where each toss is independent and the ball isequally likely to end up in any bin.
What is the expected number of empty bins? What is theexpected number of bins with exactly one ball?Exercises 5.4-7:Sharpen the lower bound on streak length by showing that in n flips of a fair coin, theprobability is less than 1/n that no streak longer than lg n-2 lg lg n consecutive heads occurs.Problems 5-1: Probabilistic countingWith a b-bit counter, we can ordinarily only count up to 2b - 1. With R.
Morris's probabilisticcounting, we can count up to a much larger value at the expense of some loss of precision.We let a counter value of i represent a count of ni for i = 0, 1,..., 2b -1, where the ni form anincreasing sequence of nonnegative values. We assume that the initial value of the counter is0, representing a count of n0 = 0. The INCREMENT operation works on a counter containingthe value i in a probabilistic manner.
If i = 2b - 1, then an overflow error is reported.Otherwise, the counter is increased by 1 with probability 1/(ni+1 - ni), and it remainsunchanged with probability 1 - 1/(ni+1 - ni).If we select ni = i for all i ≥ 0, then the counter is an ordinary one. More interesting situationsarise if we select, say, ni = 2i-1 for i > 0 or ni = Fi (the ith Fibonacci number-see Section 3.2).For this problem, assume thatnegligible.is large enough that the probability of an overflow error isa. Show that the expected value represented by the counter after n INCREMENToperations have been performed is exactly n.b. The analysis of the variance of the count represented by the counter depends on thesequence of the ni.
Let us consider a simple case: ni = 100i for all i ≥ 0. Estimate thevariance in the value represented by the register after n INCREMENT operations havebeen performed.Problems 5-2: Searching an unsorted arrayThus problem examines three algorithms for searching for a value x in an unsorted array Aconsisting of n elements.Consider the following randomized strategy: pick a random index i into A. If A[i] = x, then weterminate; otherwise, we continue the search by picking a new random index into A. Wecontinue picking random indices into A until we find an index j such that A[j] = x or until wehave checked every element of A. Note that we pick from the whole set of indices each time,so that we may examine a given element more than once.a. Write pseudocode for a procedure RANDOM-SEARCH to implement the strategyabove.
Be sure that your algorithm terminates when all indices into A have beenpicked.b. Suppose that there is exactly one index i such that A[i] = x. What is the expectednumber of indices into A that must be picked before x is found and RANDOMSEARCH terminates?c. Generalizing your solution to part (b), suppose that there are k ≥ 1 indices i such thatA[i] = x. What is the expected number of indices into A that must be picked before x isfound and RANDOM-SEARCH terminates? Your answer should be a function of nand k.d. Suppose that there are no indices i such that A[i] = x. What is the expected number ofindices into A that must be picked before all elements of A have been checked andRANDOM-SEARCH terminates?Now consider a deterministic linear search algorithm, which we refer to asDETERMINISTIC-SEARCH.