Introduction to Algorithms (2nd edition), Thomas Cormen, Charles Leiserson, Ronald Rivest (page 22)
Then show that Professor Armstrong is mistaken by showing that the resulting permutation is not uniformly random.

Exercises 5.3-5

Prove that in the array P in procedure PERMUTE-BY-SORTING, the probability that all elements are unique is at least 1 - 1/n.

Exercises 5.3-6

Explain how to implement the algorithm PERMUTE-BY-SORTING to handle the case in which two or more priorities are identical. That is, your algorithm should produce a uniform random permutation, even if two or more priorities are identical.

5.4 Probabilistic analysis and further uses of indicator random variables

This advanced section further illustrates probabilistic analysis by way of four examples. The first determines the probability that in a room of k people, some pair shares the same birthday. The second example examines the random tossing of balls into bins.
The third investigates "streaks" of consecutive heads in coin flipping. The final example analyzes a variant of the hiring problem in which you have to make decisions without actually interviewing all the candidates.

5.4.1 The birthday paradox

Our first example is the birthday paradox. How many people must there be in a room before there is a 50% chance that two of them were born on the same day of the year? The answer is surprisingly few.
The paradox is that it is in fact far fewer than the number of days in a year, or even half the number of days in a year, as we shall see.

To answer this question, we index the people in the room with the integers 1, 2, ..., k, where k is the number of people in the room. We ignore the issue of leap years and assume that all years have n = 365 days. For i = 1, 2, ..., k, let b_i be the day of the year on which person i's birthday falls, where 1 ≤ b_i ≤ n. We also assume that birthdays are uniformly distributed across the n days of the year, so that Pr{b_i = r} = 1/n for i = 1, 2, ..., k and r = 1, 2, ..., n.

The probability that two given people, say i and j, have matching birthdays depends on whether the random selection of birthdays is independent. We assume from now on that birthdays are independent, so that the probability that i's birthday and j's birthday both fall on day r is

Pr{b_i = r and b_j = r} = Pr{b_i = r} Pr{b_j = r} = 1/n².

Thus, the probability that they both fall on the same day is

Pr{b_i = b_j} = Σ_{r=1}^{n} Pr{b_i = r and b_j = r} = Σ_{r=1}^{n} (1/n²) = 1/n.    (5.7)

More intuitively, once b_i is chosen, the probability that b_j is chosen to be the same day is 1/n. Thus, the probability that i and j have the same birthday is the same as the probability that the birthday of one of them falls on a given day.
Notice, however, that this coincidence depends on the assumption that the birthdays are independent.

We can analyze the probability of at least 2 out of k people having matching birthdays by looking at the complementary event. The probability that at least two of the birthdays match is 1 minus the probability that all the birthdays are different. The event that k people have distinct birthdays is

B_k = ∩_{i=1}^{k} A_i,

where A_i is the event that person i's birthday is different from person j's for all j < i. Since we can write B_k = A_k ∩ B_{k-1}, we obtain from equation (C.16) the recurrence

Pr{B_k} = Pr{B_{k-1}} Pr{A_k | B_{k-1}},    (5.8)

where we take Pr{B_1} = Pr{A_1} = 1 as an initial condition.
In other words, the probability that b_1, b_2, ..., b_k are distinct birthdays is the probability that b_1, b_2, ..., b_{k-1} are distinct birthdays times the probability that b_k ≠ b_i for i = 1, 2, ..., k - 1, given that b_1, b_2, ..., b_{k-1} are distinct.

If b_1, b_2, ..., b_{k-1} are distinct, the conditional probability that b_k ≠ b_i for i = 1, 2, ..., k - 1 is Pr{A_k | B_{k-1}} = (n - k + 1)/n, since out of the n days, there are n - (k - 1) that are not taken. We iteratively apply the recurrence (5.8) to obtain

Pr{B_k} = Pr{B_1} Pr{A_2 | B_1} Pr{A_3 | B_2} ⋯ Pr{A_k | B_{k-1}}
        = 1 · ((n - 1)/n) ((n - 2)/n) ⋯ ((n - k + 1)/n)
        = 1 · (1 - 1/n) (1 - 2/n) ⋯ (1 - (k - 1)/n).

Inequality (3.11), 1 + x ≤ e^x, gives us

Pr{B_k} ≤ e^{-1/n} e^{-2/n} ⋯ e^{-(k-1)/n} = e^{-Σ_{i=1}^{k-1} i/n} = e^{-k(k-1)/2n} ≤ 1/2

when -k(k - 1)/2n ≤ ln(1/2). The probability that all k birthdays are distinct is at most 1/2 when k(k - 1) ≥ 2n ln 2 or, solving the quadratic equation, when k ≥ (1 + √(1 + (8 ln 2)n))/2. For n = 365, we must have k ≥ 23. Thus, if at least 23 people are in a room, the probability is at least 1/2 that at least two people have the same birthday. On Mars, a year is 669 Martian days long; it therefore takes 31 Martians to get the same effect.

An analysis using indicator random variables

We can use indicator random variables to provide a simpler but approximate analysis of the birthday paradox.
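The iterated recurrence above is easy to evaluate numerically. The following sketch (Python; the function name is ours, and it assumes the uniform, independent birthday model of this section) finds the smallest k for which Pr{B_k} drops to 1/2 or below:

```python
def smallest_k(n):
    """Smallest k such that, with n equally likely birthdays, k people
    have a matching pair with probability at least 1/2 (Pr{B_k} <= 1/2)."""
    prob_distinct = 1.0  # Pr{B_1} = 1
    k = 1
    while prob_distinct > 0.5:
        k += 1
        # Recurrence (5.8): Pr{B_k} = Pr{B_{k-1}} * (n - k + 1)/n
        prob_distinct *= (n - k + 1) / n
    return k

print(smallest_k(365))  # Earth: 23 people suffice
print(smallest_k(669))  # Mars: 31 Martians
```

The exact computation confirms both figures quoted in the text without the e^x approximation.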
For each pair (i, j) of the k people in the room, we define the indicator random variable X_ij, for 1 ≤ i < j ≤ k, by

X_ij = I{person i and person j have the same birthday}
     = 1 if person i and person j have the same birthday,
       0 otherwise.

By equation (5.7), the probability that two people have matching birthdays is 1/n, and thus by Lemma 5.1, we have

E[X_ij] = Pr{person i and person j have the same birthday} = 1/n.

Letting X be the random variable that counts the number of pairs of individuals having the same birthday, we have

X = Σ_{i=1}^{k-1} Σ_{j=i+1}^{k} X_ij.

Taking expectations of both sides and applying linearity of expectation, we obtain

E[X] = E[Σ_{i=1}^{k-1} Σ_{j=i+1}^{k} X_ij]
     = Σ_{i=1}^{k-1} Σ_{j=i+1}^{k} E[X_ij]
     = (k choose 2) (1/n)
     = k(k - 1)/2n.

When k(k - 1) ≥ 2n, therefore, the expected number of pairs of people with the same birthday is at least 1. Thus, if we have at least √(2n) + 1 individuals in a room, we can expect at least two to have the same birthday. For n = 365, if k = 28, the expected number of pairs with the same birthday is (28 · 27)/(2 · 365) ≈ 1.0356. Thus, with at least 28 people, we expect to find at least one matching pair of birthdays.
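The expected-count threshold can be checked the same way. A small sketch (Python; the helper names are ours) evaluates E[X] = k(k - 1)/2n and finds the smallest k with E[X] ≥ 1:

```python
def expected_matching_pairs(k, n):
    """E[X] = (k choose 2) * 1/n = k(k-1)/(2n), by linearity of expectation."""
    return k * (k - 1) / (2 * n)

def smallest_k_expecting_a_pair(n):
    """Smallest k with E[X] >= 1, i.e. with k(k-1) >= 2n."""
    k = 1
    while k * (k - 1) < 2 * n:
        k += 1
    return k

print(round(expected_matching_pairs(28, 365), 4))  # 1.0356, as in the text
print(smallest_k_expecting_a_pair(365))            # 28 on Earth
print(smallest_k_expecting_a_pair(669))            # 38 on Mars
```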
On Mars, where a year is 669 Martian days long, we need at least 38 Martians.

The first analysis, which used only probabilities, determined the number of people required for the probability to exceed 1/2 that a matching pair of birthdays exists, and the second analysis, which used indicator random variables, determined the number such that the expected number of matching birthdays is 1. Although the exact numbers of people differ for the two situations, they are the same asymptotically: Θ(√n).

5.4.2 Balls and bins

Consider the process of randomly tossing identical balls into b bins, numbered 1, 2, ..., b.
The tosses are independent, and on each toss the ball is equally likely to end up in any bin. The probability that a tossed ball lands in any given bin is 1/b. Thus, the ball-tossing process is a sequence of Bernoulli trials (see Appendix C.4) with a probability 1/b of success, where success means that the ball falls in the given bin. This model is particularly useful for analyzing hashing (see Chapter 11), and we can answer a variety of interesting questions about the ball-tossing process.
(Problem C-1 asks additional questions about balls and bins.)

How many balls fall in a given bin? The number of balls that fall in a given bin follows the binomial distribution b(k; n, 1/b). If n balls are tossed, equation (C.36) tells us that the expected number of balls that fall in the given bin is n/b.

How many balls must one toss, on the average, until a given bin contains a ball? The number of tosses until the given bin receives a ball follows the geometric distribution with probability 1/b and, by equation (C.31), the expected number of tosses until success is 1/(1/b) = b.

How many balls must one toss until every bin contains at least one ball? Let us call a toss in which a ball falls into an empty bin a "hit." We want to know the expected number n of tosses required to get b hits.

The hits can be used to partition the n tosses into stages.
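As a sanity check on the first two answers, a quick seeded simulation (Python; the setup and function names are ours, not part of the text) tosses balls at b = 10 bins and measures both quantities empirically:

```python
import random

random.seed(42)  # fixed seed so the experiment is reproducible
b, n, trials = 10, 1000, 2000

# Balls landing in a given bin (bin 0) when n balls are tossed:
# binomially distributed with mean n/b.
counts = [sum(1 for _ in range(n) if random.randrange(b) == 0)
          for _ in range(trials)]
print(sum(counts) / trials)  # close to n/b = 100

# Tosses until bin 0 receives its first ball:
# geometrically distributed with mean b.
def tosses_until_hit(b):
    t = 1
    while random.randrange(b) != 0:
        t += 1
    return t

waits = [tosses_until_hit(b) for _ in range(trials)]
print(sum(waits) / trials)  # close to b = 10
```

Averaged over 2000 trials, both empirical means land within a small fraction of their theoretical values n/b and b.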
The ith stage consists of the tosses after the (i - 1)st hit until the ith hit. The first stage consists of the first toss, since we are guaranteed to have a hit when all bins are empty. For each toss during the ith stage, there are i - 1 bins that contain balls and b - i + 1 empty bins. Thus, for each toss in the ith stage, the probability of obtaining a hit is (b - i + 1)/b.

Let n_i denote the number of tosses in the ith stage.
Thus, the number of tosses required to get b hits is n = Σ_{i=1}^{b} n_i. Each random variable n_i has a geometric distribution with probability of success (b - i + 1)/b and, by equation (C.31),

E[n_i] = b/(b - i + 1).

By linearity of expectation,

E[n] = E[Σ_{i=1}^{b} n_i]
     = Σ_{i=1}^{b} E[n_i]
     = Σ_{i=1}^{b} b/(b - i + 1)
     = b Σ_{i=1}^{b} 1/i
     = b(ln b + O(1)).

The last line follows from the bound (A.7) on the harmonic series. It therefore takes approximately b ln b tosses before we can expect that every bin has a ball. This problem is also known as the coupon collector's problem, and says that a person trying to collect each of b different coupons must acquire approximately b ln b randomly obtained coupons in order to succeed.

5.4.3 Streaks

Suppose you flip a fair coin n times.
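The exact expectation b · H_b is simple to compute and compare against the b ln b approximation; a sketch (Python, function name ours):

```python
import math

def expected_tosses(b):
    """Coupon collector: E[n] = b * H_b = b * sum_{i=1}^{b} 1/i."""
    return b * sum(1 / i for i in range((1), b + 1))

for b in (10, 100, 1000):
    print(b, expected_tosses(b), b * math.log(b))
```

For b = 1000 the exact value is about 7485 tosses versus b ln b ≈ 6908; the gap is the lower-order γ·b term hidden in H_b ≈ ln b + γ, consistent with the O(1) in the derivation above.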
What is the longest streak of consecutive heads that you expect to see? The answer is Θ(lg n), as the following analysis shows.

We first prove that the expected length of the longest streak of heads is O(lg n). The probability that each coin flip is a head is 1/2. Let A_ik be the event that a streak of heads of length at least k begins with the ith coin flip or, more precisely, the event that the k consecutive coin flips i, i + 1, ..., i + k - 1 yield only heads, where 1 ≤ k ≤ n and 1 ≤ i ≤ n - k + 1. Since coin flips are mutually independent, for any given event A_ik, the probability that all k flips are heads is

Pr{A_ik} = 1/2^k,    (5.9)

and thus the probability that a streak of heads of length at least 2⌈lg n⌉ begins in position i is quite small, since Pr{A_{i,2⌈lg n⌉}} = 1/2^{2⌈lg n⌉} ≤ 1/n².
There are at most n - 2⌈lg n⌉ + 1 positions where such a streak can begin. The probability that a streak of heads of length at least 2⌈lg n⌉ begins anywhere is therefore

Pr{∪_{i=1}^{n-2⌈lg n⌉+1} A_{i,2⌈lg n⌉}} ≤ Σ_{i=1}^{n-2⌈lg n⌉+1} 1/n² < n (1/n²) = 1/n,    (5.10)

since by Boole's inequality (C.18), the probability of a union of events is at most the sum of the probabilities of the individual events. (Note that Boole's inequality holds even for events such as these that are not independent.)

We now use inequality (5.10) to bound the length of the longest streak. For j = 0, 1, 2, ..., n, let L_j be the event that the longest streak of heads has length exactly j, and let L be the length of the longest streak. By the definition of expected value,

E[L] = Σ_{j=0}^{n} j Pr{L_j}.    (5.11)

We could try to evaluate this sum using upper bounds on each Pr{L_j} similar to those computed in inequality (5.10).
Unfortunately, this method would yield weak bounds. We can use some intuition gained by the above analysis to obtain a good bound, however. Informally, we observe that for no individual term in the summation in equation (5.11) are both the factors j and Pr{L_j} large.
Why? When j ≥ 2⌈lg n⌉, then Pr{L_j} is very small, and when j < 2⌈lg n⌉, then j is fairly small. More formally, we note that the events L_j for j = 0, 1, ..., n are disjoint, and so the probability that a streak of heads of length at least 2⌈lg n⌉ begins anywhere is Σ_{j=2⌈lg n⌉}^{n} Pr{L_j}. By inequality (5.10), we have Σ_{j=2⌈lg n⌉}^{n} Pr{L_j} < 1/n. Also, noting that Σ_{j=0}^{n} Pr{L_j} = 1, we have that Σ_{j=0}^{2⌈lg n⌉-1} Pr{L_j} ≤ 1. Thus, we obtain

E[L] = Σ_{j=0}^{n} j Pr{L_j}
     = Σ_{j=0}^{2⌈lg n⌉-1} j Pr{L_j} + Σ_{j=2⌈lg n⌉}^{n} j Pr{L_j}
     < Σ_{j=0}^{2⌈lg n⌉-1} (2⌈lg n⌉) Pr{L_j} + Σ_{j=2⌈lg n⌉}^{n} n Pr{L_j}
     = 2⌈lg n⌉ Σ_{j=0}^{2⌈lg n⌉-1} Pr{L_j} + n Σ_{j=2⌈lg n⌉}^{n} Pr{L_j}
     < 2⌈lg n⌉ · 1 + n · (1/n)
     = O(lg n).

The chances that a streak of heads exceeds r⌈lg n⌉ flips diminish quickly with r.
For r ≥ 1, the probability that a streak of r⌈lg n⌉ heads starts in position i is

Pr{A_{i,r⌈lg n⌉}} = 1/2^{r⌈lg n⌉} ≤ 1/n^r.

Thus, the probability is at most n/n^r = 1/n^{r-1} that the longest streak is at least r⌈lg n⌉, or equivalently, the probability is at least 1 - 1/n^{r-1} that the longest streak has length less than r⌈lg n⌉.

As an example, for n = 1000 coin flips, the probability of having a streak of at least 2⌈lg n⌉ = 20 heads is at most 1/n = 1/1000. The chance of having a streak longer than 3⌈lg n⌉ = 30 heads is at most 1/n² = 1/1,000,000.

We now prove a complementary lower bound: the expected length of the longest streak of heads in n coin flips is Ω(lg n). To prove this bound, we look for streaks of length s by partitioning the n flips into approximately n/s groups of s flips each.
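The exact distribution of the longest streak can also be computed, which lets us verify the n = 1000 bound above. A sketch (Python; the counting recurrence is the standard one for runs, and the function name is ours): the number f(m) of length-m flip sequences containing no run of k heads satisfies f(m) = f(m-1) + ⋯ + f(m-k) for m ≥ k, with f(m) = 2^m for m < k (condition on how many heads precede the first tail), so Pr{L ≥ k} = 1 - f(n)/2^n.

```python
from fractions import Fraction

def prob_streak_at_least(n, k):
    """Exact Pr{longest run of heads in n fair flips >= k}."""
    # f[m] = number of length-m sequences with no run of k consecutive heads,
    # counted by the position of the first tail (after j < k heads).
    f = [0] * (n + 1)
    for m in range(n + 1):
        if m < k:
            f[m] = 2 ** m
        else:
            f[m] = sum(f[m - 1 - j] for j in range(k))
    return 1 - Fraction(f[n], 2 ** n)

# n = 1000, 2*ceil(lg n) = 20: inequality (5.10) promises Pr <= 1/1000.
p = prob_streak_at_least(1000, 20)
print(float(p))  # below 1/1000, as the union bound guarantees
```

The exact probability is necessarily smaller than the union-bound estimate, so the computed value confirms the 1/1000 claim with room to spare.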
If we choose s = ⌊(lg n)/2⌋, we can show that it is likely that at least one of these groups comes up all heads, and hence it is likely that the longest streak has length at least s = Ω(lg n). We will then show that the longest streak has expected length Ω(lg n).

We partition the n coin flips into at least ⌊n/⌊(lg n)/2⌋⌋ groups of ⌊(lg n)/2⌋ consecutive flips, and we bound the probability that no group comes up all heads. By equation (5.9), the probability that the group starting in position i comes up all heads is

Pr{A_{i,⌊(lg n)/2⌋}} = 1/2^{⌊(lg n)/2⌋} ≥ 1/√n.

The probability that a streak of heads of length at least ⌊(lg n)/2⌋ does not begin in position i is therefore at most 1 - 1/√n.