Bishop C.M., Pattern Recognition and Machine Learning (2006), Section 1.2: Probability Theory (excerpt)
Let us suppose that in so doing we pick the red box 40% of the time and we pick the blue box 60% of the time, and that when we remove an item of fruit from a box we are equally likely to select any of the pieces of fruit in the box.

In this example, the identity of the box that will be chosen is a random variable, which we shall denote by $B$. This random variable can take one of two possible values, namely $r$ (corresponding to the red box) or $b$ (corresponding to the blue box). Similarly, the identity of the fruit is also a random variable and will be denoted by $F$.
It can take either of the values $a$ (for apple) or $o$ (for orange).

To begin with, we shall define the probability of an event to be the fraction of times that event occurs out of the total number of trials, in the limit that the total number of trials goes to infinity. Thus the probability of selecting the red box is $4/10$ and the probability of selecting the blue box is $6/10$. We write these probabilities as $p(B = r) = 4/10$ and $p(B = b) = 6/10$.

[Figure 1.9: We use a simple example of two coloured boxes each containing fruit (apples shown in green and oranges shown in orange) to introduce the basic ideas of probability.]

[Figure 1.10: We can derive the sum and product rules of probability by considering two random variables, $X$, which takes the values $\{x_i\}$ where $i = 1, \ldots, M$, and $Y$, which takes the values $\{y_j\}$ where $j = 1, \ldots, L$. In this illustration we have $M = 5$ and $L = 3$. If we consider a total number $N$ of instances of these variables, then we denote the number of instances where $X = x_i$ and $Y = y_j$ by $n_{ij}$, which is the number of points in the corresponding cell of the array. The number of points in column $i$, corresponding to $X = x_i$, is denoted by $c_i$, and the number of points in row $j$, corresponding to $Y = y_j$, is denoted by $r_j$.]
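To make this limiting-frequency definition concrete, here is a minimal simulation sketch (our own illustration, not from the book, assuming Python with NumPy): the empirical fraction of red-box picks approaches $4/10$ as the number of trials grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pick a box on each trial: 'r' with probability 4/10, 'b' with probability 6/10.
for n_trials in (100, 10_000, 1_000_000):
    boxes = rng.choice(["r", "b"], size=n_trials, p=[0.4, 0.6])
    frac_red = np.mean(boxes == "r")
    print(f"N = {n_trials:>9}: fraction of red-box picks = {frac_red:.4f}")
```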
Note that, by definition, probabilities must lie in the interval $[0, 1]$. Also, if the events are mutually exclusive and if they include all possible outcomes (for instance, in this example the box must be either red or blue), then we see that the probabilities for those events must sum to one.

We can now ask questions such as: “what is the overall probability that the selection procedure will pick an apple?”, or “given that we have chosen an orange, what is the probability that the box we chose was the blue one?”. We can answer questions such as these, and indeed much more complex questions associated with problems in pattern recognition, once we have equipped ourselves with the two elementary rules of probability, known as the sum rule and the product rule.
Having obtained these rules, we shall then return to our boxes of fruit example.

In order to derive the rules of probability, consider the slightly more general example shown in Figure 1.10 involving two random variables $X$ and $Y$ (which could for instance be the Box and Fruit variables considered above).
We shall suppose that $X$ can take any of the values $x_i$ where $i = 1, \ldots, M$, and $Y$ can take the values $y_j$ where $j = 1, \ldots, L$. Consider a total of $N$ trials in which we sample both of the variables $X$ and $Y$, and let the number of such trials in which $X = x_i$ and $Y = y_j$ be $n_{ij}$. Also, let the number of trials in which $X$ takes the value $x_i$ (irrespective of the value that $Y$ takes) be denoted by $c_i$, and similarly let the number of trials in which $Y$ takes the value $y_j$ be denoted by $r_j$.

The probability that $X$ will take the value $x_i$ and $Y$ will take the value $y_j$ is written $p(X = x_i, Y = y_j)$ and is called the joint probability of $X = x_i$ and $Y = y_j$. It is given by the number of points falling in the cell $i,j$ as a fraction of the total number of points, and hence

$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}. \tag{1.5}$$

Here we are implicitly considering the limit $N \to \infty$. Similarly, the probability that $X$ takes the value $x_i$ irrespective of the value of $Y$ is written as $p(X = x_i)$ and is given by the fraction of the total number of points that fall in column $i$, so that

$$p(X = x_i) = \frac{c_i}{N}. \tag{1.6}$$

Because the number of instances in column $i$ in Figure 1.10 is just the sum of the number of instances in each cell of that column, we have $c_i = \sum_j n_{ij}$ and therefore, from (1.5) and (1.6), we have

$$p(X = x_i) = \sum_{j=1}^{L} p(X = x_i, Y = y_j) \tag{1.7}$$

which is the sum rule of probability. Note that $p(X = x_i)$ is sometimes called the marginal probability, because it is obtained by marginalizing, or summing out, the other variables (in this case $Y$).

If we consider only those instances for which $X = x_i$, then the fraction of such instances for which $Y = y_j$ is written $p(Y = y_j | X = x_i)$ and is called the conditional probability of $Y = y_j$ given $X = x_i$. It is obtained by finding the fraction of those points in column $i$ that fall in cell $i,j$ and hence is given by

$$p(Y = y_j | X = x_i) = \frac{n_{ij}}{c_i}. \tag{1.8}$$

From (1.5), (1.6), and (1.8), we can then derive the following relationship

$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_i} \cdot \frac{c_i}{N} = p(Y = y_j | X = x_i)\, p(X = x_i) \tag{1.9}$$

which is the product rule of probability.
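These count-based definitions translate directly into array operations. The following minimal sketch (our own illustration with a made-up 3-by-5 table of counts $n_{ij}$, assuming NumPy is available) checks (1.5) through (1.9) numerically:

```python
import numpy as np

# n[j, i] = number of trials with Y = y_j (rows) and X = x_i (columns),
# mirroring the layout of Figure 1.10 (L = 3 rows, M = 5 columns).
n = np.array([[3, 1, 4, 2, 0],
              [2, 5, 1, 3, 4],
              [1, 2, 2, 0, 3]])
N = n.sum()
c = n.sum(axis=0)                # column totals c_i

joint = n / N                    # p(X = x_i, Y = y_j), equation (1.5)
p_x = c / N                      # p(X = x_i) = c_i / N, equation (1.6)

# Sum rule (1.7): the marginal is the joint summed over Y.
assert np.allclose(joint.sum(axis=0), p_x)

# Conditional probability (1.8): p(Y = y_j | X = x_i) = n_ij / c_i.
cond_y_given_x = n / c

# Product rule (1.9): joint = conditional times marginal.
assert np.allclose(joint, cond_y_given_x * p_x)
```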
So far we have been quite careful to make a distinction between a random variable, such as the box $B$ in the fruit example, and the values that the random variable can take, for example $r$ if the box were the red one. Thus the probability that $B$ takes the value $r$ is denoted $p(B = r)$. Although this helps to avoid ambiguity, it leads to a rather cumbersome notation, and in many cases there will be no need for such pedantry. Instead, we may simply write $p(B)$ to denote a distribution over the random variable $B$, or $p(r)$ to denote the distribution evaluated for the particular value $r$, provided that the interpretation is clear from the context.

With this more compact notation, we can write the two fundamental rules of probability theory in the following form.

The Rules of Probability

$$\text{sum rule} \qquad p(X) = \sum_Y p(X, Y) \tag{1.10}$$

$$\text{product rule} \qquad p(X, Y) = p(Y|X)\, p(X). \tag{1.11}$$

Here $p(X, Y)$ is a joint probability and is verbalized as “the probability of $X$ and $Y$”. Similarly, the quantity $p(Y|X)$ is a conditional probability and is verbalized as “the probability of $Y$ given $X$”, whereas the quantity $p(X)$ is a marginal probability and is simply “the probability of $X$”. These two simple rules form the basis for all of the probabilistic machinery that we use throughout this book.

From the product rule, together with the symmetry property $p(X, Y) = p(Y, X)$, we immediately obtain the following relationship between conditional probabilities

$$p(Y|X) = \frac{p(X|Y)\, p(Y)}{p(X)} \tag{1.12}$$

which is called Bayes’ theorem and which plays a central role in pattern recognition and machine learning. Using the sum rule, the denominator in Bayes’ theorem can be expressed in terms of the quantities appearing in the numerator

$$p(X) = \sum_Y p(X|Y)\, p(Y). \tag{1.13}$$

We can view the denominator in Bayes’ theorem as being the normalization constant required to ensure that the sum of the conditional probability on the left-hand side of (1.12) over all values of $Y$ equals one.
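Bayes’ theorem and the normalization (1.13) can be checked the same way. In this sketch (again with made-up numbers of our own), we start from $p(X|Y)$ and the prior $p(Y)$, form the denominator by the sum rule, and confirm that the resulting posterior sums to one over $Y$:

```python
import numpy as np

p_y = np.array([0.3, 0.7])                 # prior p(Y), made-up numbers
p_x_given_y = np.array([[0.2, 0.5, 0.3],   # p(X | Y = y_1)
                        [0.6, 0.1, 0.3]])  # p(X | Y = y_2)

# Denominator of Bayes' theorem via the sum rule (1.13):
# p(X) = sum over Y of p(X|Y) p(Y).
p_x = p_x_given_y.T @ p_y

# Bayes' theorem (1.12): p(Y|X) = p(X|Y) p(Y) / p(X).
p_y_given_x = p_x_given_y * p_y[:, None] / p_x

# The posterior is normalized: summing over Y gives one for every value of X.
assert np.allclose(p_y_given_x.sum(axis=0), 1.0)
```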
In Figure 1.11, we show a simple example involving a joint distribution over two variables to illustrate the concept of marginal and conditional distributions. Here a finite sample of $N = 60$ data points has been drawn from the joint distribution and is shown in the top left. In the top right is a histogram of the fractions of data points having each of the two values of $Y$. From the definition of probability, these fractions would equal the corresponding probabilities $p(Y)$ in the limit $N \to \infty$. We can view the histogram as a simple way to model a probability distribution given only a finite number of points drawn from that distribution. Modelling distributions from data lies at the heart of statistical pattern recognition and will be explored in great detail in this book. The remaining two plots in Figure 1.11 show the corresponding histogram estimates of $p(X)$ and $p(X|Y = 1)$.

[Figure 1.11: An illustration of a distribution over two variables, $X$, which takes 9 possible values, and $Y$, which takes two possible values. The top left figure shows a sample of 60 points drawn from a joint probability distribution over these variables. The remaining figures show histogram estimates of the marginal distributions $p(X)$ and $p(Y)$, as well as the conditional distribution $p(X|Y = 1)$ corresponding to the bottom row in the top left figure.]
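The histogram construction of Figure 1.11 is easy to reproduce in outline. The sketch below (our own; the actual joint distribution behind Figure 1.11 is not specified in the text, so an arbitrary one is used) draws $N = 60$ points over 9 values of $X$ and 2 values of $Y$ and forms the fractions that estimate $p(Y)$, $p(X)$, and $p(X|Y = 1)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary joint distribution over Y (2 values, rows) and X (9 values, columns).
joint = rng.random((2, 9))
joint /= joint.sum()

# Draw N = 60 (y, x) pairs from the joint distribution.
N = 60
flat = rng.choice(joint.size, size=N, p=joint.ravel())
y, x = np.unravel_index(flat, joint.shape)

# Histogram estimates: fractions of points, as in Figure 1.11.
p_y_est = np.bincount(y, minlength=2) / N        # estimates p(Y)
p_x_est = np.bincount(x, minlength=9) / N        # estimates p(X)
in_y1 = (y == 0)                                 # row 0 taken to represent Y = 1
p_x_given_y1 = np.bincount(x[in_y1], minlength=9) / in_y1.sum()

print(p_y_est, p_x_est, p_x_given_y1, sep="\n")
```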
Let us now return to our example involving boxes of fruit. For the moment, we shall once again be explicit about distinguishing between the random variables and their instantiations. We have seen that the probabilities of selecting either the red or the blue boxes are given by

$$p(B = r) = 4/10 \tag{1.14}$$

$$p(B = b) = 6/10 \tag{1.15}$$

respectively. Note that these satisfy $p(B = r) + p(B = b) = 1$.

Now suppose that we pick a box at random, and it turns out to be the blue box. Then the probability of selecting an apple is just the fraction of apples in the blue box, which is $3/4$, and so $p(F = a|B = b) = 3/4$. In fact, we can write out all four conditional probabilities for the type of fruit, given the selected box

$$p(F = a|B = r) = 1/4 \tag{1.16}$$
$$p(F = o|B = r) = 3/4 \tag{1.17}$$
$$p(F = a|B = b) = 3/4 \tag{1.18}$$
$$p(F = o|B = b) = 1/4. \tag{1.19}$$

Again, note that these probabilities are normalized so that

$$p(F = a|B = r) + p(F = o|B = r) = 1 \tag{1.20}$$

and similarly

$$p(F = a|B = b) + p(F = o|B = b) = 1. \tag{1.21}$$

We can now use the sum and product rules of probability to evaluate the overall probability of choosing an apple

$$p(F = a) = p(F = a|B = r)\, p(B = r) + p(F = a|B = b)\, p(B = b) = \frac{1}{4} \times \frac{4}{10} + \frac{3}{4} \times \frac{6}{10} = \frac{11}{20} \tag{1.22}$$

from which it follows, using the sum rule, that $p(F = o) = 1 - 11/20 = 9/20$.
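The same calculation can be written out in code. This small sketch (our own, using Python's standard fractions module for exact arithmetic) encodes (1.14) through (1.19) and reproduces $p(F = a) = 11/20$ by the sum and product rules:

```python
from fractions import Fraction

p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}      # (1.14)-(1.15)
p_fruit_given_box = {                                     # (1.16)-(1.19)
    "r": {"a": Fraction(1, 4), "o": Fraction(3, 4)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

# Sum and product rules: p(F = a) = sum over B of p(F = a | B) p(B).
p_apple = sum(p_fruit_given_box[box]["a"] * p_box[box] for box in p_box)
print(p_apple)       # 11/20
print(1 - p_apple)   # p(F = o) = 9/20
```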
Suppose instead we are told that a piece of fruit has been selected and it is an orange, and we would like to know which box it came from. This requires that we evaluate the probability distribution over boxes conditioned on the identity of the fruit, whereas the probabilities in (1.16)–(1.19) give the probability distribution over the fruit conditioned on the identity of the box. We can solve the problem of reversing the conditional probability by using Bayes’ theorem to give

$$p(B = r|F = o) = \frac{p(F = o|B = r)\, p(B = r)}{p(F = o)} = \frac{3}{4} \times \frac{4}{10} \times \frac{20}{9} = \frac{2}{3}. \tag{1.23}$$

From the sum rule, it then follows that $p(B = b|F = o) = 1 - 2/3 = 1/3$.

We can provide an important interpretation of Bayes’ theorem as follows.
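As a self-contained numerical check of (1.23), the following short sketch (our own, again using exact fractions) reverses the conditioning and reproduces the posterior values $2/3$ and $1/3$:

```python
from fractions import Fraction

p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}
p_orange_given_box = {"r": Fraction(3, 4), "b": Fraction(1, 4)}  # from (1.17) and (1.19)

# Denominator via the sum rule: p(F = o) = 9/20.
p_orange = sum(p_orange_given_box[box] * p_box[box] for box in p_box)

# Bayes' theorem (1.23): p(B = r | F = o) = p(F = o | B = r) p(B = r) / p(F = o).
p_red_given_orange = p_orange_given_box["r"] * p_box["r"] / p_orange
print(p_red_given_orange)        # 2/3
print(1 - p_red_given_orange)    # p(B = b | F = o) = 1/3
```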