We shall see the motivation for these terms shortly. Figure 1.13 shows a plot of the Gaussian distribution.

[Figure 1.13: Plot of the univariate Gaussian N(x | µ, σ²), showing the mean µ and the standard deviation σ (an interval of width 2σ is marked).]

From the form of (1.46) we see that the Gaussian distribution satisfies

$$
\mathcal{N}(x \mid \mu, \sigma^2) > 0. \tag{1.47}
$$

Also it is straightforward to show (Exercise 1.7) that the Gaussian is normalized, so that

$$
\int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, \mathrm{d}x = 1. \tag{1.48}
$$

Thus (1.46) satisfies the two requirements for a valid probability density.

[Sidebar: Pierre-Simon Laplace, 1749–1827. It is said that Laplace was seriously lacking in modesty and at one point declared himself to be the best mathematician in France at the time, a claim that was arguably true. As well as being prolific in mathematics, he also made numerous contributions to astronomy, including the nebular hypothesis, by which the earth is thought to have formed from the condensation and cooling of a large rotating disk of gas and dust. In 1812 he published the first edition of Théorie Analytique des Probabilités, in which Laplace states that "probability theory is nothing but common sense reduced to calculation". This work included a discussion of the inverse probability calculation (later termed Bayes' theorem by Poincaré), which he used to solve problems in life expectancy, jurisprudence, planetary masses, triangulation, and error estimation.]

We can readily find expectations of functions of x under the Gaussian distribution. In particular, the average value of x is given by (Exercise 1.8)

$$
\mathbb{E}[x] = \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, x \, \mathrm{d}x = \mu. \tag{1.49}
$$

Because the parameter µ represents the average value of x under the distribution, it is referred to as the mean.
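Both results are easy to confirm numerically. The following is a minimal sketch, not part of the text, assuming NumPy and SciPy are available (the name gauss_pdf is ours); it evaluates the integrals (1.48) and (1.49) by quadrature:

```python
import numpy as np
from scipy.integrate import quad

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density of equation (1.46)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

mu, sigma2 = 1.5, 0.7

# (1.48): the density integrates to one over the real line.
norm, _ = quad(gauss_pdf, -np.inf, np.inf, args=(mu, sigma2))

# (1.49): the first moment recovers the mean parameter.
mean, _ = quad(lambda t: t * gauss_pdf(t, mu, sigma2), -np.inf, np.inf)

print(norm)  # ~1.0
print(mean)  # ~1.5, i.e. mu
```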
Similarly, for the second-order moment

$$
\mathbb{E}[x^2] = \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, x^2 \, \mathrm{d}x = \mu^2 + \sigma^2. \tag{1.50}
$$

From (1.49) and (1.50), it follows that the variance of x is given by

$$
\operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2 \tag{1.51}
$$

and hence σ² is referred to as the variance parameter. The maximum of a distribution is known as its mode. For a Gaussian, the mode coincides with the mean (Exercise 1.9).

We are also interested in the Gaussian distribution defined over a D-dimensional vector x of continuous variables, which is given by

$$
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\} \tag{1.52}
$$

where the D-dimensional vector µ is called the mean, the D × D matrix Σ is called the covariance, and |Σ| denotes the determinant of Σ.
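Equation (1.52) maps almost line for line onto code. The sketch below (assuming NumPy; gauss_pdf_mv is an illustrative name, not an established API) uses a linear solve rather than an explicit matrix inverse for the quadratic form, which is the standard numerically stable choice:

```python
import numpy as np

def gauss_pdf_mv(x, mu, Sigma):
    """Multivariate Gaussian density, equation (1.52)."""
    D = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    quad_form = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2.0 * np.pi) ** (D / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad_form) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(gauss_pdf_mv(x, mu, Sigma))
# Cross-check against SciPy, if available:
# from scipy.stats import multivariate_normal
# print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```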
We shall make use of the multivariate Gaussian distribution briefly in this chapter, although its properties will be studied in detail in Section 2.3.

[Figure 1.14: Illustration of the likelihood function for a Gaussian distribution p(x), shown by the red curve. The black points denote a data set of values {xn}, and the likelihood function given by (1.53) corresponds to the product of the blue values N(xn | µ, σ²). Maximizing the likelihood involves adjusting the mean and variance of the Gaussian so as to maximize this product.]

Now suppose that we have a data set of observations x = (x1, . . . , xN)^T, representing N observations of the scalar variable x. Note that we are using the typeface x to distinguish this from a single observation of the vector-valued variable (x1, . . . , xD)^T, which we denote by x.
We shall suppose that the observations are drawn independently from a Gaussian distribution whose mean µ and variance σ² are unknown, and we would like to determine these parameters from the data set. Data points that are drawn independently from the same distribution are said to be independent and identically distributed, which is often abbreviated to i.i.d. We have seen that the joint probability of two independent events is given by the product of the marginal probabilities for each event separately. Because our data set x is i.i.d., we can therefore write the probability of the data set, given µ and σ², in the form

$$
p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2). \tag{1.53}
$$

When viewed as a function of µ and σ², this is the likelihood function for the Gaussian (Section 1.2.5), and it is interpreted diagrammatically in Figure 1.14. One common criterion for determining the parameters in a probability distribution using an observed data set is to find the parameter values that maximize the likelihood function.
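To make (1.53) concrete, the following sketch (assuming NumPy; the helper name likelihood is ours) evaluates the likelihood of a small synthetic data set across a grid of candidate means, with the variance held fixed; the grid maximizer lands on the sample mean, anticipating the result (1.55) derived below:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10)  # i.i.d. draws, true mu = 2

def likelihood(x, mu, sigma2):
    """Likelihood function (1.53): product of univariate Gaussian densities."""
    dens = np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.prod(dens)

# Scan candidate means, holding sigma^2 fixed at its true value of 1.
grid = np.linspace(0.0, 4.0, 401)
best_mu = grid[np.argmax([likelihood(x, m, 1.0) for m in grid])]

print(best_mu, x.mean())  # agree to within the grid spacing
```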
This might seem like a strange criterion because, from our foregoing discussion of probability theory, it would seem more natural to maximize the probability of the parameters given the data, not the probability of the data given the parameters. In fact, these two criteria are related, as we shall discuss in the context of curve fitting.

For the moment, however, we shall determine values for the unknown parameters µ and σ² in the Gaussian by maximizing the likelihood function (1.53). In practice, it is more convenient to maximize the log of the likelihood function. Because the logarithm is a monotonically increasing function of its argument, maximization of the log of a function is equivalent to maximization of the function itself. Taking the log not only simplifies the subsequent mathematical analysis, but it also helps numerically, because the product of a large number of small probabilities can easily underflow the numerical precision of the computer; this is resolved by computing instead the sum of the log probabilities.
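This failure mode is easy to reproduce. In the sketch below (assuming NumPy), the product of two thousand Gaussian density values underflows to exactly zero in double precision, while the sum of their logs is unproblematic:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2000)  # data from a standard Gaussian

dens = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

print(np.prod(dens))         # 0.0: the product of 2000 small values underflows
print(np.sum(np.log(dens)))  # a finite log likelihood (on the order of -2800)
```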
From (1.46) and (1.53), the log likelihood function can be written in the form

$$
\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi). \tag{1.54}
$$

Maximizing (1.54) with respect to µ, we obtain the maximum likelihood solution given by (Exercise 1.11)

$$
\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n \tag{1.55}
$$

which is the sample mean, i.e., the mean of the observed values {xn}. Similarly, maximizing (1.54) with respect to σ², we obtain the maximum likelihood solution for the variance in the form

$$
\sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2 \tag{1.56}
$$

which is the sample variance measured with respect to the sample mean µ_ML. Note that we are performing a joint maximization of (1.54) with respect to µ and σ², but in the case of the Gaussian distribution the solution for µ decouples from that for σ², so that we can first evaluate (1.55) and then subsequently use this result to evaluate (1.56).
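In code, this decoupling is simply the familiar two-step recipe: compute the sample mean first, then the average squared deviation about it. A minimal sketch (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # true mu = 2, sigma^2 = 2.25

mu_ml = x.mean()                       # equation (1.55)
sigma2_ml = np.mean((x - mu_ml) ** 2)  # equation (1.56), using mu_ml from above

print(mu_ml, sigma2_ml)  # close to 2.0 and 2.25
# Note: np.var(x) computes the same quantity, since its default is ddof=0.
```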
Later in this chapter, and also in subsequent chapters, we shall highlight the significant limitations of the maximum likelihood approach. Here we give an indication of the problem in the context of our solutions for the maximum likelihood parameter settings for the univariate Gaussian distribution. In particular, we shall show that the maximum likelihood approach systematically underestimates the variance of the distribution. This is an example of a phenomenon called bias and is related to the problem of over-fitting encountered in the context of polynomial curve fitting (Section 1.1). We first note that the maximum likelihood solutions µ_ML and σ²_ML are functions of the data set values x1, . . . , xN.
Consider the expectations of these quantities with respect to the data set values, which themselves come from a Gaussian distribution with parameters µ and σ². It is straightforward to show (Exercise 1.12) that

$$
\mathbb{E}[\mu_{\mathrm{ML}}] = \mu \tag{1.57}
$$

$$
\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \left( \frac{N-1}{N} \right) \sigma^2 \tag{1.58}
$$

so that on average the maximum likelihood estimate will obtain the correct mean but will underestimate the true variance by a factor (N − 1)/N. The intuition behind this result is given by Figure 1.15.

From (1.58) it follows that the following estimate for the variance parameter is unbiased:

$$
\widetilde{\sigma}^2 = \frac{N}{N-1} \, \sigma^2_{\mathrm{ML}} = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2. \tag{1.59}
$$
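Both expectations, and the effect of the correction factor in (1.59), can be verified by simulation over many data sets. A sketch (assuming NumPy), using data sets of size N = 2 as in Figure 1.15:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_true, N = 1.0, 2

# 100000 data sets, each of N = 2 points from a zero-mean unit-variance Gaussian.
datasets = rng.normal(0.0, np.sqrt(sigma2_true), size=(100_000, N))

mu_ml = datasets.mean(axis=1, keepdims=True)          # (1.55) per data set
sigma2_ml = np.mean((datasets - mu_ml) ** 2, axis=1)  # (1.56) per data set

print(mu_ml.mean())                    # ~0.0, consistent with (1.57)
print(sigma2_ml.mean())                # ~0.5 = (N-1)/N * sigma^2, as in (1.58)
print(N / (N - 1) * sigma2_ml.mean())  # ~1.0, the unbiased estimate (1.59)
```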
[Figure 1.15: Illustration of how bias arises in using maximum likelihood to determine the variance of a Gaussian. The green curve shows the true Gaussian distribution from which data is generated, and the three red curves, in panels (a), (b), and (c), show the Gaussian distributions obtained by fitting to three data sets, each consisting of two data points shown in blue, using the maximum likelihood results (1.55) and (1.56). Averaged across the three data sets, the mean is correct, but the variance is systematically under-estimated because it is measured relative to the sample mean and not relative to the true mean.]

In Section 10.1.3, we shall see how this result arises automatically when we adopt a Bayesian approach.

Note that the bias of the maximum likelihood solution becomes less significant as the number N of data points increases, and in the limit N → ∞ the maximum likelihood solution for the variance equals the true variance of the distribution that generated the data.