Bishop C.M., Pattern Recognition and Machine Learning (2006)
[Continuation of a figure caption from the preceding page: {α_k} = 0.1 in the left plot, {α_k} = 1 in the centre plot, and {α_k} = 10 in the right plot.]

Two-state quantities can either be represented as binary variables and modelled using the binomial distribution (2.9) or as 1-of-2 variables and modelled using the multinomial distribution (2.34) with K = 2.

2.3. The Gaussian Distribution

The Gaussian, also known as the normal distribution, is a widely used model for the distribution of continuous variables. In the case of a single variable x, the Gaussian distribution can be written in the form

N(x | \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\}    (2.42)

where µ is the mean and σ² is the variance. For a D-dimensional vector x, the multivariate Gaussian distribution takes the form

N(x | \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\}    (2.43)

where µ is a D-dimensional mean vector, Σ is a D × D covariance matrix, and |Σ| denotes the determinant of Σ.
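As a brief numerical aside (an addition to the text, not part of the book), the following Python sketch evaluates (2.42) and (2.43) directly and checks them against scipy.stats. It assumes NumPy and SciPy are available, and the particular values of µ, σ² and Σ are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Univariate Gaussian density, equation (2.42); sigma2 is the variance
def gauss_1d(x, mu, sigma2):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Multivariate Gaussian density, equation (2.43)
def gauss_nd(x, mu, Sigma):
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff          # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

# Each pair of printed values should agree
print(gauss_1d(0.3, 0.0, 2.0), norm.pdf(0.3, loc=0.0, scale=np.sqrt(2.0)))
print(gauss_nd(x, mu, Sigma), multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```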
The Gaussian distribution arises in many different contexts and can be motivated from a variety of different perspectives. For example, we have already seen that for a single real variable, the distribution that maximizes the entropy is the Gaussian (Section 1.6). This property applies also to the multivariate Gaussian (Exercise 2.14).

Another situation in which the Gaussian distribution arises is when we consider the sum of multiple random variables. The central limit theorem (due to Laplace) tells us that, subject to certain mild conditions, the sum of a set of random variables, which is of course itself a random variable, has a distribution that becomes increasingly Gaussian as the number of terms in the sum increases (Walker, 1969).
[Figure 2.6: Histogram plots of the mean of N uniformly distributed numbers, shown for N = 1, N = 2, and N = 10. We observe that as N increases, the distribution tends towards a Gaussian.]

We can illustrate this by considering N variables x_1, . . . , x_N, each of which has a uniform distribution over the interval [0, 1], and then considering the distribution of the mean (x_1 + · · · + x_N)/N. For large N, this distribution tends to a Gaussian, as illustrated in Figure 2.6.
In practice, the convergence to a Gaussian as N increases can be very rapid. One consequence of this result is that the binomial distribution (2.9), which is a distribution over m defined by the sum of N observations of the random binary variable x, will tend to a Gaussian as N → ∞ (see Figure 2.1 for the case of N = 10).
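A small simulation in the spirit of Figure 2.6 (again an addition, assuming NumPy; the sample size is arbitrary) draws the mean of N uniform variables on [0, 1] and compares its sample moments with the exact mean 1/2 and variance 1/(12N) of the averaged variable.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 100_000

for N in (1, 2, 10):
    # Mean of N uniform [0, 1] variables, repeated num_samples times
    means = rng.uniform(0.0, 1.0, size=(num_samples, N)).mean(axis=1)
    print(f"N={N:2d}  sample mean={means.mean():.4f}  "
          f"sample var={means.var():.5f}  exact var={1/(12*N):.5f}")
```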
The Gaussian distribution has many important analytical properties, and we shall consider several of these in detail. As a result, this section will be rather more technically involved than some of the earlier sections, and will require familiarity with various matrix identities (see Appendix C). However, we strongly encourage the reader to become proficient in manipulating Gaussian distributions using the techniques presented here, as this will prove invaluable in understanding the more complex models presented in later chapters.

We begin by considering the geometrical form of the Gaussian distribution.

[Biographical note: Carl Friedrich Gauss, 1777–1855] It is said that when Gauss went to elementary school at age 7, his teacher Büttner, trying to keep the class occupied, asked the pupils to sum the integers from 1 to 100.
To the teacher's amazement, Gauss arrived at the answer in a matter of moments by noting that the sum can be represented as 50 pairs (1 + 100, 2 + 99, etc.), each of which added to 101, giving the answer 5,050. It is now believed that the problem which was actually set was of the same form but somewhat harder, in that the sequence had a larger starting value and a larger increment. Gauss was a German mathematician and scientist with a reputation for being a hard-working perfectionist. One of his many contributions was to show that least squares can be derived under the assumption of normally distributed errors. He also created an early formulation of non-Euclidean geometry (a self-consistent geometrical theory that violates the axioms of Euclid) but was reluctant to discuss it openly for fear that his reputation might suffer if it were seen that he believed in such a geometry. At one point, Gauss was asked to conduct a geodetic survey of the state of Hanover, which led to his formulation of the normal distribution, now also known as the Gaussian.
After his death, a study of his diaries revealed that he had discovered several important mathematical results years or even decades before they were published by others.

The functional dependence of the Gaussian on x is through the quadratic form

\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)    (2.44)

which appears in the exponent.
The quantity ∆ is called the Mahalanobis distance from µ to x, and it reduces to the Euclidean distance when Σ is the identity matrix. The Gaussian distribution will be constant on surfaces in x-space for which this quadratic form is constant.
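As a short illustration (an addition, assuming NumPy; the numbers are arbitrary), the Mahalanobis distance defined by (2.44) can be computed directly, and setting Σ = I recovers the Euclidean distance.

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    diff = x - mu
    # Delta^2 = (x - mu)^T Sigma^{-1} (x - mu), equation (2.44)
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

x = np.array([2.0, 1.0])
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

print(mahalanobis(x, mu, Sigma))        # distance under Sigma
print(mahalanobis(x, mu, np.eye(2)))    # Sigma = I: reduces to Euclidean distance
print(np.linalg.norm(x - mu))           # matches the line above
```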
First of all, we note that the matrix Σ can be taken to be symmetric, without loss of generality, because any antisymmetric component would disappear from the exponent (Exercise 2.17). Now consider the eigenvector equation for the covariance matrix

\Sigma u_i = \lambda_i u_i    (2.45)

where i = 1, . . . , D. Because Σ is a real, symmetric matrix, its eigenvalues will be real, and its eigenvectors can be chosen to form an orthonormal set (Exercise 2.18), so that

u_i^T u_j = I_{ij}    (2.46)

where I_{ij} is the i, j element of the identity matrix and satisfies

I_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{otherwise.} \end{cases}    (2.47)

The covariance matrix Σ can be expressed as an expansion in terms of its eigenvectors in the form (Exercise 2.19)

\Sigma = \sum_{i=1}^{D} \lambda_i u_i u_i^T    (2.48)

and similarly the inverse covariance matrix Σ^{-1} can be expressed as

\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T.    (2.49)

Substituting (2.49) into (2.44), the quadratic form becomes

\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}    (2.50)

where we have defined

y_i = u_i^T (x - \mu).    (2.51)

We can interpret {y_i} as a new coordinate system defined by the orthonormal vectors u_i that are shifted and rotated with respect to the original x_i coordinates.
Forming the vector y = (y_1, . . . , y_D)^T, we have

y = U(x - \mu)    (2.52)

where U is a matrix whose rows are given by u_i^T.

[Figure 2.7: The red curve shows the elliptical surface of constant probability density for a Gaussian in a two-dimensional space x = (x_1, x_2), on which the density is exp(−1/2) of its value at x = µ. The major axes of the ellipse are defined by the eigenvectors u_i of the covariance matrix, with corresponding eigenvalues λ_i; the axis half-lengths are λ_1^{1/2} and λ_2^{1/2}.]
From (2.46) it follows that U is an orthogonal matrix (see Appendix C), i.e., it satisfies U U^T = I, and hence also U^T U = I, where I is the identity matrix.

The quadratic form, and hence the Gaussian density, will be constant on surfaces for which (2.51) is constant. If all of the eigenvalues λ_i are positive, then these surfaces represent ellipsoids, with their centres at µ and their axes oriented along u_i, and with scaling factors in the directions of the axes given by λ_i^{1/2}, as illustrated in Figure 2.7.
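The change of coordinates in (2.51) and (2.52) is easy to check numerically. The sketch below (an addition, assuming NumPy; the 2 × 2 covariance is an arbitrary example) builds U from the eigenvectors of Σ and confirms that the quadratic form (2.44) equals the rotated form (2.50), with the axis scalings λ_i^{1/2} of Figure 2.7.

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.5])

# Eigendecomposition of the symmetric covariance: Sigma u_i = lambda_i u_i, (2.45)
lam, eigvecs = np.linalg.eigh(Sigma)   # columns of eigvecs are the u_i
U = eigvecs.T                          # rows of U are u_i^T, as in (2.52)

y = U @ (x - mu)                       # y = U (x - mu), equation (2.52)

# Quadratic form computed two ways: directly via (2.44) and in rotated coordinates via (2.50)
delta2_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
delta2_rotated = np.sum(y ** 2 / lam)
print(delta2_direct, delta2_rotated)   # the two values agree

# Axis half-lengths of the constant-density ellipse scale as lambda_i^{1/2}
print(np.sqrt(lam))
```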
For the Gaussian distribution to be well defined, it is necessary for all of the eigenvalues λ_i of the covariance matrix to be strictly positive, otherwise the distribution cannot be properly normalized. A matrix whose eigenvalues are strictly positive is said to be positive definite. In Chapter 12, we will encounter Gaussian distributions for which one or more of the eigenvalues are zero, in which case the distribution is singular and is confined to a subspace of lower dimensionality. If all of the eigenvalues are nonnegative, then the covariance matrix is said to be positive semidefinite.

Now consider the form of the Gaussian distribution in the new coordinate system defined by the y_i.
In going from the x to the y coordinate system, we have a Jacobian matrix J with elements given by

J_{ij} = \frac{\partial x_i}{\partial y_j} = U_{ji}    (2.53)

where U_{ji} are the elements of the matrix U^T. Using the orthonormality property of the matrix U, we see that the square of the determinant of the Jacobian matrix is

|J|^2 = |U^T|^2 = |U^T| |U| = |U^T U| = |I| = 1    (2.54)

and hence |J| = 1. Also, the determinant |Σ| of the covariance matrix can be written as the product of its eigenvalues, and hence

|\Sigma|^{1/2} = \prod_{j=1}^{D} \lambda_j^{1/2}.    (2.55)

Thus in the y_j coordinate system, the Gaussian distribution takes the form

p(y) = p(x) |J| = \prod_{j=1}^{D} \frac{1}{(2\pi\lambda_j)^{1/2}} \exp\left\{ -\frac{y_j^2}{2\lambda_j} \right\}    (2.56)

which is the product of D independent univariate Gaussian distributions. The eigenvectors therefore define a new set of shifted and rotated coordinates with respect to which the joint probability distribution factorizes into a product of independent distributions.
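The determinant identity (2.55) and the factorized density (2.56) can likewise be verified numerically. The following sketch (an addition, assuming NumPy and SciPy, with arbitrary values) compares the joint density p(x) with the product of univariate Gaussians evaluated at y = U(x − µ).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.5])

lam, eigvecs = np.linalg.eigh(Sigma)
U = eigvecs.T
y = U @ (x - mu)

# |Sigma| equals the product of the eigenvalues, equation (2.55)
print(np.linalg.det(Sigma), np.prod(lam))

# Since |J| = 1, p(x) equals the product of univariate Gaussians in y, equation (2.56)
p_joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
p_factorized = np.prod(norm.pdf(y, loc=0.0, scale=np.sqrt(lam)))
print(p_joint, p_factorized)
```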
The integral of the distribution in the y coordinate system is then

\int p(y) \, dy = \prod_{j=1}^{D} \int_{-\infty}^{\infty} \frac{1}{(2\pi\lambda_j)^{1/2}} \exp\left\{ -\frac{y_j^2}{2\lambda_j} \right\} dy_j = 1    (2.57)

where we have used the result (1.48) for the normalization of the univariate Gaussian. This confirms that the multivariate Gaussian (2.43) is indeed normalized.

We now look at the moments of the Gaussian distribution and thereby provide an interpretation of the parameters µ and Σ. The expectation of x under the Gaussian distribution is given by

E[x] = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \int \exp\left\{ -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\} x \, dx
     = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \int \exp\left\{ -\frac{1}{2} z^T \Sigma^{-1} z \right\} (z + \mu) \, dz    (2.58)

where we have changed variables using z = x − µ.
We now note that the exponent is an even function of the components of z and, because the integrals over these are taken over the range (−∞, ∞), the term in z in the factor (z + µ) will vanish by symmetry. Thus

E[x] = \mu    (2.59)

and so we refer to µ as the mean of the Gaussian distribution.

We now consider second-order moments of the Gaussian. In the univariate case, we considered the second-order moment given by E[x²]. For the multivariate Gaussian, there are D² second-order moments given by E[x_i x_j], which we can group together to form the matrix E[x x^T]. This matrix can be written as

E[x x^T] = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \int \exp\left\{ -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\} x x^T \, dx
         = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \int \exp\left\{ -\frac{1}{2} z^T \Sigma^{-1} z \right\} (z + \mu)(z + \mu)^T \, dz

where again we have changed variables using z = x − µ.
Note that the cross-terms involving µz^T and zµ^T will again vanish by symmetry. The term µµ^T is constant and can be taken outside the integral, which itself is unity because the Gaussian distribution is normalized. Consider the term involving zz^T. Again, we can make use of the eigenvector expansion of the covariance matrix given by (2.45), together with the completeness of the set of eigenvectors, to write

z = \sum_{j=1}^{D} y_j u_j    (2.60)

where y_j = u_j^T z, which gives

\frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \int \exp\left\{ -\frac{1}{2} z^T \Sigma^{-1} z \right\} z z^T \, dz
= \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \sum_{i=1}^{D} \sum_{j=1}^{D} u_i u_j^T \int \exp\left\{ -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right\} y_i y_j \, dy
= \sum_{i=1}^{D} u_i u_i^T \lambda_i = \Sigma    (2.61)

where we have made use of the eigenvector equation (2.45), together with the fact that the integral on the right-hand side of the middle line vanishes by symmetry unless i = j, and in the final line we have made use of the results (1.50) and (2.55), together with (2.48).
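The moment results can be checked by sampling. The sketch below (an addition, assuming NumPy; the sample size and parameter values are arbitrary) confirms that the sample mean approaches µ as in (2.59) and that the average of zz^T with z = x − µ approaches Σ as in (2.61), so that E[xx^T] approaches µµ^T + Σ.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -1.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

samples = rng.multivariate_normal(mu, Sigma, size=200_000)

# First-order moment: E[x] = mu, equation (2.59)
print(samples.mean(axis=0))            # close to mu

# The zz^T term with z = x - mu integrates to Sigma, as in (2.61)
z = samples - mu
print(z.T @ z / len(z))                # close to Sigma

# Second-order moment matrix E[x x^T] = mu mu^T + Sigma
print(samples.T @ samples / len(samples))
print(np.outer(mu, mu) + Sigma)
```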