Thus we have

    E[x x^T] = µ µ^T + Σ.        (2.62)

For single random variables, we subtracted the mean before taking second moments in order to define a variance. Similarly, in the multivariate case it is again convenient to subtract off the mean, giving rise to the covariance of a random vector x defined by

    cov[x] = E[(x − E[x])(x − E[x])^T].        (2.63)

For the specific case of a Gaussian distribution, we can make use of E[x] = µ, together with the result (2.62), to give

    cov[x] = Σ.        (2.64)

Because the parameter matrix Σ governs the covariance of x under the Gaussian distribution, it is called the covariance matrix.

Although the Gaussian distribution (2.43) is widely used as a density model, it suffers from some significant limitations.
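As a quick numerical sanity check of (2.62) and (2.64), the following minimal sketch (assuming NumPy is available; the mean, covariance, and sample size are arbitrary choices, not taken from the text) estimates both quantities from samples:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw samples from N(x | mu, Sigma).
x = rng.multivariate_normal(mu, Sigma, size=200_000)

# Sample estimate of E[x x^T]; should approach mu mu^T + Sigma, as in (2.62).
second_moment = np.einsum('ni,nj->ij', x, x) / len(x)
print(np.allclose(second_moment, np.outer(mu, mu) + Sigma, atol=0.05))  # True

# Sample covariance; should approach Sigma, as in (2.64).
print(np.allclose(np.cov(x, rowvar=False), Sigma, atol=0.05))  # True
```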
Consider the number of free parameters in the distribution. A general symmetric covariance matrix Σ has D(D + 1)/2 independent parameters (Exercise 2.21), and there are another D independent parameters in µ, giving D(D + 3)/2 parameters in total. For large D, the total number of parameters therefore grows quadratically with D, and the computational task of manipulating and inverting large matrices can become prohibitive.

[Figure 2.8: three panels (a), (b), (c), each with axes x_1 and x_2. Caption: Contours of constant probability density for a Gaussian distribution in two dimensions in which the covariance matrix is (a) of general form, (b) diagonal, in which the elliptical contours are aligned with the coordinate axes, and (c) proportional to the identity matrix, in which the contours are concentric circles.]
One way to address this problem is to use restricted forms of the covariance matrix. If we consider covariance matrices that are diagonal, so that Σ = diag(σ_i^2), we then have a total of 2D independent parameters in the density model. The corresponding contours of constant density are axis-aligned ellipsoids. We could further restrict the covariance matrix to be proportional to the identity matrix, Σ = σ^2 I, known as an isotropic covariance, giving D + 1 independent parameters in the model and spherical surfaces of constant density.
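The three restricted forms, and the parameter counts just derived, can be written down directly; a minimal NumPy sketch with arbitrarily chosen example values:

```python
import numpy as np

D = 3

# General symmetric covariance: any symmetric positive-definite matrix.
Sigma_general = np.array([[2.0, 0.3, 0.1],
                          [0.3, 1.0, 0.2],
                          [0.1, 0.2, 0.5]])
# Diagonal covariance: Sigma = diag(sigma_i^2).
Sigma_diagonal = np.diag([0.5, 1.0, 2.0])
# Isotropic covariance: Sigma = sigma^2 I.
Sigma_isotropic = 1.5 * np.eye(D)

# Free parameters in each model, including the D parameters of the mean:
print(D * (D + 3) // 2)  # general:   D(D+3)/2 = 9
print(2 * D)             # diagonal:  2D       = 6
print(D + 1)             # isotropic: D + 1    = 4
```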
The three possibilities of general, diagonal, and isotropic covariance matrices are illustrated in Figure 2.8. Unfortunately, whereas such approaches limit the number of degrees of freedom in the distribution and make inversion of the covariance matrix a much faster operation, they also greatly restrict the form of the probability density and limit its ability to capture interesting correlations in the data.

A further limitation of the Gaussian distribution is that it is intrinsically unimodal (i.e., has a single maximum) and so is unable to provide a good approximation to multimodal distributions.
Thus the Gaussian distribution can be at once too flexible, in the sense of having too many parameters, and too limited in the range of distributions that it can adequately represent. We will see later that the introduction of latent variables, also called hidden variables or unobserved variables, allows both of these problems to be addressed. In particular, a rich family of multimodal distributions is obtained by introducing discrete latent variables, leading to mixtures of Gaussians, as discussed in Section 2.3.9. Similarly, the introduction of continuous latent variables, as described in Chapter 12, leads to models in which the number of free parameters can be controlled independently of the dimensionality D of the data space while still allowing the model to capture the dominant correlations in the data set. Indeed, these two approaches can be combined and further extended to derive a very rich set of hierarchical models that can be adapted to a broad range of practical applications.
For instance, the Gaussian version of the Markov random field (Section 8.3), which is widely used as a probabilistic model of images, is a Gaussian distribution over the joint space of pixel intensities but is rendered tractable through the imposition of considerable structure reflecting the spatial organization of the pixels.
Similarly, the linear dynamical system (Section 13.3), used to model time series data for applications such as tracking, is also a joint Gaussian distribution over a potentially large number of observed and latent variables, and again is tractable due to the structure imposed on the distribution. A powerful framework for expressing the form and properties of such complex distributions is that of probabilistic graphical models, which will form the subject of Chapter 8.

2.3.1 Conditional Gaussian distributions

An important property of the multivariate Gaussian distribution is that if two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian.

Consider first the case of conditional distributions.
Suppose x is a D-dimensional vector with Gaussian distribution N(x|µ, Σ) and that we partition x into two disjoint subsets x_a and x_b. Without loss of generality, we can take x_a to form the first M components of x, with x_b comprising the remaining D − M components, so that

    x = [ x_a ]
        [ x_b ].        (2.65)

We also define corresponding partitions of the mean vector µ given by

    µ = [ µ_a ]
        [ µ_b ]        (2.66)

and of the covariance matrix Σ given by

    Σ = [ Σ_aa  Σ_ab ]
        [ Σ_ba  Σ_bb ].        (2.67)

Note that the symmetry Σ^T = Σ of the covariance matrix implies that Σ_aa and Σ_bb are symmetric, while Σ_ba = Σ_ab^T.

In many situations, it will be convenient to work with the inverse of the covariance matrix

    Λ ≡ Σ^{-1}        (2.68)

which is known as the precision matrix. In fact, we shall see that some properties of Gaussian distributions are most naturally expressed in terms of the covariance, whereas others take a simpler form when viewed in terms of the precision.
We therefore also introduce the partitioned form of the precision matrix

    Λ = [ Λ_aa  Λ_ab ]
        [ Λ_ba  Λ_bb ]        (2.69)

corresponding to the partitioning (2.65) of the vector x. Because the inverse of a symmetric matrix is also symmetric (Exercise 2.22), we see that Λ_aa and Λ_bb are symmetric, while Λ_ab^T = Λ_ba. It should be stressed at this point that, for instance, Λ_aa is not simply given by the inverse of Σ_aa.
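This warning is easy to confirm numerically; a minimal sketch (the positive-definite Σ and the partition size M = 1 are invented for illustration):

```python
import numpy as np

# An arbitrary symmetric positive-definite covariance matrix.
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
M = 1  # x_a holds the first component, x_b the remaining two

Lambda = np.linalg.inv(Sigma)  # precision matrix, as in (2.68)

# The top-left block of the precision is NOT the inverse of the
# top-left block of the covariance.
print(np.allclose(Lambda[:M, :M], np.linalg.inv(Sigma[:M, :M])))  # False
```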
In fact, we shall shortly examine the relation between the inverse of a partitioned matrix and the inverses of its partitions.

Let us begin by finding an expression for the conditional distribution p(x_a|x_b). From the product rule of probability, we see that this conditional distribution can be evaluated from the joint distribution p(x) = p(x_a, x_b) simply by fixing x_b to the observed value and normalizing the resulting expression to obtain a valid probability distribution over x_a. Instead of performing this normalization explicitly, we can obtain the solution more efficiently by considering the quadratic form in the exponent of the Gaussian distribution given by (2.44) and then reinstating the normalization coefficient at the end of the calculation.
If we make use of the partitioning (2.65), (2.66), and (2.69), we obtain

    −(1/2)(x − µ)^T Σ^{-1} (x − µ)
        = −(1/2)(x_a − µ_a)^T Λ_aa (x_a − µ_a) − (1/2)(x_a − µ_a)^T Λ_ab (x_b − µ_b)
          − (1/2)(x_b − µ_b)^T Λ_ba (x_a − µ_a) − (1/2)(x_b − µ_b)^T Λ_bb (x_b − µ_b).        (2.70)

We see that as a function of x_a, this is again a quadratic form, and hence the corresponding conditional distribution p(x_a|x_b) will be Gaussian. Because this distribution is completely characterized by its mean and its covariance, our goal will be to identify expressions for the mean and covariance of p(x_a|x_b) by inspection of (2.70).
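As a numerical check of the expansion (2.70), the following sketch (arbitrary example values, assuming NumPy) compares the full exponent with the sum of the four partitioned terms:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
Lambda = np.linalg.inv(Sigma)
M = 1  # partition: x_a = x[:M], x_b = x[M:]

x = rng.standard_normal(3)  # an arbitrary evaluation point
d = x - mu
da, db = d[:M], d[M:]
Laa, Lab = Lambda[:M, :M], Lambda[:M, M:]
Lba, Lbb = Lambda[M:, :M], Lambda[M:, M:]

# Full exponent on the left-hand side of (2.70) ...
lhs = -0.5 * d @ Lambda @ d
# ... equals the sum of the four partitioned terms on the right.
rhs = (-0.5 * da @ Laa @ da - 0.5 * da @ Lab @ db
       - 0.5 * db @ Lba @ da - 0.5 * db @ Lbb @ db)
print(np.allclose(lhs, rhs))  # True
```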
This is an example of a rather common operation associated with Gaussian distributions, sometimes called 'completing the square', in which we are given a quadratic form defining the exponent terms in a Gaussian distribution, and we need to determine the corresponding mean and covariance. Such problems can be solved straightforwardly by noting that the exponent in a general Gaussian distribution N(x|µ, Σ) can be written

    −(1/2)(x − µ)^T Σ^{-1} (x − µ) = −(1/2) x^T Σ^{-1} x + x^T Σ^{-1} µ + const        (2.71)

where 'const' denotes terms which are independent of x, and we have made use of the symmetry of Σ. Thus if we take our general quadratic form and express it in the form given by the right-hand side of (2.71), then we can immediately equate the matrix of coefficients entering the second-order term in x to the inverse covariance matrix Σ^{-1} and the coefficient of the linear term in x to Σ^{-1}µ, from which we can obtain µ.
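A minimal sketch of this recipe (the coefficient matrix A and vector b are arbitrary stand-ins for whatever quadratic form one has been given, not quantities from the text):

```python
import numpy as np

# Suppose an exponent has been arranged into the form of the right-hand
# side of (2.71):  -1/2 x^T A x + x^T b + const.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # second-order coefficient matrix (symmetric, PD)
b = np.array([1.0, -1.0])   # linear coefficient vector

Sigma = np.linalg.inv(A)  # equate A with the inverse covariance Sigma^{-1}
mu = Sigma @ b            # equate b with Sigma^{-1} mu, hence mu = Sigma b
print(Sigma)
print(mu)
```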
Now let us apply this procedure to the conditional Gaussian distribution p(x_a|x_b) for which the quadratic form in the exponent is given by (2.70). We will denote the mean and covariance of this distribution by µ_{a|b} and Σ_{a|b}, respectively. Consider the functional dependence of (2.70) on x_a in which x_b is regarded as a constant. If we pick out all terms that are second order in x_a, we have

    −(1/2) x_a^T Λ_aa x_a        (2.72)

from which we can immediately conclude that the covariance (inverse precision) of p(x_a|x_b) is given by

    Σ_{a|b} = Λ_aa^{-1}.        (2.73)

Now consider all of the terms in (2.70) that are linear in x_a

    x_a^T {Λ_aa µ_a − Λ_ab (x_b − µ_b)}        (2.74)

where we have used Λ_ba^T = Λ_ab. From our discussion of the general form (2.71), the coefficient of x_a in this expression must equal Σ_{a|b}^{-1} µ_{a|b} and hence

    µ_{a|b} = Σ_{a|b} {Λ_aa µ_a − Λ_ab (x_b − µ_b)}
            = µ_a − Λ_aa^{-1} Λ_ab (x_b − µ_b)        (2.75)

where we have made use of (2.73).

The results (2.73) and (2.75) are expressed in terms of the partitioned precision matrix of the original joint distribution p(x_a, x_b).
We can also express these results in terms of the corresponding partitioned covariance matrix. To do this, we make use of the following identity for the inverse of a partitioned matrix (Exercise 2.24)

    [ A  B ]^{-1}   [ M             −M B D^{-1}                  ]
    [ C  D ]      = [ −D^{-1} C M   D^{-1} + D^{-1} C M B D^{-1} ]        (2.76)

where we have defined

    M = (A − B D^{-1} C)^{-1}.        (2.77)

The quantity M^{-1} is known as the Schur complement of the matrix on the left-hand side of (2.76) with respect to the submatrix D. Using the definition

    [ Σ_aa  Σ_ab ]^{-1}   [ Λ_aa  Λ_ab ]
    [ Σ_ba  Σ_bb ]      = [ Λ_ba  Λ_bb ]        (2.78)

and making use of (2.76), we have

    Λ_aa = (Σ_aa − Σ_ab Σ_bb^{-1} Σ_ba)^{-1}        (2.79)

    Λ_ab = −(Σ_aa − Σ_ab Σ_bb^{-1} Σ_ba)^{-1} Σ_ab Σ_bb^{-1}.        (2.80)

From these we obtain the following expressions for the mean and covariance of the conditional distribution p(x_a|x_b)

    µ_{a|b} = µ_a + Σ_ab Σ_bb^{-1} (x_b − µ_b)        (2.81)

    Σ_{a|b} = Σ_aa − Σ_ab Σ_bb^{-1} Σ_ba.        (2.82)

Comparing (2.73) and (2.82), we see that the conditional distribution p(x_a|x_b) takes a simpler form when expressed in terms of the partitioned precision matrix than when it is expressed in terms of the partitioned covariance matrix.
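The two sets of results can be cross-checked numerically; a minimal sketch (same style of arbitrary example values as above) verifying that the precision-based expressions (2.73) and (2.75) agree with the covariance-based expressions (2.81) and (2.82):

```python
import numpy as np

# Arbitrary example values, invented for illustration.
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
M = 1
xb = np.array([0.2, 1.1])  # an arbitrary observed value of x_b

mu_a, mu_b = mu[:M], mu[M:]
Saa, Sab = Sigma[:M, :M], Sigma[:M, M:]
Sba, Sbb = Sigma[M:, :M], Sigma[M:, M:]
Lambda = np.linalg.inv(Sigma)
Laa, Lab = Lambda[:M, :M], Lambda[:M, M:]

# Precision form, (2.73) and (2.75).
Sigma_ab_prec = np.linalg.inv(Laa)
mu_ab_prec = mu_a - np.linalg.inv(Laa) @ Lab @ (xb - mu_b)

# Covariance form, (2.81) and (2.82).
Sigma_ab_cov = Saa - Sab @ np.linalg.inv(Sbb) @ Sba
mu_ab_cov = mu_a + Sab @ np.linalg.inv(Sbb) @ (xb - mu_b)

print(np.allclose(Sigma_ab_prec, Sigma_ab_cov))  # True
print(np.allclose(mu_ab_prec, mu_ab_cov))        # True
```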
Note that the mean of the conditional distribution p(x_a|x_b), given by (2.81), is a linear function of x_b and that the covariance, given by (2.82), is independent of x_a. This represents an example of a linear-Gaussian model (Section 8.1.4).

2.3.2 Marginal Gaussian distributions

We have seen that if a joint distribution p(x_a, x_b) is Gaussian, then the conditional distribution p(x_a|x_b) will again be Gaussian. Now we turn to a discussion of the marginal distribution given by

    p(x_a) = ∫ p(x_a, x_b) dx_b        (2.83)

which, as we shall see, is also Gaussian.
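Anticipating the result derived in the remainder of this subsection, namely that this marginal is the standard N(x_a | µ_a, Σ_aa), a sampling-based sketch (arbitrary example values, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
M = 1

# Sampling from the joint and then discarding x_b marginalizes it out.
xa = rng.multivariate_normal(mu, Sigma, size=200_000)[:, :M]

# Empirically, x_a behaves as N(x_a | mu_a, Sigma_aa).
print(np.allclose(xa.mean(axis=0), mu[:M], atol=0.05))                  # True
print(np.allclose(np.cov(xa, rowvar=False), Sigma[:M, :M], atol=0.05))  # True
```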