Bishop C.M., Pattern Recognition and Machine Learning (2006)
The simplest approach is to use a histogram of observations in which the angular coordinate is divided into fixed bins. This has the virtue of simplicity and flexibility but also suffers from significant limitations, as we shall see when we discuss histogram methods in more detail in Section 2.5. Another approach starts, like the von Mises distribution, from a Gaussian distribution over a Euclidean space but now marginalizes onto the unit circle rather than conditioning (Mardia and Jupp, 2000). However, this leads to more complex forms of distribution and will not be discussed further.
Finally, any valid distribution over the real axis (such as a Gaussian) can be turned into a periodic distribution by mapping successive intervals of width 2π onto the periodic variable (0, 2π), which corresponds to ‘wrapping’ the real axis around the unit circle. Again, the resulting distribution is more complex to handle than the von Mises distribution.

One limitation of the von Mises distribution is that it is unimodal. By forming mixtures of von Mises distributions, we obtain a flexible framework for modelling periodic variables that can handle multimodality. For an example of a machine learning application that makes use of von Mises distributions, see Lawrence et al. (2002), and for extensions to modelling conditional densities for regression problems, see Bishop and Nabney (1996).
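As an editorial illustration of this multimodal flexibility (not part of the original text), the following Python sketch evaluates a mixture of von Mises densities on (0, 2π); the component parameters and helper names are chosen purely for the example.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

def von_mises_pdf(theta, mu, kappa):
    # von Mises density: exp(kappa * cos(theta - mu)) / (2 * pi * I0(kappa))
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))

def von_mises_mixture_pdf(theta, weights, mus, kappas):
    # Linear superposition of von Mises components; the weights sum to one.
    return sum(w * von_mises_pdf(theta, m, k)
               for w, m, k in zip(weights, mus, kappas))

# A bimodal density on the circle built from two components (illustrative values).
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
p = von_mises_mixture_pdf(theta, weights=[0.4, 0.6], mus=[1.0, 4.0], kappas=[5.0, 2.0])
print(np.sum(p) * (theta[1] - theta[0]))  # approximately 1: the mixture is normalized
```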
2.3.9 Mixtures of Gaussians

While the Gaussian distribution has some important analytical properties, it suffers from significant limitations when it comes to modelling real data sets.
Consider the example shown in Figure 2.21. This is known as the ‘Old Faithful’ data set, and comprises 272 measurements of the eruption of the Old Faithful geyser at Yellowstone National Park in the USA. Each measurement comprises the duration of the eruption in minutes (horizontal axis) and the time in minutes to the next eruption (vertical axis).

[Figure 2.22: Example of a Gaussian mixture distribution p(x) in one dimension showing three Gaussians (each scaled by a coefficient) in blue and their sum in red.]
We see that the data set forms two dominant clumps, and that a simple Gaussian distribution is unable to capture this structure, whereas a linear superposition of two Gaussians gives a better characterization of the data set.

Such superpositions, formed by taking linear combinations of more basic distributions such as Gaussians, can be formulated as probabilistic models known as mixture distributions (McLachlan and Basford, 1988; McLachlan and Peel, 2000). In Figure 2.22 we see that a linear combination of Gaussians can give rise to very complex densities. By using a sufficient number of Gaussians, and by adjusting their means and covariances as well as the coefficients in the linear combination, almost any continuous density can be approximated to arbitrary accuracy.

We therefore consider a superposition of K Gaussian densities of the form

    p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)        (2.188)

which is called a mixture of Gaussians. Each Gaussian density \mathcal{N}(x \mid \mu_k, \Sigma_k) is called a component of the mixture and has its own mean µk and covariance Σk. Contour and surface plots for a Gaussian mixture having 3 components are shown in Figure 2.23.

In this section we shall consider Gaussian components to illustrate the framework of mixture models.
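As a minimal numerical sketch (an editorial addition, with illustrative parameter values), the mixture density (2.188) can be evaluated in one dimension as follows.

```python
import numpy as np
from scipy.stats import norm

def gaussian_mixture_pdf(x, weights, means, stds):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), the one-dimensional case of (2.188)
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, stds))

x = np.linspace(-8.0, 8.0, 1000)
p = gaussian_mixture_pdf(x,
                         weights=[0.5, 0.3, 0.2],  # mixing coefficients, summing to 1
                         means=[-2.0, 0.0, 3.0],
                         stds=[0.5, 1.0, 0.8])
print(np.sum(p) * (x[1] - x[0]))  # approximately 1: the mixture integrates to unity
```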
More generally, mixture models can comprise linear combinations of other distributions. For instance, in Section 9.3.3 we shall consider mixtures of Bernoulli distributions as an example of a mixture model for discrete variables.

The parameters πk in (2.188) are called mixing coefficients. If we integrate both sides of (2.188) with respect to x, and note that both p(x) and the individual Gaussian components are normalized, we obtain

    \sum_{k=1}^{K} \pi_k = 1.        (2.189)

Also, the requirement that p(x) ≥ 0, together with \mathcal{N}(x \mid \mu_k, \Sigma_k) ≥ 0, implies πk ≥ 0 for all k. Combining this with the condition (2.189) we obtain

    0 \leq \pi_k \leq 1.        (2.190)

[Figure 2.23: Illustration of a mixture of 3 Gaussians in a two-dimensional space. (a) Contours of constant density for each of the mixture components, in which the 3 components are denoted red, blue and green, and the values of the mixing coefficients are shown below each component. (b) Contours of the marginal probability density p(x) of the mixture distribution. (c) A surface plot of the distribution p(x).]

We therefore see that the mixing coefficients satisfy the requirements to be probabilities. From the sum and product rules, the marginal density is given by

    p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k)        (2.191)

which is equivalent to (2.188) in which we can view πk = p(k) as the prior probability of picking the kth component, and the density \mathcal{N}(x \mid \mu_k, \Sigma_k) = p(x \mid k) as the probability of x conditioned on k. As we shall see in later chapters, an important role is played by the posterior probabilities p(k|x), which are also known as responsibilities.
From Bayes’ theorem these are given by

    \gamma_k(x) \equiv p(k \mid x) = \frac{p(k)\, p(x \mid k)}{\sum_l p(l)\, p(x \mid l)} = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_l \pi_l \mathcal{N}(x \mid \mu_l, \Sigma_l)}.        (2.192)

We shall discuss the probabilistic interpretation of the mixture distribution in greater detail in Chapter 9.
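To make the responsibilities concrete, the following sketch (an editorial addition with arbitrary parameter values) evaluates (2.192) for a one-dimensional two-component mixture.

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, weights, means, stds):
    # gamma_k(x) = pi_k N(x|mu_k, sigma_k^2) / sum_l pi_l N(x|mu_l, sigma_l^2), cf. (2.192)
    weighted = np.array([w * norm.pdf(x, loc=m, scale=s)
                         for w, m, s in zip(weights, means, stds)])
    return weighted / weighted.sum(axis=0)

gamma = responsibilities(x=1.5, weights=[0.5, 0.5], means=[0.0, 3.0], stds=[1.0, 1.0])
print(gamma, gamma.sum())  # posterior probabilities over components; they sum to 1
```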
The form of the Gaussian mixture distribution is governed by the parameters π, µ and Σ, where we have used the notation π ≡ {π1, . . . , πK}, µ ≡ {µ1, . . . , µK} and Σ ≡ {Σ1, . . . , ΣK}. One way to set the values of these parameters is to use maximum likelihood. From (2.188) the log of the likelihood function is given by

    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}        (2.193)

where X = {x1, . . . , xN}. We immediately see that the situation is now much more complex than with a single Gaussian, due to the presence of the summation over k inside the logarithm. As a result, the maximum likelihood solution for the parameters no longer has a closed-form analytical solution. One approach to maximizing the likelihood function is to use iterative numerical optimization techniques (Fletcher, 1987; Nocedal and Wright, 1999; Bishop and Nabney, 2008).
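The log likelihood (2.193) itself is straightforward to evaluate; the sketch below (an editorial addition, one-dimensional, with illustrative data and parameters) uses the log-sum-exp trick for numerical stability and leaves the iterative optimization aside.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(X, weights, means, stds):
    # ln p(X | pi, mu, Sigma) = sum_n ln { sum_k pi_k N(x_n | mu_k, sigma_k^2) }, cf. (2.193)
    log_terms = np.log(weights) + norm.logpdf(np.asarray(X)[:, None],
                                              loc=means, scale=stds)
    return logsumexp(log_terms, axis=1).sum()

X = [-2.1, -1.9, 0.2, 3.0, 3.2]  # toy one-dimensional data
print(gmm_log_likelihood(X, weights=[0.4, 0.6], means=[-2.0, 3.0], stds=[0.5, 0.5]))
```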
Alternatively we can employ a powerful framework called expectation maximization, which will be discussed at length in Chapter 9.

2.4. The Exponential Family

The probability distributions that we have studied so far in this chapter (with the exception of the Gaussian mixture) are specific examples of a broad class of distributions called the exponential family (Duda and Hart, 1973; Bernardo and Smith, 1994). Members of the exponential family have many important properties in common, and it is illuminating to discuss these properties in some generality.

The exponential family of distributions over x, given parameters η, is defined to be the set of distributions of the form

    p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^{\mathrm{T}} u(x)\}        (2.194)

where x may be scalar or vector, and may be discrete or continuous. Here η are called the natural parameters of the distribution, and u(x) is some function of x. The function g(η) can be interpreted as the coefficient that ensures that the distribution is normalized and therefore satisfies

    g(\eta) \int h(x) \exp\{\eta^{\mathrm{T}} u(x)\}\, \mathrm{d}x = 1        (2.195)

where the integration is replaced by summation if x is a discrete variable.
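A quick numerical check of (2.195), added editorially: take the exponential distribution as a member of the family (2.194), with h(x) = 1 for x ≥ 0, u(x) = x and natural parameter η < 0, so that g(η) = −η.

```python
import numpy as np
from scipy.integrate import quad

eta = -1.5       # natural parameter (must be negative here)
g = -eta         # normalization coefficient g(eta) for this family member

# Numerically evaluate g(eta) * integral of h(x) exp(eta * x) over x >= 0, cf. (2.195).
integral, _ = quad(lambda x: np.exp(eta * x), 0.0, np.inf)
print(g * integral)  # approximately 1
```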
We begin by taking some examples of the distributions introduced earlier in the chapter and showing that they are indeed members of the exponential family. Consider first the Bernoulli distribution

    p(x \mid \mu) = \mathrm{Bern}(x \mid \mu) = \mu^x (1 - \mu)^{1-x}.        (2.196)

Expressing the right-hand side as the exponential of the logarithm, we have

    p(x \mid \mu) = \exp\{x \ln \mu + (1 - x)\ln(1 - \mu)\}
                  = (1 - \mu) \exp\left\{ \ln\left(\frac{\mu}{1 - \mu}\right) x \right\}.        (2.197)

Comparison with (2.194) allows us to identify

    \eta = \ln\left(\frac{\mu}{1 - \mu}\right)        (2.198)

which we can solve for µ to give µ = σ(η), where

    \sigma(\eta) = \frac{1}{1 + \exp(-\eta)}        (2.199)

is called the logistic sigmoid function. Thus we can write the Bernoulli distribution using the standard representation (2.194) in the form

    p(x \mid \eta) = \sigma(-\eta) \exp(\eta x)        (2.200)

where we have used 1 − σ(η) = σ(−η), which is easily proved from (2.199). Comparison with (2.194) shows that

    u(x) = x        (2.201)
    h(x) = 1        (2.202)
    g(\eta) = \sigma(-\eta).        (2.203)
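As a short editorial check, the exponential-family form (2.200) can be verified numerically against the standard Bernoulli form (2.196) once η is set according to (2.198); the value of µ below is arbitrary.

```python
import numpy as np

def sigmoid(eta):
    # logistic sigmoid, (2.199)
    return 1.0 / (1.0 + np.exp(-eta))

mu = 0.3
eta = np.log(mu / (1.0 - mu))  # natural parameter, (2.198)

for x in (0, 1):
    bern = mu**x * (1.0 - mu)**(1 - x)        # standard form, (2.196)
    expfam = sigmoid(-eta) * np.exp(eta * x)  # exponential-family form, (2.200)
    print(x, bern, expfam)                    # the two representations agree
```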
Next consider the multinomial distribution that, for a single observation x, takes the form

    p(x \mid \mu) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp\left\{ \sum_{k=1}^{M} x_k \ln \mu_k \right\}        (2.204)

where x = (x1, . . . , xM)^T. Again, we can write this in the standard representation (2.194) so that

    p(x \mid \eta) = \exp(\eta^{\mathrm{T}} x)        (2.205)

where ηk = ln µk, and we have defined η = (η1, . . . , ηM)^T. Again, comparing with (2.194) we have

    u(x) = x        (2.206)
    h(x) = 1        (2.207)
    g(\eta) = 1.        (2.208)

Note that the parameters ηk are not independent because the parameters µk are subject to the constraint

    \sum_{k=1}^{M} \mu_k = 1        (2.209)

so that, given any M − 1 of the parameters µk, the value of the remaining parameter is fixed.
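Similarly, an editorial numerical check of (2.205): with ηk = ln µk the exponential-family form reproduces the multinomial probability of a one-hot observation; the parameter values are illustrative.

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])  # parameters satisfying the constraint (2.209)
eta = np.log(mu)                # natural parameters, eta_k = ln mu_k

x = np.array([0, 1, 0])         # a single observation in one-hot form
print(np.prod(mu**x))           # standard form (2.204): gives 0.5
print(np.exp(eta @ x))          # exponential-family form (2.205): also 0.5
```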