5.6. Mixture Density Networks

The goal of supervised learning is to model a conditional distribution p(t|x), which for many simple regression problems is chosen to be Gaussian. However, practical machine learning problems can often have significantly non-Gaussian distributions. These can arise, for example, with inverse problems in which the distribution can be multimodal, in which case the Gaussian assumption can lead to very poor predictions.

As a simple example of an inverse problem, consider the kinematics of a robot arm, as illustrated in Figure 5.18 (Exercise 5.33). The forward problem involves finding the end effector position given the joint angles and has a unique solution. However, in practice we wish to move the end effector of the robot to a specific position, and to do this we must set appropriate joint angles. We therefore need to solve the inverse problem, which has two solutions, as seen in Figure 5.18.

Forward problems often correspond to causality in a physical system and generally have a unique solution. For instance, a specific pattern of symptoms in the human body may be caused by the presence of a particular disease. In pattern recognition, however, we typically have to solve an inverse problem, such as trying to predict the presence of a disease given a set of symptoms. If the forward problem involves a many-to-one mapping, then the inverse problem will have multiple solutions. For instance, several different diseases may result in the same symptoms.
In the robotics example, the kinematics is defined by geometrical equations, and the multimodality is readily apparent. However, in many machine learning problems the presence of multimodality, particularly in problems involving spaces of high dimensionality, can be less obvious. For tutorial purposes, however, we shall consider a simple toy problem for which we can easily visualize the multimodality. Data for this problem is generated by sampling a variable x uniformly over the interval (0, 1), to give a set of values {x_n}, and the corresponding target values t_n are obtained by computing the function x_n + 0.3 sin(2πx_n) and then adding uniform noise over the interval (−0.1, 0.1). The inverse problem is then obtained by keeping the same data points but exchanging the roles of x and t. Figure 5.19 shows the data sets for the forward and inverse problems, along with the results of fitting two-layer neural networks having 6 hidden units and a single linear output unit by minimizing a sum-of-squares error function. Least squares corresponds to maximum likelihood under a Gaussian assumption. We see that this leads to a very poor model for the highly non-Gaussian inverse problem.

Figure 5.19 On the left is the data set for a simple 'forward problem', in which the red curve shows the result of fitting a two-layer neural network by minimizing the sum-of-squares error function. The corresponding inverse problem, shown on the right, is obtained by exchanging the roles of x and t. Here the same network, trained again by minimizing the sum-of-squares error function, gives a very poor fit to the data due to the multimodality of the data set.
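For concreteness, generating this toy data set can be sketched as follows. This is an illustrative sketch rather than code from the book; the sample size and random seed are arbitrary choices, and a NumPy environment is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility (arbitrary choice)
N = 300                          # number of data points (the book does not specify N)

# Forward problem: t = x + 0.3 sin(2*pi*x) plus uniform noise on (-0.1, 0.1)
x = rng.uniform(0.0, 1.0, size=N)
t = x + 0.3 * np.sin(2.0 * np.pi * x) + rng.uniform(-0.1, 0.1, size=N)

# Inverse problem: keep the same points but exchange the roles of x and t
x_inv, t_inv = t.copy(), x.copy()
```

Fitting a single-output network by least squares to (x, t) gives the good fit shown on the left of Figure 5.19, while fitting the same model to (x_inv, t_inv) gives the poor fit shown on the right.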
We therefore seek a general framework for modelling conditional probability distributions. This can be achieved by using a mixture model for p(t|x) in which both the mixing coefficients and the component densities are flexible functions of the input vector x, giving rise to the mixture density network. For any given value of x, the mixture model provides a general formalism for modelling an arbitrary conditional density function p(t|x). Provided we consider a sufficiently flexible network, we then have a framework for approximating arbitrary conditional distributions. Here we shall develop the model explicitly for Gaussian components, so that

$$p(\mathbf{t}|\mathbf{x}) = \sum_{k=1}^{K} \pi_k(\mathbf{x})\, \mathcal{N}\bigl(\mathbf{t} \mid \boldsymbol{\mu}_k(\mathbf{x}), \sigma_k^2(\mathbf{x})\bigr). \qquad (5.148)$$

This is an example of a heteroscedastic model since the noise variance on the data is a function of the input vector x.
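To see why such a mixture is useful, the short sketch below (purely illustrative; the parameter values are hand-picked stand-ins for network outputs, not taken from the book) evaluates (5.148) at a single x with two components, producing a clearly bimodal conditional density that no single Gaussian could represent.

```python
import numpy as np

def gaussian(t, mu, sigma):
    """Univariate normal density N(t | mu, sigma^2)."""
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hand-picked values standing in for pi_k(x), mu_k(x), sigma_k(x) at one value of x
pi = np.array([0.5, 0.5])
mu = np.array([0.2, 0.8])
sigma = np.array([0.05, 0.05])

t_grid = np.linspace(0.0, 1.0, 201)
# Eq. (5.148): mixture of Gaussians, here clearly bimodal in t
p_t_given_x = sum(pi[k] * gaussian(t_grid, mu[k], sigma[k]) for k in range(2))
```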
Instead of Gaussians, we can use other distributions for the components, such as Bernoulli distributions if the target variables are binary rather than continuous. We have also specialized to the case of isotropic covariances for the components, although the mixture density network can readily be extended to allow for general covariance matrices by representing the covariances using a Cholesky factorization (Williams, 1996). Even with isotropic components, the conditional distribution p(t|x) does not assume factorization with respect to the components of t (in contrast to the standard sum-of-squares regression model) as a consequence of the mixture distribution.

We now take the various parameters of the mixture model, namely the mixing coefficients π_k(x), the means µ_k(x), and the variances σ_k^2(x), to be governed by the outputs of a conventional neural network that takes x as its input. The structure of this mixture density network is illustrated in Figure 5.20.
Figure 5.20 The mixture density network can represent general conditional probability densities p(t|x) by considering a parametric mixture model for the distribution of t whose parameters are determined by the outputs of a neural network that takes x as its input vector.

The mixture density network is closely related to the mixture of experts discussed in Section 14.5.3.
The principal difference is that in the mixture density network the same function is used to predict the parameters of all of the component densities as well as the mixing coefficients, and so the nonlinear hidden units are shared amongst the input-dependent functions.

The neural network in Figure 5.20 can, for example, be a two-layer network having sigmoidal ('tanh') hidden units. If there are L components in the mixture model (5.148), and if t has K components, then the network will have L output-unit activations denoted by a_k^π that determine the mixing coefficients π_k(x), L outputs denoted by a_k^σ that determine the kernel widths σ_k(x), and L × K outputs denoted by a_kj^µ that determine the components µ_kj(x) of the kernel centres µ_k(x).
The total number of network outputs is given by (K + 2)L, as compared with the usual K outputs for a network, which simply predicts the conditional means of the target variables.

The mixing coefficients must satisfy the constraints

$$\sum_{k=1}^{K} \pi_k(\mathbf{x}) = 1, \qquad 0 \leq \pi_k(\mathbf{x}) \leq 1, \qquad (5.149)$$

which can be achieved using a set of softmax outputs

$$\pi_k(\mathbf{x}) = \frac{\exp(a_k^{\pi})}{\sum_{l=1}^{K} \exp(a_l^{\pi})}. \qquad (5.150)$$

Similarly, the variances must satisfy σ_k^2(x) ≥ 0 and so can be represented in terms of the exponentials of the corresponding network activations using

$$\sigma_k(\mathbf{x}) = \exp(a_k^{\sigma}). \qquad (5.151)$$

Finally, because the means µ_k(x) have real components, they can be represented directly by the network output activations

$$\mu_{kj}(\mathbf{x}) = a_{kj}^{\mu}. \qquad (5.152)$$
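The parameterization (5.150)-(5.152) amounts to splitting the (K + 2)L output activations into three groups and applying a softmax, an exponential, and the identity. The following is a minimal sketch (not code from the book), assuming the activations arrive as a single NumPy vector laid out as [mixing, widths, means]; the ordering is an implementation choice, not prescribed by the text.

```python
import numpy as np

def mdn_parameters(activations, L, K):
    """Split a vector of (K + 2) * L output activations into mixture parameters.

    activations : shape ((K + 2) * L,), assumed ordering
                  [a^pi (L values), a^sigma (L values), a^mu (L * K values)].
    Returns (pi, sigma, mu) with shapes (L,), (L,), (L, K).
    """
    a_pi = activations[:L]
    a_sigma = activations[L:2 * L]
    a_mu = activations[2 * L:].reshape(L, K)

    # Mixing coefficients via softmax, eq. (5.150); subtract the max for numerical stability
    e = np.exp(a_pi - np.max(a_pi))
    pi = e / np.sum(e)

    # Kernel widths via exponentials, eq. (5.151)
    sigma = np.exp(a_sigma)

    # Kernel centres taken directly from the activations, eq. (5.152)
    mu = a_mu
    return pi, sigma, mu
```

Subtracting the maximum activation before exponentiating leaves the softmax unchanged and is a standard guard against overflow.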
The adaptive parameters of the mixture density network comprise the vector w of weights and biases in the neural network, which can be set by maximum likelihood, or equivalently by minimizing an error function defined to be the negative logarithm of the likelihood. For independent data, this error function takes the form

$$E(\mathbf{w}) = -\sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k(\mathbf{x}_n, \mathbf{w})\, \mathcal{N}\bigl(\mathbf{t}_n \mid \boldsymbol{\mu}_k(\mathbf{x}_n, \mathbf{w}), \sigma_k^2(\mathbf{x}_n, \mathbf{w})\bigr) \right\} \qquad (5.153)$$

where we have made the dependencies on w explicit.
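As a concrete reading of (5.153), the following sketch (illustrative only) evaluates the negative log likelihood for a batch of inputs, given the mixture parameters produced by the network for each x_n; an isotropic Gaussian over a K-dimensional target is assumed.

```python
import numpy as np

def mdn_error(pi, sigma, mu, t):
    """Negative log likelihood of eq. (5.153).

    pi    : (N, L)    mixing coefficients pi_k(x_n)
    sigma : (N, L)    widths sigma_k(x_n)
    mu    : (N, L, K) centres mu_k(x_n)
    t     : (N, K)    targets t_n
    """
    N, L, K = mu.shape
    # Squared distances ||t_n - mu_k(x_n)||^2, shape (N, L)
    sq = np.sum((t[:, None, :] - mu) ** 2, axis=2)
    # Isotropic Gaussian densities N(t_n | mu_k, sigma_k^2 I), shape (N, L)
    norm = (2.0 * np.pi * sigma ** 2) ** (-K / 2.0)
    gauss = norm * np.exp(-0.5 * sq / sigma ** 2)
    # Sum over components, take the log, sum over data points, negate
    return -np.sum(np.log(np.sum(pi * gauss, axis=1)))
```

In practice the inner sum is usually evaluated with a log-sum-exp to avoid underflow when individual component densities are very small.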
In order to minimize the error function, we need to calculate the derivatives of the error E(w) with respect to the components of w. These can be evaluated by using the standard backpropagation procedure, provided we obtain suitable expressions for the derivatives of the error with respect to the output-unit activations. These represent error signals δ for each pattern and for each output unit, and can be backpropagated to the hidden units, where the error function derivatives are evaluated in the usual way. Because the error function (5.153) is composed of a sum of terms, one for each training data point, we can consider the derivatives for a particular pattern n and then find the derivatives of E by summing over all patterns.

Because we are dealing with mixture distributions, it is convenient to view the mixing coefficients π_k(x) as x-dependent prior probabilities and to introduce the corresponding posterior probabilities given by

$$\gamma_k(\mathbf{t}|\mathbf{x}) = \frac{\pi_k \mathcal{N}_{nk}}{\sum_{l=1}^{K} \pi_l \mathcal{N}_{nl}} \qquad (5.154)$$

where N_nk denotes N(t_n | µ_k(x_n), σ_k^2(x_n)).

The derivatives with respect to the network output activations governing the mixing coefficients are given by (Exercise 5.34)

$$\frac{\partial E_n}{\partial a_k^{\pi}} = \pi_k - \gamma_k. \qquad (5.155)$$

Similarly, the derivatives with respect to the output activations controlling the component means are given by (Exercise 5.35)

$$\frac{\partial E_n}{\partial a_{kl}^{\mu}} = \gamma_k \left\{ \frac{\mu_{kl} - t_l}{\sigma_k^2} \right\}. \qquad (5.156)$$

Finally, the derivatives with respect to the output activations controlling the component variances are given by (Exercise 5.36)

$$\frac{\partial E_n}{\partial a_k^{\sigma}} = -\gamma_k \left\{ \frac{\|\mathbf{t} - \boldsymbol{\mu}_k\|^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right\}. \qquad (5.157)$$
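The responsibilities (5.154) and the gradients (5.155)-(5.157) translate directly into the output-layer error signals used by backpropagation. The sketch below (illustrative; it simply transcribes the expressions above for a single pattern n) computes them from the mixture parameters and the target.

```python
import numpy as np

def mdn_output_deltas(pi, sigma, mu, t):
    """Error signals dE_n/da for one pattern, following eqs. (5.154)-(5.157).

    pi    : (L,)   mixing coefficients pi_k(x_n)
    sigma : (L,)   widths sigma_k(x_n)
    mu    : (L, K) centres mu_k(x_n)
    t     : (K,)   target vector t_n
    """
    L, K = mu.shape
    sq = np.sum((t[None, :] - mu) ** 2, axis=1)          # ||t - mu_k||^2
    gauss = (2.0 * np.pi * sigma ** 2) ** (-K / 2.0) * np.exp(-0.5 * sq / sigma ** 2)

    # Posterior responsibilities gamma_k, eq. (5.154)
    gamma = pi * gauss / np.sum(pi * gauss)

    # Gradients with respect to the output activations
    d_pi = pi - gamma                                                  # eq. (5.155)
    d_mu = gamma[:, None] * (mu - t[None, :]) / sigma[:, None] ** 2    # eq. (5.156)
    d_sigma = -gamma * (sq / sigma ** 3 - 1.0 / sigma)                 # eq. (5.157), as given in the text
    return d_pi, d_mu, d_sigma
```

These δ values are then backpropagated through the hidden units in the usual way to obtain the derivatives of the error with respect to the network weights.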