Specifically, we consider a zero-mean isotropic Gaussian governed by a single precision parameter α so that

\[
p(\mathbf{w}\mid\alpha) = \mathcal{N}(\mathbf{w}\mid\mathbf{0},\,\alpha^{-1}\mathbf{I})
\tag{3.52}
\]

and the corresponding posterior distribution over w is then given by (3.49) with

\[
\mathbf{m}_N = \beta\,\mathbf{S}_N \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t}
\tag{3.53}
\]
\[
\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}.
\tag{3.54}
\]

The log of the posterior distribution is given by the sum of the log likelihood and the log of the prior and, as a function of w, takes the form

\[
\ln p(\mathbf{w}\mid\mathbf{t}) = -\frac{\beta}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2 - \frac{\alpha}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w} + \text{const}.
\tag{3.55}
\]

Maximization of this posterior distribution with respect to w is therefore equivalent to the minimization of the sum-of-squares error function with the addition of a quadratic regularization term, corresponding to (3.27) with λ = α/β.

We can illustrate Bayesian learning in a linear basis function model, as well as the sequential update of a posterior distribution, using a simple example involving straight-line fitting.
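Before turning to that example, here is a minimal NumPy sketch (not from the book; the function name and the random test data are illustrative) of evaluating (3.53)–(3.54) and of the equivalence between the posterior mean and the regularized least-squares solution with λ = α/β.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the Gaussian prior (3.52)."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)   # (3.54)
    m_N = beta * S_N @ Phi.T @ t                                   # (3.53)
    return m_N, S_N

# Check against the regularized least-squares solution with lambda = alpha/beta.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 3))          # illustrative design matrix
t = rng.normal(size=50)                 # illustrative targets
alpha, beta = 2.0, 25.0
m_N, S_N = posterior(Phi, t, alpha, beta)
lam = alpha / beta
w_reg = np.linalg.solve(lam * np.eye(3) + Phi.T @ Phi, Phi.T @ t)
assert np.allclose(m_N, w_reg)          # MAP weights equal the regularized solution
```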
Consider a single input variable x, a single target variable t and a linear model of the form y(x, w) = w_0 + w_1 x. Because this has just two adaptive parameters, we can plot the prior and posterior distributions directly in parameter space. We generate synthetic data from the function f(x, a) = a_0 + a_1 x with parameter values a_0 = −0.3 and a_1 = 0.5 by first choosing values of x_n from the uniform distribution U(x | −1, 1), then evaluating f(x_n, a), and finally adding Gaussian noise with standard deviation 0.2 to obtain the target values t_n. Our goal is to recover the values of a_0 and a_1 from such data, and we will explore the dependence on the size of the data set.
We assume here that the noise variance is known and hence we set the precision parameter to its true value β = (1/0.2)² = 25. Similarly, we fix the parameter α to 2.0. We shall shortly discuss strategies for determining α and β from the training data. Figure 3.7 shows the results of Bayesian learning in this model as the size of the data set is increased and demonstrates the sequential nature of Bayesian learning in which the current posterior distribution forms the prior when a new data point is observed. It is worth taking time to study this figure in detail as it illustrates several important aspects of Bayesian inference.
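The sequential updating just described can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the Figure 3.7 setup (data generated from a_0 = −0.3, a_1 = 0.5 with noise standard deviation 0.2, and α = 2.0, β = 25), not code from the book; at each step the current posterior plays the role of the prior.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1 = -0.3, 0.5                 # true parameters used to generate the data
alpha, beta = 2.0, 25.0            # beta = (1/0.2)**2

def phi(x):
    """Basis vector for the straight-line model y(x, w) = w0 + w1*x."""
    return np.array([1.0, x])

m, S = np.zeros(2), np.eye(2) / alpha   # prior (3.52): mean 0, covariance (1/alpha) I

for n in range(20):
    x_n = rng.uniform(-1.0, 1.0)
    t_n = a0 + a1 * x_n + rng.normal(scale=0.2)

    # Treat the current posterior as the prior and absorb the new point
    # (general Gaussian update; starting from (3.52) this reduces to (3.53)-(3.54)).
    S_inv = np.linalg.inv(S) + beta * np.outer(phi(x_n), phi(x_n))
    S_new = np.linalg.inv(S_inv)
    m = S_new @ (np.linalg.inv(S) @ m + beta * t_n * phi(x_n))
    S = S_new

print(m)   # the posterior mean approaches (a0, a1) as more points arrive
```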
The first row of this figure corresponds to the situation before any data points are observed and shows a plot of the prior distribution in w space together with six samples of the function y(x, w) in which the values of w are drawn from the prior. In the second row, we see the situation after observing a single data point. The location (x, t) of the data point is shown by a blue circle in the right-hand column. In the left-hand column is a plot of the likelihood function p(t|x, w) for this data point as a function of w.
Note that the likelihood function provides a soft constraint that the line must pass close to the data point, where close is determined by the noise precision β. For comparison, the true parameter values a_0 = −0.3 and a_1 = 0.5 used to generate the data set are shown by a white cross in the plots in the left column of Figure 3.7. When we multiply this likelihood function by the prior from the top row, and normalize, we obtain the posterior distribution shown in the middle plot on the second row. Samples of the regression function y(x, w) obtained by drawing samples of w from this posterior distribution are shown in the right-hand plot. Note that these sample lines all pass close to the data point. The third row of this figure shows the effect of observing a second data point, again shown by a blue circle in the plot in the right-hand column.
The corresponding likelihood function for this second data point alone is shown in the left plot. When we multiply this likelihood function by the posterior distribution from the second row, we obtain the posterior distribution shown in the middle plot of the third row. Note that this is exactly the same posterior distribution as would be obtained by combining the original prior with the likelihood function for the two data points. This posterior has now been influenced by two data points, and because two points are sufficient to define a line this already gives a relatively compact posterior distribution.
Samples from this posterior distribution give rise to the functions shown in red in the third column, and we see that these functions pass close to both of the data points. The fourth row shows the effect of observing a total of 20 data points. The left-hand plot shows the likelihood function for the 20th data point alone, and the middle plot shows the resulting posterior distribution that has now absorbed information from all 20 observations. Note how the posterior is much sharper than in the third row.
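The sampled regression functions in the right-hand column can be reproduced, under the same assumptions as the sketches above, by drawing weight vectors from the current posterior; the helper below is hypothetical, not from the book.

```python
import numpy as np

def sample_lines(m_N, S_N, xs, n_samples=6, rng=None):
    """Draw w ~ N(m_N, S_N) and evaluate y(x, w) = w0 + w1*x on the grid xs."""
    rng = np.random.default_rng() if rng is None else rng
    W = rng.multivariate_normal(m_N, S_N, size=n_samples)    # shape (n_samples, 2)
    return W[:, :1] + W[:, 1:] * xs[None, :]                  # one row per sampled line

xs = np.linspace(-1.0, 1.0, 100)
# lines = sample_lines(m, S, xs)   # using the posterior (m, S) from the loop above
```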
In the limit of an infinite number of data points, the posterior distribution would become a delta function centred on the true parameter values, shown by the white cross.

[Figure 3.7: Illustration of sequential Bayesian learning for a simple linear model of the form y(x, w) = w_0 + w_1 x. A detailed description of this figure is given in the text.]

Other forms of prior over the parameters can be considered. For instance, we can generalize the Gaussian prior to give

\[
p(\mathbf{w}\mid\alpha) = \left[\frac{q}{2}\left(\frac{\alpha}{2}\right)^{1/q}\frac{1}{\Gamma(1/q)}\right]^{M}\exp\left(-\frac{\alpha}{2}\sum_{j=1}^{M}|w_j|^{q}\right)
\tag{3.56}
\]

in which q = 2 corresponds to the Gaussian distribution, and only in this case is the prior conjugate to the likelihood function (3.10).
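As a small illustration (mine, not from the text), the log of (3.56) can be written down directly; maximizing the posterior under this prior adds the term (α/2) Σ_j |w_j|^q to the error function.

```python
import numpy as np
from scipy.special import gammaln

def log_generalized_prior(w, alpha, q):
    """Log of the generalized prior (3.56); q = 2 recovers the Gaussian (3.52)."""
    M = w.size
    log_norm = M * (np.log(q / 2.0) + np.log(alpha / 2.0) / q - gammaln(1.0 / q))
    return log_norm - (alpha / 2.0) * np.sum(np.abs(w) ** q)

w = np.array([0.5, -1.2, 0.0])
print(log_generalized_prior(w, alpha=2.0, q=2.0))   # Gaussian case (quadratic penalty)
print(log_generalized_prior(w, alpha=2.0, q=1.0))   # Laplace-like case (absolute-value penalty)
```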
Finding the maximum of the posterior distribution over w corresponds to minimization of the regularized error function (3.29). In the case of the Gaussian prior, the mode of the posterior distribution was equal to the mean, although this will no longer hold if q ≠ 2.

3.3.2 Predictive distribution

In practice, we are not usually interested in the value of w itself but rather in making predictions of t for new values of x. This requires that we evaluate the predictive distribution defined by

\[
p(t\mid\mathbf{t},\alpha,\beta) = \int p(t\mid\mathbf{w},\beta)\,p(\mathbf{w}\mid\mathbf{t},\alpha,\beta)\,\mathrm{d}\mathbf{w}
\tag{3.57}
\]

in which t is the vector of target values from the training set, and we have omitted the corresponding input vectors from the right-hand side of the conditioning statements to simplify the notation.
The conditional distribution p(t|x, w, β) of the target variable is given by (3.8), and the posterior weight distribution is given by (3.49). We see that (3.57) involves the convolution of two Gaussian distributions, and so making use of the result (2.115) from Section 8.1.4, we see that the predictive distribution takes the form (Exercise 3.10)

\[
p(t\mid x,\mathbf{t},\alpha,\beta) = \mathcal{N}\bigl(t\mid \mathbf{m}_N^{\mathrm{T}}\boldsymbol{\phi}(x),\ \sigma_N^{2}(x)\bigr)
\tag{3.58}
\]

where the variance σ_N²(x) of the predictive distribution is given by

\[
\sigma_N^{2}(x) = \frac{1}{\beta} + \boldsymbol{\phi}(x)^{\mathrm{T}}\mathbf{S}_N\boldsymbol{\phi}(x).
\tag{3.59}
\]

The first term in (3.59) represents the noise on the data whereas the second term reflects the uncertainty associated with the parameters w.
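A minimal sketch (illustrative, not from the book) of evaluating (3.58)–(3.59) at a new input, given m_N and S_N computed as above:

```python
import numpy as np

def predictive(phi_x, m_N, S_N, beta):
    """Predictive mean and variance, equations (3.58) and (3.59)."""
    mean = m_N @ phi_x                              # m_N^T phi(x)
    var = 1.0 / beta + phi_x @ S_N @ phi_x          # noise term plus parameter uncertainty
    return mean, var

# e.g. for the straight-line model: mean, var = predictive(np.array([1.0, x]), m, S, beta)
```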
Because the noise process and the distribution of w are independent Gaussians, their variances are additive. Note that, as additional data points are observed, the posterior distribution becomes narrower. As a consequence it can be shown (Qazaz et al., 1997) that σ²_{N+1}(x) ≤ σ²_N(x) (Exercise 3.11). In the limit N → ∞, the second term in (3.59) goes to zero, and the variance of the predictive distribution arises solely from the additive noise governed by the parameter β.

As an illustration of the predictive distribution for Bayesian linear regression models, let us return to the synthetic sinusoidal data set of Section 1.1.
In Figure 3.8, we fit a model comprising a linear combination of Gaussian basis functions to data sets of various sizes and then look at the corresponding posterior distributions.

[Figure 3.8: Examples of the predictive distribution (3.58) for a model consisting of 9 Gaussian basis functions of the form (3.4) using the synthetic sinusoidal data set of Section 1.1. See the text for a detailed discussion.]
Here the green curves correspond to the function sin(2πx) from which the data points were generated (with the addition of Gaussian noise). Data sets of size N = 1, N = 2, N = 4, and N = 25 are shown in the four plots by the blue circles. For each plot, the red curve shows the mean of the corresponding Gaussian predictive distribution, and the red shaded region spans one standard deviation either side of the mean.
Note that the predictive uncertainty depends on x and is smallest in the neighbourhood of the data points. Also note that the level of uncertainty decreases as more data points are observed.

The plots in Figure 3.8 only show the point-wise predictive variance as a function of x. In order to gain insight into the covariance between the predictions at different values of x, we can draw samples from the posterior distribution over w, and then plot the corresponding functions y(x, w), as shown in Figure 3.9.

[Figure 3.9: Plots of the function y(x, w) using samples from the posterior distributions over w corresponding to the plots in Figure 3.8.]

If we used localized basis functions such as Gaussians, then in regions away from the basis function centres, the contribution from the second term in the predictive variance (3.59) will go to zero, leaving only the noise contribution β⁻¹. Thus, the model becomes very confident in its predictions when extrapolating outside the region occupied by the basis functions, which is generally an undesirable behaviour. This problem can be avoided by adopting an alternative Bayesian approach to regression known as a Gaussian process (Section 6.4).

Note that, if both w and β are treated as unknown, then we can introduce a conjugate prior distribution p(w, β) that, from the discussion in Section 2.3.6, will be given by a Gaussian-gamma distribution (Denison et al., 2002); see Exercises 3.12 and 3.13.
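Returning to the localized-basis-function point above, the extrapolation behaviour can be seen numerically in the following sketch, which fits 9 Gaussian basis functions of the form (3.4) to synthetic sinusoidal data. The basis-function centres, width, noise level, and the values of α and β are illustrative choices of mine, not values given in the text.

```python
import numpy as np

def gaussian_basis(x, centers, s=0.1):
    """Vector of Gaussian basis functions (3.4) evaluated at a scalar x."""
    return np.exp(-0.5 * ((x - centers) / s) ** 2)

rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0
centers = np.linspace(0.0, 1.0, 9)                 # 9 localized basis functions

# synthetic sinusoidal data in the spirit of Section 1.1
N = 25
x = rng.uniform(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

Phi = np.stack([gaussian_basis(xn, centers) for xn in x])
S_N = np.linalg.inv(alpha * np.eye(9) + beta * Phi.T @ Phi)     # (3.54)
m_N = beta * S_N @ Phi.T @ t                                    # (3.53)

# Far from the basis-function centres phi(x) -> 0, so the second term of (3.59)
# vanishes and the predictive variance collapses to the noise level 1/beta,
# i.e. the model looks (over)confident when extrapolating.
phi_far = gaussian_basis(10.0, centers)            # x = 10 is well outside [0, 1]
var_far = 1.0 / beta + phi_far @ S_N @ phi_far
print(var_far, 1.0 / beta)                         # essentially identical
```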