Bishop C.M., Pattern Recognition and Machine Learning (2006)
In particular, evaluation of the posterior distribution would require normalization of the product of a prior distribution and a likelihood function that itself comprises a product of logistic sigmoid functions, one for every data point. Evaluation of the predictive distribution is similarly intractable. Here we consider the application of the Laplace approximation to the problem of Bayesian logistic regression (Spiegelhalter and Lauritzen, 1990; MacKay, 1992b).

4.5.1 Laplace approximation

Recall from Section 4.4 that the Laplace approximation is obtained by finding the mode of the posterior distribution and then fitting a Gaussian centred at that mode.
This requires evaluation of the second derivatives of the log posterior, which is equivalent to finding the Hessian matrix.

Because we seek a Gaussian representation for the posterior distribution, it is natural to begin with a Gaussian prior, which we write in the general form

    p(w) = N(w | m_0, S_0)    (4.140)

where m_0 and S_0 are fixed hyperparameters.
The posterior distribution over w is given by

    p(w|t) ∝ p(w) p(t|w)    (4.141)

where t = (t_1, ..., t_N)^T. Taking the log of both sides, and substituting for the prior distribution using (4.140), and for the likelihood function using (4.89), we obtain

    ln p(w|t) = -(1/2)(w - m_0)^T S_0^{-1} (w - m_0)
                + Σ_{n=1}^{N} { t_n ln y_n + (1 - t_n) ln(1 - y_n) } + const    (4.142)

where y_n = σ(w^T φ_n). To obtain a Gaussian approximation to the posterior distribution, we first maximize the posterior distribution to give the MAP (maximum posterior) solution w_MAP, which defines the mean of the Gaussian. The covariance is then given by the inverse of the matrix of second derivatives of the negative log likelihood, which takes the form

    S_N^{-1} = -∇∇ ln p(w|t) = S_0^{-1} + Σ_{n=1}^{N} y_n (1 - y_n) φ_n φ_n^T.    (4.143)

The Gaussian approximation to the posterior distribution therefore takes the form

    q(w) = N(w | w_MAP, S_N).    (4.144)

Having obtained a Gaussian approximation to the posterior distribution, there remains the task of marginalizing with respect to this distribution in order to make predictions.

4.5.2 Predictive distribution

The predictive distribution for class C_1, given a new feature vector φ(x), is obtained by marginalizing with respect to the posterior distribution p(w|t), which is itself approximated by a Gaussian distribution q(w) so that

    p(C_1|φ, t) = ∫ p(C_1|φ, w) p(w|t) dw ≃ ∫ σ(w^T φ) q(w) dw    (4.145)

with the corresponding probability for class C_2 given by p(C_2|φ, t) = 1 - p(C_1|φ, t). To evaluate the predictive distribution, we first note that the function σ(w^T φ) depends on w only through its projection onto φ.
Denoting a = w^T φ, we have

    σ(w^T φ) = ∫ δ(a - w^T φ) σ(a) da    (4.146)

where δ(·) is the Dirac delta function. From this we obtain

    ∫ σ(w^T φ) q(w) dw = ∫ σ(a) p(a) da    (4.147)

where

    p(a) = ∫ δ(a - w^T φ) q(w) dw.    (4.148)

We can evaluate p(a) by noting that the delta function imposes a linear constraint on w and so forms a marginal distribution from the joint distribution q(w) by integrating out all directions orthogonal to φ. Because q(w) is Gaussian, we know from Section 2.3.2 that the marginal distribution will also be Gaussian. We can evaluate the mean and covariance of this distribution by taking moments, and interchanging the order of integration over a and w, so that

    μ_a = E[a] = ∫ p(a) a da = ∫ q(w) w^T φ dw = w_MAP^T φ    (4.149)

where we have used the result (4.144) for the variational posterior distribution q(w). Similarly

    σ_a^2 = var[a] = ∫ p(a) { a^2 - E[a]^2 } da
          = ∫ q(w) { (w^T φ)^2 - (w_MAP^T φ)^2 } dw = φ^T S_N φ.    (4.150)

Note that the distribution of a takes the same form as the predictive distribution (3.58) for the linear regression model, with the noise variance set to zero.
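To make the construction concrete, here is a minimal NumPy sketch (not code from the book; the dataset, prior parameters, and function names are illustrative) that finds w_MAP by Newton's method applied to the log posterior (4.142), forms S_N via (4.143), and then evaluates μ_a and σ_a^2 from (4.149) and (4.150) for a test point.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic(Phi, t, m0, S0, n_iters=100):
    """Laplace approximation q(w) = N(w | w_MAP, S_N) for Bayesian logistic regression.

    Phi    : (N, M) design matrix of basis-function values phi_n
    t      : (N,) binary targets in {0, 1}
    m0, S0 : Gaussian prior mean (M,) and covariance (M, M)
    """
    S0_inv = np.linalg.inv(S0)
    w = m0.copy()
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        # Gradient and Hessian of the negative log posterior (4.142)
        grad = S0_inv @ (w - m0) + Phi.T @ (y - t)
        H = S0_inv + Phi.T @ (Phi * (y * (1 - y))[:, None])   # S_N^{-1}, eq. (4.143)
        step = np.linalg.solve(H, grad)
        w = w - step                                          # Newton update
        if np.max(np.abs(step)) < 1e-8:
            break
    S_N = np.linalg.inv(H)
    return w, S_N

def latent_moments(phi, w_map, S_N):
    """Mean and variance of a = w^T phi under q(w): eqs (4.149) and (4.150)."""
    mu_a = w_map @ phi
    var_a = phi @ S_N @ phi
    return mu_a, var_a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 100, 3
    Phi = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 2))])  # bias + 2 features
    w_true = np.array([-0.5, 2.0, -1.0])
    t = (rng.uniform(size=N) < sigmoid(Phi @ w_true)).astype(float)

    m0 = np.zeros(M)
    S0 = 10.0 * np.eye(M)                  # broad Gaussian prior (illustrative choice)
    w_map, S_N = laplace_logistic(Phi, t, m0, S0)
    mu_a, var_a = latent_moments(np.array([1.0, 0.3, -0.2]), w_map, S_N)
    print(w_map, mu_a, var_a)
```

Because the Gaussian prior makes the negative log posterior strictly convex, Newton's method converges to the unique mode. With μ_a and σ_a^2 in hand, all that remains is a one-dimensional integral over a.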
Thus our variational approximation to the predictive distribution becomes

    p(C_1|t) = ∫ σ(a) p(a) da = ∫ σ(a) N(a | μ_a, σ_a^2) da.    (4.151)

This result can also be derived directly by making use of the results for the marginal of a Gaussian distribution given in Section 2.3.2 (see Exercises 4.24-4.26).

The integral over a represents the convolution of a Gaussian with a logistic sigmoid, and cannot be evaluated analytically.
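Although no closed form exists, the integral is one-dimensional and is easy to evaluate numerically; the following sketch (again hypothetical, taking μ_a and σ_a^2 as inputs) uses Gauss-Hermite quadrature from NumPy.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive_quadrature(mu_a, var_a, degree=50):
    """Numerically evaluate the integral of sigma(a) N(a | mu_a, var_a) over a, eq. (4.151),
    by Gauss-Hermite quadrature with the substitution a = mu_a + sqrt(2 var_a) x."""
    x, w = np.polynomial.hermite.hermgauss(degree)
    a = mu_a + np.sqrt(2.0 * var_a) * x
    return np.sum(w * sigmoid(a)) / np.sqrt(np.pi)

if __name__ == "__main__":
    # Illustrative values only, e.g. mu_a = 1.0 and sigma_a^2 = 4.0
    print(predictive_quadrature(1.0, 4.0))
```

Such a quadrature estimate is a useful reference against which the analytic approximation developed next can be checked.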
We can, however, obtain a good approximation (Spiegelhalter and Lauritzen, 1990; MacKay, 1992b; Barber and Bishop, 1998a) by making use of the close similarity between the logistic sigmoid function σ(a) defined by (4.59) and the probit function Φ(a) defined by (4.114). In order to obtain the best approximation to the logistic function we need to re-scale the horizontal axis, so that we approximate σ(a) by Φ(λa). We can find a suitable value of λ by requiring that the two functions have the same slope at the origin: the sigmoid has slope σ(0){1 - σ(0)} = 1/4 there, while Φ(λa) has slope λ/(2π)^{1/2}, so that λ = (2π)^{1/2}/4 and hence λ^2 = π/8. The similarity of the logistic sigmoid and the probit function, for this choice of λ, is illustrated in Figure 4.9.

The advantage of using a probit function is that its convolution with a Gaussian can be expressed analytically in terms of another probit function.
Specifically we can show that

    ∫ Φ(λa) N(a | μ, σ^2) da = Φ( μ / (λ^{-2} + σ^2)^{1/2} ).    (4.152)

We now apply the approximation σ(a) ≃ Φ(λa) to the probit functions appearing on both sides of this equation, leading to the following approximation for the convolution of a logistic sigmoid with a Gaussian

    ∫ σ(a) N(a | μ, σ^2) da ≃ σ( κ(σ^2) μ )    (4.153)

where we have defined

    κ(σ^2) = (1 + π σ^2 / 8)^{-1/2}.    (4.154)

Applying this result to (4.151) we obtain the approximate predictive distribution in the form

    p(C_1|φ, t) = σ( κ(σ_a^2) μ_a )    (4.155)

where μ_a and σ_a^2 are defined by (4.149) and (4.150), respectively, and κ(σ_a^2) is defined by (4.154).

Note that the decision boundary corresponding to p(C_1|φ, t) = 0.5 is given by μ_a = 0, which is the same as the decision boundary obtained by using the MAP value for w. Thus if the decision criterion is based on minimizing misclassification rate, with equal prior probabilities, then the marginalization over w has no effect.
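The snippet below (an illustrative sketch, with hypothetical values of μ_a and σ_a^2) applies (4.153)-(4.155) and compares the result with the MAP plug-in prediction σ(μ_a): the posterior uncertainty pulls the predictive probability towards 0.5, while the sign of μ_a, and hence the decision at threshold 0.5, is unchanged.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kappa(var_a):
    """kappa(sigma^2) = (1 + pi sigma^2 / 8)^{-1/2}, eq. (4.154)."""
    return 1.0 / np.sqrt(1.0 + np.pi * var_a / 8.0)

def predictive_probit_approx(mu_a, var_a):
    """Approximate predictive probability p(C1 | phi, t) of eq. (4.155)."""
    return sigmoid(kappa(var_a) * mu_a)

# Larger posterior uncertainty moderates the probability but leaves the
# sign of mu_a, and hence the 0.5-threshold decision, unchanged.
for mu_a, var_a in [(2.0, 0.0), (2.0, 4.0), (-1.0, 0.5)]:
    print(mu_a, var_a,
          sigmoid(mu_a),                          # MAP plug-in prediction
          predictive_probit_approx(mu_a, var_a))  # marginalized prediction
```

The values produced this way can also be checked against the quadrature sketch given earlier, with which they agree closely.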
However, for more complex decision criteria it will play an important role. Marginalization of the logistic sigmoid model under a Gaussian approximation to the posterior distribution will be illustrated in the context of variational inference in Figure 10.13.

Exercises

4.1 ( ) Given a set of data points {x_n}, we can define the convex hull to be the set of all points x given by

    x = Σ_n α_n x_n    (4.156)

where α_n ≥ 0 and Σ_n α_n = 1.
Consider a second set of points {y_n} together with their corresponding convex hull. By definition, the two sets of points will be linearly separable if there exists a vector ŵ and a scalar w_0 such that ŵ^T x_n + w_0 > 0 for all x_n, and ŵ^T y_n + w_0 < 0 for all y_n. Show that if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that if they are linearly separable, their convex hulls do not intersect.

4.2 ( ) www Consider the minimization of a sum-of-squares error function (4.15), and suppose that all of the target vectors in the training set satisfy a linear constraint

    a^T t_n + b = 0    (4.157)

where t_n corresponds to the nth row of the matrix T in (4.15). Show that as a consequence of this constraint, the elements of the model prediction y(x) given by the least-squares solution (4.17) also satisfy this constraint, so that

    a^T y(x) + b = 0.    (4.158)

To do so, assume that one of the basis functions φ_0(x) = 1 so that the corresponding parameter w_0 plays the role of a bias.

4.3 ( ) Extend the result of Exercise 4.2 to show that if multiple linear constraints are satisfied simultaneously by the target vectors, then the same constraints will also be satisfied by the least-squares prediction of a linear model.

4.4 ( ) www Show that maximization of the class separation criterion given by (4.23) with respect to w, using a Lagrange multiplier to enforce the constraint w^T w = 1, leads to the result that w ∝ (m_2 - m_1).

4.5 ( ) By making use of (4.20), (4.23), and (4.24), show that the Fisher criterion (4.25) can be written in the form (4.26).

4.6 ( ) Using the definitions of the between-class and within-class covariance matrices given by (4.27) and (4.28), respectively, together with (4.34) and (4.36) and the choice of target values described in Section 4.1.5, show that the expression (4.33) that minimizes the sum-of-squares error function can be written in the form (4.37).

4.7 ( ) www Show that the logistic sigmoid function (4.59) satisfies the property σ(-a) = 1 - σ(a) and that its inverse is given by σ^{-1}(y) = ln{y/(1 - y)}.

4.8 ( ) Using (4.57) and (4.58), derive the result (4.65) for the posterior class probability in the two-class generative model with Gaussian densities, and verify the results (4.66) and (4.67) for the parameters w and w_0.

4.9 ( ) www Consider a generative classification model for K classes defined by prior class probabilities p(C_k) = π_k and general class-conditional densities p(φ|C_k) where φ is the input feature vector.