Of course, in the case of the RVM we always have the option of starting with a smaller number of basis functions than N + 1. More significantly, in the relevance vector machine the parameters governing complexity and noise variance are determined automatically from a single training run, whereas in the support vector machine the parameters C and ε (or ν) are generally found using cross-validation, which involves multiple training runs. Furthermore, in the next section we shall derive an alternative procedure for training the relevance vector machine that improves training speed significantly.

7.2.2 Analysis of sparsity

We have noted earlier that the mechanism of automatic relevance determination causes a subset of parameters to be driven to zero. We now examine in more detail the mechanism of sparsity in the context of the relevance vector machine. In the process, we will arrive at a significantly faster procedure for optimizing the hyperparameters compared to the direct techniques given above.
Figure 7.10 Illustration of the mechanism for sparsity in a Bayesian linear regression model, showing a training set vector of target values given by t = (t1, t2)^T, indicated by the cross, for a model with one basis vector ϕ = (φ(x1), φ(x2))^T, which is poorly aligned with the target data vector t. On the left we see a model having only isotropic noise, so that C = β^{-1} I, corresponding to α = ∞, with β set to its most probable value. On the right we see the same model but with a finite value of α. In each case the red ellipse corresponds to unit Mahalanobis distance, with |C| taking the same value for both plots, while the dashed green circle shows the contribution arising from the noise term β^{-1}. We see that any finite value of α reduces the probability of the observed data, and so for the most probable solution the basis vector is removed.
Before proceeding with a mathematical analysis, we first give some informal insight into the origin of sparsity in Bayesian linear models. Consider a data set comprising N = 2 observations t1 and t2, together with a model having a single basis function φ(x), with hyperparameter α, along with isotropic noise having precision β. From (7.85), the marginal likelihood is given by p(t|α, β) = N(t|0, C), in which the covariance matrix takes the form

C = (1/β) I + (1/α) ϕ ϕ^T                                                 (7.92)

where ϕ denotes the N-dimensional vector (φ(x1), φ(x2))^T, and similarly t = (t1, t2)^T.
Notice that this is just a zero-mean Gaussian process model over t with covariance C. Given a particular observation for t, our goal is to find α and β by maximizing the marginal likelihood. We see from Figure 7.10 that, if there is a poor alignment between the direction of ϕ and that of the training data vector t, then the corresponding hyperparameter α will be driven to ∞, and the basis vector will be pruned from the model. This arises because any finite value of α will always assign a lower probability to the data, provided that β is set to its optimal value: a finite α elongates the distribution along the direction of ϕ, away from the data, thereby increasing the probability mass in regions away from the observed data and hence reducing the value of the density at the target data vector itself.
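The effect shown in Figure 7.10 is easy to reproduce numerically. The following sketch (illustrative values only, not taken from the text; β is simply held fixed rather than set to its most probable value, and the basis vectors are hypothetical) evaluates the log marginal likelihood ln N(t|0, C), with C given by (7.92), over a grid of α for a poorly aligned and a well aligned basis vector.

```python
import numpy as np
from scipy.stats import multivariate_normal

beta = 4.0                                   # assumed noise precision, held fixed here
t = np.array([1.0, 1.0])                     # observed target vector (the cross in Figure 7.10)
candidates = {
    "poorly aligned": np.array([1.0, -1.0]), # (nearly) orthogonal to t
    "well aligned":   np.array([1.0,  0.9]), # close to the direction of t
}

def log_marginal(alpha, phi):
    C = np.eye(2) / beta + np.outer(phi, phi) / alpha   # covariance (7.92)
    return multivariate_normal(np.zeros(2), C).logpdf(t)

log_pruned = multivariate_normal(np.zeros(2), np.eye(2) / beta).logpdf(t)  # alpha -> infinity
for name, phi in candidates.items():
    best_finite = max(log_marginal(a, phi) for a in np.logspace(-2, 4, 200))
    print(f"{name:15s}  best finite alpha: {best_finite:8.3f}   alpha = inf: {log_pruned:8.3f}")
```

For the poorly aligned vector no finite α exceeds the pruned (α = ∞) solution, whereas for the well aligned vector a finite α gives a strictly larger marginal likelihood, in line with the qi^2 versus si criterion derived below.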
For the more general case of M basis vectors ϕ1, . . . , ϕM a similar intuition holds, namely that if a particular basis vector is poorly aligned with the data vector t, then it is likely to be pruned from the model.

We now investigate the mechanism for sparsity from a more mathematical perspective, for a general case involving M basis functions. To motivate this analysis we first note that, in the result (7.87) for re-estimating the parameter αi, the terms on the right-hand side are themselves also functions of αi. These results therefore represent implicit solutions, and iteration would be required even to determine a single αi with all other αj for j ≠ i held fixed.

This suggests a different approach to solving the optimization problem for the RVM, in which we make explicit all of the dependence of the marginal likelihood (7.85) on a particular αi and then determine its stationary points explicitly (Faul and Tipping, 2002; Tipping and Faul, 2003).
To do this, we first pull out the contribution from αi in the matrix C defined by (7.86) to give

C = β^{-1} I + Σ_{j≠i} αj^{-1} ϕj ϕj^T + αi^{-1} ϕi ϕi^T
  = C−i + αi^{-1} ϕi ϕi^T                                                 (7.93)

where ϕi denotes the ith column of Φ, in other words the N-dimensional vector with elements (φi(x1), . . . , φi(xN))^T, in contrast to φn, which denotes the nth row of Φ. The matrix C−i represents the matrix C with the contribution from basis function i removed. Using the matrix identities (C.7) and (C.15), the determinant and inverse of C can then be written

|C| = |C−i| |1 + αi^{-1} ϕi^T C−i^{-1} ϕi|                                (7.94)

C^{-1} = C−i^{-1} − (C−i^{-1} ϕi ϕi^T C−i^{-1}) / (αi + ϕi^T C−i^{-1} ϕi)  (7.95)

Exercise 7.15
Using these results, we can then write the log marginal likelihood function (7.85) in the form

L(α) = L(α−i) + λ(αi)                                                     (7.96)

where L(α−i) is simply the log marginal likelihood with basis function ϕi omitted, and the quantity λ(αi) is defined by

λ(αi) = (1/2) [ ln αi − ln(αi + si) + qi^2 / (αi + si) ]                  (7.97)

and contains all of the dependence on αi.
Here we have introduced the two quantities

si = ϕi^T C−i^{-1} ϕi                                                     (7.98)

qi = ϕi^T C−i^{-1} t.                                                     (7.99)

Here si is called the sparsity and qi is known as the quality of ϕi, and as we shall see, a large value of si relative to the value of qi means that the basis function ϕi is more likely to be pruned from the model. The 'sparsity' measures the extent to which basis function ϕi overlaps with the other basis vectors in the model, and the 'quality' represents a measure of the alignment of the basis vector ϕi with the error between the training set values t = (t1, . . . , tN)^T and the vector y−i of predictions that would result from the model with the vector ϕi excluded (Tipping and Faul, 2003).

Figure 7.11 Plots of the log marginal likelihood λ(αi) versus ln αi, showing on the left the single maximum at a finite αi for qi^2 = 4 and si = 1 (so that qi^2 > si), and on the right the maximum at αi = ∞ for qi^2 = 1 and si = 2 (so that qi^2 < si).
The stationary points of the marginal likelihood with respect to αi occur when the derivative

dλ(αi)/dαi = [ αi^{-1} si^2 − (qi^2 − si) ] / [ 2 (αi + si)^2 ]           (7.100)

is equal to zero. There are two possible forms for the solution. Recalling that αi ≥ 0, we see that if qi^2 < si, then αi → ∞ provides a solution.
Conversely, if qi^2 > si, we can solve for αi to obtain

αi = si^2 / (qi^2 − si).                                                  (7.101)

Exercise 7.16
These two solutions are illustrated in Figure 7.11. We see that the relative size of the quality and sparsity terms determines whether a particular basis vector will be pruned from the model or not. A more complete analysis (Faul and Tipping, 2002), based on the second derivatives of the marginal likelihood, confirms these solutions are indeed the unique maxima of λ(αi).

Note that this approach has yielded a closed-form solution for αi, for given values of the other hyperparameters.
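As a quick sanity check, the following sketch (toy values chosen to match the two panels of Figure 7.11, not code from the text) evaluates λ(αi) from (7.97) on a grid and compares the location of its maximum with the closed-form solution (7.101).

```python
import numpy as np

def lam(alpha_i, s_i, q_i):
    # log marginal likelihood contribution lambda(alpha_i), equation (7.97)
    return 0.5 * (np.log(alpha_i) - np.log(alpha_i + s_i) + q_i**2 / (alpha_i + s_i))

grid = np.logspace(-3.0, 3.0, 2001)          # grid of candidate values for alpha_i

# Left panel of Figure 7.11: q_i^2 = 4 > s_i = 1, so a single finite maximum exists
s_i, q_i = 1.0, 2.0
alpha_star = s_i**2 / (q_i**2 - s_i)         # closed-form stationary point (7.101)
alpha_grid = grid[np.argmax(lam(grid, s_i, q_i))]
print(alpha_star, alpha_grid)                # both close to 1/3

# Right panel: q_i^2 = 1 < s_i = 2, so lambda keeps increasing towards alpha_i -> infinity
print(np.argmax(lam(grid, 2.0, 1.0)) == grid.size - 1)   # True: maximum at the edge of the grid
```

In the second case the grid maximum sits at the largest α value considered, consistent with the solution αi → ∞ and hence with pruning of the corresponding basis vector.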
As well as providing insight into the origin of sparsity in the RVM, this analysis also leads to a practical algorithm for optimizing the hyperparameters that has significant speed advantages. This uses a fixed set of candidate basis vectors, and then cycles through them in turn to decide whether each vector should be included in the model or not. The resulting sequential sparse Bayesian learning algorithm is described below.

Sequential Sparse Bayesian Learning Algorithm

1. If solving a regression problem, initialize β.
2. Initialize using one basis function ϕ1, with hyperparameter α1 set using (7.101), with the remaining hyperparameters αj for j ≠ 1 initialized to infinity, so that only ϕ1 is included in the model.
3. Evaluate Σ and m, along with qi and si for all basis functions.
4. Select a candidate basis function ϕi.
5. If qi^2 > si and αi < ∞, so that the basis vector ϕi is already included in the model, then update αi using (7.101).
6. If qi^2 > si and αi = ∞, then add ϕi to the model, and evaluate the hyperparameter αi using (7.101).
7. If qi^2 ≤ si and αi < ∞, then remove basis function ϕi from the model, and set αi = ∞.
8. If solving a regression problem, update β.
9. If converged, terminate; otherwise go to 3.

Note that if qi^2 ≤ si and αi = ∞, then the basis function ϕi is already excluded from the model and no action is required.
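As an illustration of the boxed algorithm, here is a compact sketch in Python. It is not a reference implementation: the noise precision β is held fixed (step 8 is omitted), the convergence test is replaced by a fixed number of sweeps, and C−i is rebuilt directly from (7.93) for every candidate, so this version does not enjoy the efficient O(M^3) cost of the update scheme described next. The kernel, data, and parameter values in the usage example are illustrative assumptions.

```python
import numpy as np

def sequential_rvm_regression(Phi, t, beta, n_sweeps=20):
    """Sketch of the sequential sparse Bayesian learning loop for regression (beta held fixed)."""
    N, M = Phi.shape
    alpha = np.full(M, np.inf)                # alpha_i = inf  <=>  basis function i currently pruned

    for sweep in range(n_sweeps):
        for i in range(M):                    # step 4: cycle through the candidate basis functions
            # C_{-i}: marginal covariance with the contribution of phi_i removed, as in (7.93)
            C_minus_i = np.eye(N) / beta
            for j in range(M):
                if j != i and np.isfinite(alpha[j]):
                    C_minus_i += np.outer(Phi[:, j], Phi[:, j]) / alpha[j]
            Cinv = np.linalg.inv(C_minus_i)
            s_i = Phi[:, i] @ Cinv @ Phi[:, i]       # sparsity, equation (7.98)
            q_i = Phi[:, i] @ Cinv @ t               # quality,  equation (7.99)
            if q_i**2 > s_i:
                alpha[i] = s_i**2 / (q_i**2 - s_i)   # steps 5 and 6: add or re-estimate via (7.101)
            else:
                alpha[i] = np.inf                    # step 7: prune the basis function
    return alpha

# Toy usage: noisy sinusoid, one Gaussian basis function centred on each input point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
t = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
Phi = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
alpha = sequential_rvm_regression(Phi, t, beta=100.0)
print("active basis functions:", int(np.isfinite(alpha).sum()), "of", alpha.size)
```

Only a small fraction of the basis functions typically remain active, which is exactly the sparsity mechanism analysed above.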
In practice, it is convenient to evaluate the quantities

Qi = ϕi^T C^{-1} t                                                        (7.102)

Si = ϕi^T C^{-1} ϕi.                                                      (7.103)

The quality and sparseness variables can then be expressed in the form

qi = αi Qi / (αi − Si)                                                    (7.104)

si = αi Si / (αi − Si).                                                   (7.105)

Exercise 7.17
Note that when αi = ∞, we have qi = Qi and si = Si. Using (C.7), we can write

Qi = β ϕi^T t − β^2 ϕi^T Φ Σ Φ^T t                                        (7.106)

Si = β ϕi^T ϕi − β^2 ϕi^T Φ Σ Φ^T ϕi                                      (7.107)

where Φ and Σ involve only those basis vectors that correspond to finite hyperparameters αi. At each stage the required computations therefore scale like O(M^3), where M is the number of active basis vectors in the model and is typically much smaller than the number N of training patterns.

7.2.3 RVM for classification

We can extend the relevance vector machine framework to classification problems by applying the ARD prior over weights to a probabilistic linear classification model of the kind studied in Chapter 4. To start with, we consider two-class problems with a binary target variable t ∈ {0, 1}.