Figure 7.7  Illustration of SVM regression, showing the regression curve y(x) together with the ε-insensitive 'tube'. Also shown are examples of the slack variables ξ and ξ̂. Points above the ε-tube have ξ > 0 and ξ̂ = 0, points below the ε-tube have ξ = 0 and ξ̂ > 0, and points inside the ε-tube have ξ = ξ̂ = 0.

The error function for support vector regression can then be written as

    C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|\mathbf{w}\|^2        (7.55)

which must be minimized subject to the constraints ξ_n ≥ 0 and ξ̂_n ≥ 0 as well as (7.53) and (7.54). This can be achieved by introducing Lagrange multipliers a_n ≥ 0, â_n ≥ 0, µ_n ≥ 0, and µ̂_n ≥ 0 and optimizing the Lagrangian

    L = C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|\mathbf{w}\|^2
        - \sum_{n=1}^{N} (\mu_n \xi_n + \hat{\mu}_n \hat{\xi}_n)
        - \sum_{n=1}^{N} a_n (\epsilon + \xi_n + y_n - t_n)
        - \sum_{n=1}^{N} \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n).        (7.56)
We now substitute for y(x) using (7.1) and then set the derivatives of the Lagrangian with respect to w, b, ξ_n, and ξ̂_n to zero, giving

    \frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} (a_n - \hat{a}_n) \phi(\mathbf{x}_n)        (7.57)

    \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0        (7.58)

    \frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n + \mu_n = C        (7.59)

    \frac{\partial L}{\partial \hat{\xi}_n} = 0 \;\Rightarrow\; \hat{a}_n + \hat{\mu}_n = C.        (7.60)

Using these results to eliminate the corresponding variables from the Lagrangian (Exercise 7.7), we see that the dual problem involves maximizing

    \widetilde{L}(\mathbf{a}, \hat{\mathbf{a}}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(\mathbf{x}_n, \mathbf{x}_m)
        - \epsilon \sum_{n=1}^{N} (a_n + \hat{a}_n) + \sum_{n=1}^{N} (a_n - \hat{a}_n) t_n        (7.61)

with respect to {a_n} and {â_n}, where we have introduced the kernel k(x, x') = φ(x)ᵀφ(x'). Again, this is a constrained maximization, and to find the constraints we note that a_n ≥ 0 and â_n ≥ 0 are both required because these are Lagrange multipliers.
Also, µ_n ≥ 0 and µ̂_n ≥ 0 together with (7.59) and (7.60) require a_n ≤ C and â_n ≤ C, and so again we have the box constraints

    0 \leq a_n \leq C        (7.62)

    0 \leq \hat{a}_n \leq C        (7.63)

together with the condition (7.58).
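As a concrete but purely illustrative sketch (not taken from the book), the dual problem (7.61) with the box constraints (7.62)–(7.63) and the condition (7.58) can be solved for small data sets with a general-purpose optimizer. The example below assumes Python with NumPy and SciPy, a Gaussian kernel, and illustrative names (C, eps, gamma); for realistic problem sizes a dedicated quadratic programming or SMO-style solver would be used instead.

    import numpy as np
    from scipy.optimize import minimize

    def gaussian_kernel(X1, X2, gamma=1.0):
        # k(x, x') = exp(-gamma * ||x - x'||^2)
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def fit_svr_dual(X, t, C=1.0, eps=0.1, gamma=1.0):
        # Maximize the dual (7.61) over a_n and a_hat_n, subject to the box
        # constraints (7.62)-(7.63) and the equality constraint (7.58),
        # by minimizing the negative of (7.61).
        N = len(t)
        K = gaussian_kernel(X, X, gamma)

        def neg_dual(z):
            a, a_hat = z[:N], z[N:]
            d = a - a_hat
            return 0.5 * d @ K @ d + eps * np.sum(a + a_hat) - d @ t

        constraints = [{'type': 'eq', 'fun': lambda z: np.sum(z[:N] - z[N:])}]  # (7.58)
        bounds = [(0.0, C)] * (2 * N)                                           # (7.62), (7.63)
        res = minimize(neg_dual, np.zeros(2 * N), method='SLSQP',
                       bounds=bounds, constraints=constraints)
        a, a_hat = res.x[:N], res.x[N:]
        return a, a_hat, K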
Substituting (7.57) into (7.1), we see that predictions for new inputs can be made using

    y(\mathbf{x}) = \sum_{n=1}^{N} (a_n - \hat{a}_n) k(\mathbf{x}, \mathbf{x}_n) + b        (7.64)

which is again expressed in terms of the kernel function.

The corresponding Karush-Kuhn-Tucker (KKT) conditions, which state that at the solution the product of the dual variables and the constraints must vanish, are given by

    a_n (\epsilon + \xi_n + y_n - t_n) = 0        (7.65)

    \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n) = 0        (7.66)

    (C - a_n) \xi_n = 0        (7.67)

    (C - \hat{a}_n) \hat{\xi}_n = 0.        (7.68)

From these we can obtain several useful results. First of all, we note that a coefficient a_n can only be nonzero if ε + ξ_n + y_n − t_n = 0, which implies that the data point either lies on the upper boundary of the ε-tube (ξ_n = 0) or lies above the upper boundary (ξ_n > 0). Similarly, a nonzero value for â_n implies ε + ξ̂_n − y_n + t_n = 0, and such points must lie either on or below the lower boundary of the ε-tube. Furthermore, the two constraints ε + ξ_n + y_n − t_n = 0 and ε + ξ̂_n − y_n + t_n = 0 are incompatible, as is easily seen by adding them together and noting that ξ_n and ξ̂_n are nonnegative while ε is strictly positive, and so for every data point x_n, either a_n or â_n (or both) must be zero.

The support vectors are those data points that contribute to predictions given by (7.64), in other words those for which either a_n ≠ 0 or â_n ≠ 0.
These are points that lie on the boundary of the ε-tube or outside the tube. All points within the tube have a_n = â_n = 0. We again have a sparse solution, and the only terms that have to be evaluated in the predictive model (7.64) are those that involve the support vectors.

The parameter b can be found by considering a data point for which 0 < a_n < C, which from (7.67) must have ξ_n = 0, and from (7.65) must therefore satisfy ε + y_n − t_n = 0.
Using (7.1) and solving for b, we obtain

    b = t_n - \epsilon - \mathbf{w}^{\mathrm{T}} \phi(\mathbf{x}_n)
      = t_n - \epsilon - \sum_{m=1}^{N} (a_m - \hat{a}_m) k(\mathbf{x}_n, \mathbf{x}_m)        (7.69)

where we have used (7.57). We can obtain an analogous result by considering a point for which 0 < â_n < C. In practice, it is better to average over all such estimates of b.
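Continuing the same illustrative sketch (again an assumption about implementation, not the book's own code), b can be estimated by averaging (7.69) over the points with 0 < a_n < C, together with the analogous estimates (from (7.66)) for points with 0 < â_n < C, after which predictions follow directly from (7.64):

    import numpy as np  # gaussian_kernel and fit_svr_dual as in the earlier sketch

    def compute_b(a, a_hat, K, t, C, eps, tol=1e-6):
        # Average the estimates (7.69) over points lying strictly inside the box constraints.
        d = a - a_hat
        wphi = K @ d                                   # w^T phi(x_n) at each training point
        upper = (a > tol) & (a < C - tol)              # points on the upper tube boundary
        lower = (a_hat > tol) & (a_hat < C - tol)      # points on the lower tube boundary
        b_estimates = np.concatenate([t[upper] - eps - wphi[upper],
                                      t[lower] + eps - wphi[lower]])
        return b_estimates.mean()

    def predict(X_new, X_train, a, a_hat, b, gamma=1.0):
        # Prediction (7.64): y(x) = sum_n (a_n - a_hat_n) k(x, x_n) + b
        return gaussian_kernel(X_new, X_train, gamma) @ (a - a_hat) + b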
As with the classification case, there is an alternative formulation of the SVM for regression in which the parameter governing complexity has a more intuitive interpretation (Schölkopf et al., 2000). In particular, instead of fixing the width ε of the insensitive region, we fix instead a parameter ν that bounds the fraction of points lying outside the tube. This involves maximizing

    \widetilde{L}(\mathbf{a}, \hat{\mathbf{a}}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(\mathbf{x}_n, \mathbf{x}_m)
        + \sum_{n=1}^{N} (a_n - \hat{a}_n) t_n        (7.70)

subject to the constraints

    0 \leq a_n \leq C/N        (7.71)

    0 \leq \hat{a}_n \leq C/N        (7.72)

    \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0        (7.73)

    \sum_{n=1}^{N} (a_n + \hat{a}_n) \leq \nu C.        (7.74)

It can be shown (Appendix A) that there are at most νN data points falling outside the insensitive tube, while at least νN data points are support vectors and so lie either on the tube or outside it.

The use of a support vector machine to solve a regression problem is illustrated using the sinusoidal data set in Figure 7.8. Here the parameters ν and C have been chosen by hand. In practice, their values would typically be determined by cross-validation.
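As a practical aside (not part of the book's presentation), ν-SVM regression is available in standard software; for instance, scikit-learn provides a NuSVR estimator. The short sketch below fits a noisy sinusoid in the spirit of Figure 7.8; the data set, parameter values, and the library's exact parameterization of ν and C (which may differ in detail from (7.70)–(7.74)) are illustrative assumptions.

    import numpy as np
    from sklearn.svm import NuSVR

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(50, 1))                            # inputs in [0, 1]
    t = np.sin(2 * np.pi * X).ravel() + 0.1 * rng.standard_normal(50)  # noisy sinusoid

    # nu bounds the fraction of points lying outside the tube; C and the
    # Gaussian-kernel width gamma would normally be chosen by cross-validation.
    model = NuSVR(nu=0.2, C=10.0, kernel='rbf', gamma=10.0)
    model.fit(X, t)

    print("number of support vectors:", len(model.support_))
    X_grid = np.linspace(0.0, 1.0, 200)[:, None]
    y_grid = model.predict(X_grid)                                     # predicted regression curve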
Figure 7.8  Illustration of the ν-SVM for regression applied to the sinusoidal synthetic data set using Gaussian kernels. The predicted regression curve is shown by the red line, and the ε-insensitive tube corresponds to the shaded region. Also, the data points are shown in green, and those corresponding to support vectors are indicated by blue circles.

7.1.5 Computational learning theory

Historically, support vector machines have largely been motivated and analysed using a theoretical framework known as computational learning theory, also sometimes called statistical learning theory (Anthony and Biggs, 1992; Kearns and Vazirani, 1994; Vapnik, 1995; Vapnik, 1998). This has its origins with Valiant (1984), who formulated the probably approximately correct, or PAC, learning framework. The goal of the PAC framework is to understand how large a data set needs to be in order to give good generalization.
It also gives bounds for the computational cost of learning, although we do not consider these here.

Suppose that a data set D of size N is drawn from some joint distribution p(x, t) where x is the input variable and t represents the class label, and that we restrict attention to 'noise free' situations in which the class labels are determined by some (unknown) deterministic function t = g(x).
In PAC learning we say that a function f(x; D), drawn from a space F of such functions on the basis of the training set D, has good generalization if its expected error rate is below some pre-specified threshold ε, so that

    \mathbb{E}_{\mathbf{x}, t}\big[ I\big(f(\mathbf{x}; \mathcal{D}) \neq t\big) \big] < \epsilon        (7.75)

where I(·) is the indicator function, and the expectation is with respect to the distribution p(x, t).
The quantity on the left-hand side is a random variable, because it depends on the training set D, and the PAC framework requires that (7.75) holds, with probability greater than 1 − δ, for a data set D drawn randomly from p(x, t). Here δ is another pre-specified parameter, and the terminology 'probably approximately correct' comes from the requirement that with high probability (greater than 1 − δ), the error rate be small (less than ε).
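Putting the two parameters together, this requirement can be summarized (as a restatement of the definition above, not an additional result) as

    \Pr_{\mathcal{D} \sim p(\mathbf{x}, t)} \Big[ \, \mathbb{E}_{\mathbf{x}, t}\big[ I\big(f(\mathbf{x}; \mathcal{D}) \neq t\big) \big] < \epsilon \, \Big] \;>\; 1 - \delta

where the outer probability is taken over the random draw of the data set D of size N.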
For a given choice of model space F, and for given parameters ε and δ, PAC learning aims to provide bounds on the minimum size N of data set needed to meet this criterion. A key quantity in PAC learning is the Vapnik-Chervonenkis dimension, or VC dimension, which provides a measure of the complexity of a space of functions, and which allows the PAC framework to be extended to spaces containing an infinite number of functions.

The bounds derived within the PAC framework are often described as worst-case, because they apply to any choice for the distribution p(x, t), so long as both the training and the test examples are drawn (independently) from the same distribution, and for any choice for the function f(x) so long as it belongs to F.
In real-world applications of machine learning, we deal with distributions that have significant regularity, for example in which large regions of input space carry the same class label. As a consequence of the lack of any assumptions about the form of the distribution, the PAC bounds are very conservative, in other words they strongly over-estimate the size of data sets required to achieve a given generalization performance. For this reason, PAC bounds have found few, if any, practical applications.

One attempt to improve the tightness of the PAC bounds is the PAC-Bayesian framework (McAllester, 2003), which considers a distribution over the space F of functions, somewhat analogous to the prior in a Bayesian treatment.