This is the covariance matrix of the Fisher scores, and so the Fisher kernel corresponds to a whitening of these scores (Section 12.1.3). More simply, we can just omit the Fisher information matrix altogether and use the noninvariant kernel

\[ k(x, x') = g(\theta, x)^{\mathrm{T}}\, g(\theta, x'). \qquad (6.36) \]

An application of Fisher kernels to document retrieval is given by Hofmann (2000).

A final example of a kernel function is the sigmoidal kernel given by

\[ k(x, x') = \tanh\!\left( a\, x^{\mathrm{T}} x' + b \right) \qquad (6.37) \]

whose Gram matrix in general is not positive semidefinite. This form of kernel has, however, been used in practice (Vapnik, 1995), possibly because it gives kernel expansions such as the support vector machine a superficial resemblance to neural network models. As we shall see, in the limit of an infinite number of basis functions, a Bayesian neural network with an appropriate prior reduces to a Gaussian process (Section 6.4.7), thereby providing a deeper link between neural networks and kernel methods.
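That the Gram matrix of (6.37) can fail to be positive semidefinite is easy to check numerically. The following is a minimal sketch, assuming NumPy; the two data points and the parameter values a = 1, b = 0 are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid_kernel_gram(X, a=1.0, b=0.0):
    """Gram matrix of the sigmoidal kernel k(x, x') = tanh(a x^T x' + b)."""
    return np.tanh(a * X @ X.T + b)

# Two one-dimensional inputs; an illustrative choice of parameters.
X = np.array([[1.0], [2.0]])
K = sigmoid_kernel_gram(X, a=1.0, b=0.0)

# For this choice the smallest eigenvalue is negative,
# so K is not positive semidefinite.
print(np.linalg.eigvalsh(K))
```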
6.3. Radial Basis Function Networks

In Chapter 3, we discussed regression models based on linear combinations of fixed basis functions, although we did not discuss in detail what form those basis functions might take. One choice that has been widely used is that of radial basis functions, which have the property that each basis function depends only on the radial distance (typically Euclidean) from a centre µ_j, so that φ_j(x) = h(‖x − µ_j‖).

Historically, radial basis functions were introduced for the purpose of exact function interpolation (Powell, 1987). Given a set of input vectors {x_1, . . . , x_N} along with corresponding target values {t_1, . . . , t_N}, the goal is to find a smooth function f(x) that fits every target value exactly, so that f(x_n) = t_n for n = 1, . . . , N. This is achieved by expressing f(x) as a linear combination of radial basis functions, one centred on every data point

\[ f(x) = \sum_{n=1}^{N} w_n\, h(\|x - x_n\|). \qquad (6.38) \]

The values of the coefficients {w_n} are found by least squares, and because there are the same number of coefficients as there are constraints, the result is a function that fits every target value exactly. In pattern recognition applications, however, the target values are generally noisy, and exact interpolation is undesirable because this corresponds to an over-fitted solution.
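To make the construction (6.38) concrete, here is a minimal sketch, assuming NumPy and a Gaussian choice for h; the width parameter and the toy data are illustrative assumptions. Because there are as many coefficients as constraints, the least-squares fit reduces to solving the square N × N linear system, and the resulting function passes through every target value.

```python
import numpy as np

def gaussian_rbf(r, width=0.1):
    """Radial basis function h(r); a Gaussian profile is one common choice."""
    return np.exp(-0.5 * (r / width) ** 2)

def fit_exact_interpolant(X, t, width=0.1):
    """Solve the N x N system H w = t so that f(x_n) = t_n exactly, cf. (6.38)."""
    H = gaussian_rbf(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1), width)
    return np.linalg.solve(H, t)

def predict(X_new, X, w, width=0.1):
    """Evaluate f(x) = sum_n w_n h(||x - x_n||) at new inputs."""
    H = gaussian_rbf(np.linalg.norm(X_new[:, None, :] - X[None, :, :], axis=-1), width)
    return H @ w

# Toy one-dimensional example (illustrative data only).
X = np.linspace(0.0, 1.0, 10)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.default_rng(0).standard_normal(10)
w = fit_exact_interpolant(X, t)
print(np.allclose(predict(X, X, w), t))   # True: the fit passes through every target
```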
Expansions in radial basis functions also arise from regularization theory (Poggio and Girosi, 1990; Bishop, 1995a). For a sum-of-squares error function with a regularizer defined in terms of a differential operator, the optimal solution is given by an expansion in the Green's functions of the operator (which are analogous to the eigenvectors of a discrete matrix), again with one basis function centred on each data point. If the differential operator is isotropic then the Green's functions depend only on the radial distance from the corresponding data point. Due to the presence of the regularizer, the solution no longer interpolates the training data exactly.

Another motivation for radial basis functions comes from a consideration of the interpolation problem when the input (rather than the target) variables are noisy (Webb, 1994; Bishop, 1995a). If the noise on the input variable x is described by a variable ξ having a distribution ν(ξ), then the sum-of-squares error function becomes

\[ E = \frac{1}{2} \sum_{n=1}^{N} \int \left\{ y(x_n + \xi) - t_n \right\}^2 \nu(\xi)\, \mathrm{d}\xi. \qquad (6.39) \]

Using the calculus of variations (Appendix D), we can optimize with respect to the function y(x) to give (Exercise 6.17)

\[ y(x) = \sum_{n=1}^{N} t_n\, h(x - x_n) \qquad (6.40) \]

where the basis functions are given by

\[ h(x - x_n) = \frac{\nu(x - x_n)}{\displaystyle\sum_{n=1}^{N} \nu(x - x_n)}. \qquad (6.41) \]

We see that there is one basis function centred on every data point.
This is known as the Nadaraya-Watson model and will be derived again from a different perspective in Section 6.3.1. If the noise distribution ν(ξ) is isotropic, so that it is a function only of ‖ξ‖, then the basis functions will be radial.

Note that the basis functions (6.41) are normalized, so that Σ_n h(x − x_n) = 1 for any value of x. The effect of such normalization is shown in Figure 6.2. Normalization is sometimes used in practice as it avoids having regions of input space where all of the basis functions take small values, which would necessarily lead to predictions in such regions that are either small or controlled purely by the bias parameter. Another situation in which expansions in normalized radial basis functions arise is in the application of kernel density estimation to the problem of regression, as we shall discuss in Section 6.3.1.
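The normalization effect illustrated in Figure 6.2 can be reproduced in a few lines. The sketch below is a minimal illustration, assuming NumPy; the Gaussian centres and width are arbitrary choices made for the example.

```python
import numpy as np

def gaussian_basis(x, centres, width=0.2):
    """Unnormalized Gaussian basis functions evaluated at the points x."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)

def normalized_basis(x, centres, width=0.2):
    """Normalized basis functions, which sum to one at every x, cf. (6.41)."""
    phi = gaussian_basis(x, centres, width)
    return phi / phi.sum(axis=1, keepdims=True)

# Arbitrary centres and evaluation grid, for illustration only.
centres = np.array([-0.5, 0.0, 0.3, 0.6])
x = np.linspace(-1.0, 1.0, 201)

phi_norm = normalized_basis(x, centres)
print(np.allclose(phi_norm.sum(axis=1), 1.0))   # True: they sum to one everywhere
```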
Because there is one basis function associated with every data point, the corresponding model can be computationally costly to evaluate when making predictions for new data points. Models have therefore been proposed (Broomhead and Lowe, 1988; Moody and Darken, 1989; Poggio and Girosi, 1990), which retain the expansion in radial basis functions but where the number M of basis functions is smaller than the number N of data points. Typically, the number of basis functions, and the locations µ_i of their centres, are determined based on the input data {x_n} alone. The basis functions are then kept fixed and the coefficients {w_i} are determined by least squares by solving the usual set of linear equations, as discussed in Section 3.1.1.
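A minimal sketch of such a reduced model, assuming NumPy: the M centres are taken as a random subset of the training inputs (one of the simple strategies discussed next), the Gaussian basis and its width are arbitrary choices, and the weights, including a bias term, are then fitted by linear least squares as in Section 3.1.1.

```python
import numpy as np

def design_matrix(X, centres, width):
    """Design matrix of Gaussian basis functions plus a constant bias column."""
    r = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
    Phi = np.exp(-0.5 * (r / width) ** 2)
    return np.hstack([np.ones((X.shape[0], 1)), Phi])

def fit_rbf_network(X, t, M=6, width=0.15, seed=0):
    """Pick M centres at random from the data, then fit the weights by least squares."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(X.shape[0], size=M, replace=False)]
    Phi = design_matrix(X, centres, width)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return centres, w

def predict(X_new, centres, w, width=0.15):
    return design_matrix(X_new, centres, width) @ w

# Illustrative one-dimensional data.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 1))
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)
centres, w = fit_rbf_network(X, t)
print(predict(X[:3], centres, w))
```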
Figure 6.2  Plot of a set of Gaussian basis functions on the left, together with the corresponding normalized basis functions on the right.

One of the simplest ways of choosing basis function centres is to use a randomly chosen subset of the data points. A more systematic approach is called orthogonal least squares (Chen et al., 1991).
This is a sequential selection process in which at each step the next data point to be chosen as a basis function centre corresponds to the one that gives the greatest reduction in the sum-of-squares error. Values for the expansion coefficients are determined as part of the algorithm. Clustering algorithms such as K-means (Section 9.1) have also been used, which give a set of basis function centres that no longer coincide with training data points.
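The greedy selection idea can be sketched in a few lines. The following is a naive forward-selection illustration in that spirit, assuming NumPy; it simply re-fits the least-squares weights for each candidate centre rather than using the efficient orthogonalization of Chen et al. (1991), and the Gaussian basis and width are arbitrary choices.

```python
import numpy as np

def rbf_design(X, centres, width=0.2):
    """Gaussian basis functions evaluated at each row of X for the given centres."""
    r = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
    return np.exp(-0.5 * (r / width) ** 2)

def sse_with_centres(X, t, centres, width=0.2):
    """Residual sum of squares after a least-squares fit with the given centres."""
    Phi = rbf_design(X, centres, width)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return np.sum((Phi @ w - t) ** 2)

def greedy_select_centres(X, t, M=5, width=0.2):
    """At each step, add the data point whose inclusion most reduces the error."""
    chosen, remaining = [], list(range(X.shape[0]))
    for _ in range(M):
        errors = [sse_with_centres(X, t, X[chosen + [i]], width) for i in remaining]
        best = remaining[int(np.argmin(errors))]
        chosen.append(best)
        remaining.remove(best)
    return X[chosen]
```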
6.3.1 Nadaraya-Watson model

In Section 3.3.3, we saw that the prediction of a linear regression model for a new input x takes the form of a linear combination of the training set target values with coefficients given by the 'equivalent kernel' (3.62), where the equivalent kernel satisfies the summation constraint (3.64).

We can motivate the kernel regression model (3.61) from a different perspective, starting with kernel density estimation. Suppose we have a training set {x_n, t_n} and we use a Parzen density estimator (Section 2.5.1) to model the joint distribution p(x, t), so that

\[ p(x, t) = \frac{1}{N} \sum_{n=1}^{N} f(x - x_n, t - t_n) \qquad (6.42) \]

where f(x, t) is the component density function, and there is one such component centred on each data point.
We now find an expression for the regression function y(x), corresponding to the conditional average of the target variable conditioned on the input variable, which is given by

\[ y(x) = \mathbb{E}[t \mid x] = \int_{-\infty}^{\infty} t\, p(t \mid x)\, \mathrm{d}t = \frac{\int t\, p(x, t)\, \mathrm{d}t}{\int p(x, t)\, \mathrm{d}t} = \frac{\displaystyle\sum_n \int t\, f(x - x_n, t - t_n)\, \mathrm{d}t}{\displaystyle\sum_m \int f(x - x_m, t - t_m)\, \mathrm{d}t}. \qquad (6.43) \]

We now assume for simplicity that the component density functions have zero mean so that

\[ \int_{-\infty}^{\infty} f(x, t)\, t\, \mathrm{d}t = 0 \qquad (6.44) \]

for all values of x. Using a simple change of variable, we then obtain

\[ y(x) = \frac{\displaystyle\sum_n g(x - x_n)\, t_n}{\displaystyle\sum_m g(x - x_m)} = \sum_n k(x, x_n)\, t_n \qquad (6.45) \]

where n, m = 1, . . . , N and the kernel function k(x, x_n) is given by

\[ k(x, x_n) = \frac{g(x - x_n)}{\displaystyle\sum_m g(x - x_m)} \qquad (6.46) \]

and we have defined

\[ g(x) = \int_{-\infty}^{\infty} f(x, t)\, \mathrm{d}t. \qquad (6.47) \]

The result (6.45) is known as the Nadaraya-Watson model, or kernel regression (Nadaraya, 1964; Watson, 1964). For a localized kernel function, it has the property of giving more weight to the data points x_n that are close to x. Note that the kernel (6.46) satisfies the summation constraint

\[ \sum_{n=1}^{N} k(x, x_n) = 1. \]

Figure 6.3  Illustration of the Nadaraya-Watson kernel regression model using isotropic Gaussian kernels, for the sinusoidal data set. The original sine function is shown by the green curve, the data points are shown in blue, and each is the centre of an isotropic Gaussian kernel. The resulting regression function, given by the conditional mean, is shown by the red line, along with the two-standard-deviation region for the conditional distribution p(t|x) shown by the red shading.
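To connect the result (6.45) to practice, here is a minimal sketch of Nadaraya-Watson regression with isotropic Gaussian kernels, in the spirit of Figure 6.3, assuming NumPy; the bandwidth and the sinusoidal toy data are illustrative choices.

```python
import numpy as np

def nadaraya_watson(x_query, X, t, bandwidth=0.08):
    """Kernel regression (6.45): a weighted average of the targets, where the
    weights k(x, x_n) are normalized Gaussians centred on the training inputs."""
    sq_dist = (x_query[:, None] - X[None, :]) ** 2
    g = np.exp(-0.5 * sq_dist / bandwidth ** 2)   # g(x - x_n), up to a constant factor
    k = g / g.sum(axis=1, keepdims=True)          # rows sum to one, as in (6.46)
    return k @ t

# Illustrative sinusoidal data, loosely in the spirit of Figure 6.3.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 30)
t = np.sin(2 * np.pi * X) + 0.2 * rng.standard_normal(30)

x_grid = np.linspace(0.0, 1.0, 5)
print(nadaraya_watson(x_grid, X, t))
```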