Bishop, C. M., Pattern Recognition and Machine Learning (2006), Section 6.2: Constructing Kernels
Figure 6.1 Illustration of the construction of kernel functions starting from a corresponding set of basis functions. In each column the lower plot shows the kernel function k(x, x′) defined by (6.10) plotted as a function of x for x′ = 0, while the upper plot shows the corresponding basis functions given by polynomials (left column), 'Gaussians' (centre column), and logistic sigmoids (right column).

If we take the particular case of a two-dimensional input space x = (x1, x2) we can expand out the terms and thereby identify the corresponding nonlinear feature mapping

    k(x, z) = (x^T z)^2 = (x1 z1 + x2 z2)^2
            = x1^2 z1^2 + 2 x1 z1 x2 z2 + x2^2 z2^2
            = (x1^2, √2 x1 x2, x2^2)(z1^2, √2 z1 z2, z2^2)^T
            = φ(x)^T φ(z).                                              (6.12)

We see that the feature mapping takes the form φ(x) = (x1^2, √2 x1 x2, x2^2)^T and therefore comprises all possible second-order terms, with a specific weighting between them.
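As a quick numerical illustration of (6.12), the following sketch (ours, not from the text; NumPy assumed, function names hypothetical) evaluates the kernel directly and via the explicit feature vectors, giving the same value:

```python
import numpy as np

def kernel(x, z):
    """Polynomial kernel k(x, z) = (x^T z)^2 for two-dimensional inputs."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) from (6.12)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.7])
print(kernel(x, z))                    # direct evaluation of the kernel
print(float(np.dot(phi(x), phi(z))))   # same value via the feature space
```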
More generally, however, we need a simple way to test whether a function constitutes a valid kernel without having to construct the function φ(x) explicitly. A necessary and sufficient condition for a function k(x, x′) to be a valid kernel (Shawe-Taylor and Cristianini, 2004) is that the Gram matrix K, whose elements are given by k(xn, xm), should be positive semidefinite for all possible choices of the set {xn} (Appendix C). Note that a positive semidefinite matrix is not the same thing as a matrix whose elements are nonnegative.
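This condition can be probed numerically for any finite data set by building the Gram matrix and inspecting its eigenvalues. The sketch below is ours (NumPy assumed); note that a single passing check cannot prove validity, because the condition must hold for every possible choice of {xn}, whereas a single failure is enough to rule a function out.

```python
import numpy as np

def gram_matrix(kernel, X):
    """Gram matrix K with elements K[n, m] = kernel(X[n], X[m])."""
    N = len(X)
    return np.array([[kernel(X[n], X[m]) for m in range(N)] for n in range(N)])

def is_psd(K, tol=1e-10):
    """True if the symmetric matrix K has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

linear = lambda x, z: float(np.dot(x, z))            # a valid kernel
print(is_psd(gram_matrix(linear, X)))                # True for any choice of X

shifted = lambda x, z: float(np.dot(x, z)) - 1.0     # not a valid kernel
print(is_psd(gram_matrix(shifted, X)))               # fails the check for this data
```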
One powerful technique for constructing new kernels is to build them out of simpler kernels as building blocks. This can be done using the following properties:

Techniques for Constructing New Kernels.
Given valid kernels k1(x, x′) and k2(x, x′), the following new kernels will also be valid:

    k(x, x′) = c k1(x, x′)                                              (6.13)
    k(x, x′) = f(x) k1(x, x′) f(x′)                                     (6.14)
    k(x, x′) = q(k1(x, x′))                                             (6.15)
    k(x, x′) = exp(k1(x, x′))                                           (6.16)
    k(x, x′) = k1(x, x′) + k2(x, x′)                                    (6.17)
    k(x, x′) = k1(x, x′) k2(x, x′)                                      (6.18)
    k(x, x′) = k3(φ(x), φ(x′))                                          (6.19)
    k(x, x′) = x^T A x′                                                 (6.20)
    k(x, x′) = ka(xa, xa′) + kb(xb, xb′)                                (6.21)
    k(x, x′) = ka(xa, xa′) kb(xb, xb′)                                  (6.22)

where c > 0 is a constant, f(·) is any function, q(·) is a polynomial with nonnegative coefficients, φ(x) is a function from x to R^M, k3(·, ·) is a valid kernel in R^M, A is a symmetric positive semidefinite matrix, xa and xb are variables (not necessarily disjoint) with x = (xa, xb), and ka and kb are valid kernel functions over their respective spaces.
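To show how these closure rules compose, here is a self-contained sketch of ours (NumPy assumed; the particular combination is an arbitrary illustration) that builds a new kernel from a linear and a polynomial kernel using (6.13), (6.16), (6.17) and (6.18), and then checks positive semidefiniteness on a random data set:

```python
import numpy as np

def k1(x, z):
    return float(np.dot(x, z))                  # linear kernel, valid

def k2(x, z):
    return (float(np.dot(x, z)) + 1.0) ** 2     # polynomial kernel, valid

def combined(x, z):
    # (6.13) scaling, (6.16) exponential, (6.18) product, (6.17) sum of valid kernels
    return k1(x, z) + k2(x, z) * np.exp(0.1 * k1(x, z))

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 4))
K = np.array([[combined(x, z) for z in X] for x in X])
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # expected: True
```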
Equipped with these properties, we can now embark on the construction of more complex kernels appropriate to specific applications. We require that the kernel k(x, x′) be symmetric and positive semidefinite and that it expresses the appropriate form of similarity between x and x′ according to the intended application. Here we consider a few common examples of kernel functions. For a more extensive discussion of 'kernel engineering', see Shawe-Taylor and Cristianini (2004).

We saw that the simple polynomial kernel k(x, x′) = (x^T x′)^2 contains only terms of degree two. If we consider the slightly generalized kernel k(x, x′) = (x^T x′ + c)^2 with c > 0, then the corresponding feature mapping φ(x) contains constant and linear terms as well as terms of order two.
Similarly, k(x, x′) = (x^T x′)^M contains all monomials of order M. For instance, if x and x′ are two images, then the kernel represents a particular weighted sum of all possible products of M pixels in the first image with M pixels in the second image. This can similarly be generalized to include all terms up to degree M by considering k(x, x′) = (x^T x′ + c)^M with c > 0. Using the results (6.17) and (6.18) for combining kernels we see that these will all be valid kernel functions.
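The practical appeal is that the kernel is evaluated in time linear in the input dimension, while the implicit feature space of monomials grows combinatorially with M. A small sketch of ours (NumPy assumed; the feature-space count uses the standard binomial-coefficient formula for monomials of degree at most M in D variables, which is not stated in the text):

```python
import numpy as np
from math import comb

def poly_kernel(x, z, c=1.0, M=3):
    """k(x, z) = (x^T z + c)^M: all monomials up to degree M, handled implicitly."""
    return (float(np.dot(x, z)) + c) ** M

D, M = 256, 3                          # e.g. a 16x16 image flattened to 256 pixels
rng = np.random.default_rng(2)
x, z = rng.normal(size=D), rng.normal(size=D)
print(poly_kernel(x, z, c=1.0, M=M))   # one O(D) dot product, then a power

# Dimension of the explicit feature space (monomials of degree <= M in D variables):
print(comb(D + M, M))                  # 2862209 features that are never constructed
```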
Another commonly used kernel takes the form

    k(x, x′) = exp(−‖x − x′‖^2 / 2σ^2)                                  (6.23)

and is often called a 'Gaussian' kernel. Note, however, that in this context it is not interpreted as a probability density, and hence the normalization coefficient is omitted. We can see that this is a valid kernel by expanding the square

    ‖x − x′‖^2 = x^T x + (x′)^T x′ − 2 x^T x′                           (6.24)

to give

    k(x, x′) = exp(−x^T x / 2σ^2) exp(x^T x′ / σ^2) exp(−(x′)^T x′ / 2σ^2)   (6.25)

and then making use of (6.14) and (6.16), together with the validity of the linear kernel k(x, x′) = x^T x′. Note that the feature vector that corresponds to the Gaussian kernel has infinite dimensionality (Exercise 6.11).
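As a sanity check on the factorization (6.25), the following sketch of ours (NumPy assumed) evaluates the Gaussian kernel directly from (6.23) and as the product of the three exponential factors:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.5):
    """Direct evaluation of (6.23)."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def gaussian_kernel_factored(x, z, sigma=1.5):
    """Evaluation via the three factors in (6.25)."""
    s2 = sigma ** 2
    return (np.exp(-np.dot(x, x) / (2.0 * s2))
            * np.exp(np.dot(x, z) / s2)
            * np.exp(-np.dot(z, z) / (2.0 * s2)))

rng = np.random.default_rng(4)
x, z = rng.normal(size=5), rng.normal(size=5)
print(np.isclose(gaussian_kernel(x, z), gaussian_kernel_factored(x, z)))  # True
```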
The Gaussian kernel is not restricted to the use of Euclidean distance. If we use kernel substitution in (6.24) to replace x^T x′ with a nonlinear kernel κ(x, x′), we obtain

    k(x, x′) = exp(−(1/2σ^2)(κ(x, x) + κ(x′, x′) − 2κ(x, x′))).         (6.26)
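In other words, any valid base kernel κ can be turned into a Gaussian-like kernel through (6.26). The sketch below is ours (NumPy assumed; the polynomial base kernel is an arbitrary choice) and checks the resulting Gram matrix numerically:

```python
import numpy as np

def kappa(x, z):
    """Base kernel: a polynomial kernel, itself valid."""
    return (float(np.dot(x, z)) + 1.0) ** 2

def kernelized_gaussian(x, z, sigma=1.0):
    """Gaussian-like kernel built from kappa via (6.26)."""
    d2 = kappa(x, x) + kappa(z, z) - 2.0 * kappa(x, z)   # squared distance in kappa's feature space
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(5)
X = rng.normal(size=(12, 3))
K = np.array([[kernelized_gaussian(x, z) for z in X] for x in X])
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))           # expected: True
```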
An important contribution to arise from the kernel viewpoint has been the extension to inputs that are symbolic, rather than simply vectors of real numbers. Kernel functions can be defined over objects as diverse as graphs, sets, strings, and text documents. Consider, for instance, a fixed set and define a nonvectorial space consisting of all possible subsets of this set. If A1 and A2 are two such subsets then one simple choice of kernel would be

    k(A1, A2) = 2^|A1 ∩ A2|                                             (6.27)

where A1 ∩ A2 denotes the intersection of sets A1 and A2, and |A| denotes the number of elements in A. This is a valid kernel function because it can be shown to correspond to an inner product in a feature space (Exercise 6.12).
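A direct implementation of (6.27) needs nothing beyond built-in Python sets; the sketch below (ours; NumPy used only for the eigenvalue check) also verifies positive semidefiniteness on a handful of subsets:

```python
import numpy as np

def subset_kernel(A1, A2):
    """k(A1, A2) = 2^{|A1 intersect A2|}, as in (6.27)."""
    return 2.0 ** len(A1 & A2)

subsets = [set(), {1}, {1, 2}, {2, 3}, {1, 2, 3, 4}]
K = np.array([[subset_kernel(a, b) for b in subsets] for a in subsets])
print(K)
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))   # expected: True
```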
One powerful approach to the construction of kernels starts from a probabilistic generative model (Haussler, 1999), which allows us to apply generative models in a discriminative setting. Generative models can deal naturally with missing data and in the case of hidden Markov models can handle sequences of varying length. By contrast, discriminative models generally give better performance on discriminative tasks than generative models. It is therefore of some interest to combine these two approaches (Lasserre et al., 2006). One way to combine them is to use a generative model to define a kernel, and then use this kernel in a discriminative approach.

Given a generative model p(x) we can define a kernel by

    k(x, x′) = p(x) p(x′).                                              (6.28)

This is clearly a valid kernel function because we can interpret it as an inner product in the one-dimensional feature space defined by the mapping p(x). It says that two inputs x and x′ are similar if they both have high probabilities.
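As a minimal illustration of (6.28) (ours; the one-dimensional Gaussian density and its parameters are arbitrary stand-ins for a fitted generative model, NumPy assumed):

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """A simple 1-D Gaussian density, standing in for any generative model p(x)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def generative_kernel(x, xp, p=gaussian_pdf):
    """k(x, x') = p(x) p(x'), as in (6.28)."""
    return p(x) * p(xp)

print(generative_kernel(0.1, -0.2))   # both inputs probable under p: larger value
print(generative_kernel(0.1, 4.0))    # one improbable input: much smaller value
```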
We can use (6.13) and (6.17) to extend this class of kernels by considering sums over products of different probability distributions, with positive weighting coefficients p(i), of the form

    k(x, x′) = ∑_i p(x|i) p(x′|i) p(i).                                 (6.29)

This is equivalent, up to an overall multiplicative constant, to a mixture distribution (Section 9.2) in which the components factorize, with the index i playing the role of a 'latent' variable. Two inputs x and x′ will give a large value for the kernel function, and hence appear similar, if they have significant probability under a range of different components.
Taking the limit of an infinite sum, we can also consider kernels of the form

    k(x, x′) = ∫ p(x|z) p(x′|z) p(z) dz                                 (6.30)

where z is a continuous latent variable.
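A concrete instance of (6.29) can be written down directly; the sketch below is ours (NumPy assumed; the two-component Gaussian mixture and its parameters are arbitrary illustrative values):

```python
import numpy as np

# Mixture components p(x|i) and weights p(i); values chosen only for illustration.
means, sigmas, weights = [-2.0, 2.0], [1.0, 0.5], [0.3, 0.7]

def component_pdf(x, i):
    m, s = means[i], sigmas[i]
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)

def mixture_kernel(x, xp):
    """k(x, x') = sum_i p(x|i) p(x'|i) p(i), as in (6.29)."""
    return sum(component_pdf(x, i) * component_pdf(xp, i) * weights[i]
               for i in range(len(weights)))

print(mixture_kernel(1.8, 2.1))     # both inputs probable under the same component
print(mixture_kernel(-2.0, 2.0))    # probable only under different components: smaller value
```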
Now suppose that our data consists of ordered sequences of length L so that an observation is given by X = {x1, . . . , xL}. A popular generative model for sequences is the hidden Markov model (Section 13.2), which expresses the distribution p(X) as a marginalization over a corresponding sequence of hidden states Z = {z1, . . . , zL}. We can use this approach to define a kernel function measuring the similarity of two sequences X and X′ by extending the mixture representation (6.29) to give

    k(X, X′) = ∑_Z p(X|Z) p(X′|Z) p(Z)                                  (6.31)

so that both observed sequences are generated by the same hidden sequence Z. This model can easily be extended to allow sequences of differing length to be compared.
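For a toy hidden Markov model with a small state space and short sequences, (6.31) can be evaluated by brute-force enumeration of the hidden sequences. The sketch below is ours (NumPy assumed; the transition, emission and initial-state tables are made-up toy values) and is meant only to make the sum concrete; a practical implementation would use a forward-style recursion instead.

```python
import numpy as np
from itertools import product

# Toy HMM: 2 hidden states, 3 discrete observation symbols (arbitrary values).
pi = np.array([0.6, 0.4])                         # p(z_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])            # p(z_l | z_{l-1})
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # p(x_l | z_l)

def p_obs_given_hidden(X, Z):
    """p(X|Z) = prod_l p(x_l | z_l)."""
    return np.prod([B[z, x] for x, z in zip(X, Z)])

def p_hidden(Z):
    """p(Z) = p(z_1) prod_l p(z_l | z_{l-1})."""
    prob = pi[Z[0]]
    for prev, curr in zip(Z[:-1], Z[1:]):
        prob *= A[prev, curr]
    return prob

def hmm_kernel(X1, X2):
    """k(X, X') = sum_Z p(X|Z) p(X'|Z) p(Z), as in (6.31); equal-length sequences."""
    L = len(X1)
    return sum(p_obs_given_hidden(X1, Z) * p_obs_given_hidden(X2, Z) * p_hidden(Z)
               for Z in product(range(len(pi)), repeat=L))

print(hmm_kernel([0, 1, 2], [0, 1, 2]))   # a sequence compared with itself
print(hmm_kernel([0, 1, 2], [2, 2, 0]))   # two different sequences
```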
An alternative technique for using generative models to define kernel functions is known as the Fisher kernel (Jaakkola and Haussler, 1999). Consider a parametric generative model p(x|θ) where θ denotes the vector of parameters. The goal is to find a kernel that measures the similarity of two input vectors x and x′ induced by the generative model. Jaakkola and Haussler (1999) consider the gradient with respect to θ, which defines a vector in a 'feature' space having the same dimensionality as θ.
In particular, they consider the Fisher score

    g(θ, x) = ∇_θ ln p(x|θ)                                             (6.32)

from which the Fisher kernel is defined by

    k(x, x′) = g(θ, x)^T F^{-1} g(θ, x′).                               (6.33)

Here F is the Fisher information matrix, given by

    F = E_x[ g(θ, x) g(θ, x)^T ]                                        (6.34)

where the expectation is with respect to x under the distribution p(x|θ). This can be motivated from the perspective of information geometry (Amari, 1998), which considers the differential geometry of the space of model parameters. Here we simply note that the presence of the Fisher information matrix causes this kernel to be invariant under a nonlinear re-parameterization of the density model θ → ψ(θ) (Exercise 6.13).

In practice, it is often infeasible to evaluate the Fisher information matrix. One approach is simply to replace the expectation in the definition of the Fisher information with the sample average, giving

    F ≃ (1/N) ∑_{n=1}^{N} g(θ, xn) g(θ, xn)^T.                          (6.35)
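To make (6.32)-(6.35) concrete, here is a small sketch of our own (NumPy assumed) for a univariate Gaussian model p(x|θ) = N(x|μ, σ^2) with θ = (μ, σ). The closed-form score expressions (x − μ)/σ^2 and ((x − μ)^2 − σ^2)/σ^3 are standard results for this model and are not taken from the text.

```python
import numpy as np

def fisher_score(x, mu, sigma):
    """g(theta, x) = grad_theta ln N(x | mu, sigma^2) with theta = (mu, sigma);  (6.32)."""
    return np.array([(x - mu) / sigma ** 2,
                     ((x - mu) ** 2 - sigma ** 2) / sigma ** 3])

def fisher_kernel(x, xp, data, mu, sigma):
    """k(x, x') = g(x)^T F^{-1} g(x'), with F estimated by the sample average (6.35)."""
    G = np.stack([fisher_score(xn, mu, sigma) for xn in data])   # N x dim(theta) score matrix
    F = G.T @ G / len(data)                                      # (6.35)
    gx, gxp = fisher_score(x, mu, sigma), fisher_score(xp, mu, sigma)
    return float(gx @ np.linalg.solve(F, gxp))                   # (6.33)

rng = np.random.default_rng(6)
data = rng.normal(loc=1.0, scale=2.0, size=500)
mu, sigma = data.mean(), data.std()            # fit the generative model to the data
print(fisher_kernel(0.5, 1.5, data, mu, sigma))
print(fisher_kernel(0.5, 8.0, data, mu, sigma))
```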