Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 62


In Section 5.5.5, we shall show that this approach is closely related to approach 2.

5.5. Regularization in Neural Networks

Figure 5.14 Illustration of the synthetic warping of a handwritten digit. The original image is shown on the left. On the right, the top row shows three examples of warped digits, with the corresponding displacement fields shown on the bottom row. These displacement fields are generated by sampling random displacements $\Delta x, \Delta y \in (0, 1)$ at each pixel and then smoothing by convolution with Gaussians of width 0.01, 30 and 60 respectively.

One advantage of approach 3 is that it can correctly extrapolate well beyond the range of transformations included in the training set.

However, it can be difficult to find hand-crafted features with the required invariances that do not also discard information that can be useful for discrimination.

5.5.4 Tangent propagation

We can use regularization to encourage models to be invariant to transformations of the input through the technique of tangent propagation (Simard et al., 1992). Consider the effect of a transformation on a particular input vector $\mathbf{x}_n$.

Provided the transformation is continuous (such as translation or rotation, but not mirror reflection for instance), then the transformed pattern will sweep out a manifold $\mathcal{M}$ within the $D$-dimensional input space. This is illustrated in Figure 5.15, for the case of $D = 2$ for simplicity. Suppose the transformation is governed by a single parameter $\xi$ (which might be rotation angle for instance). Then the subspace $\mathcal{M}$ swept out by $\mathbf{x}_n$ will be one-dimensional, and will be parameterized by $\xi$.

Figure 5.15 Illustration of a two-dimensional input space showing the effect of a continuous transformation on a particular input vector $\mathbf{x}_n$. A one-dimensional transformation, parameterized by the continuous variable $\xi$, applied to $\mathbf{x}_n$ causes it to sweep out a one-dimensional manifold $\mathcal{M}$. Locally, the effect of the transformation can be approximated by the tangent vector $\boldsymbol{\tau}_n$. (Figure labels: axes $x_1$ and $x_2$, manifold $\mathcal{M}$, point $\mathbf{x}_n$, tangent $\boldsymbol{\tau}_n$, parameter $\xi$.)

5. NEURAL NETWORKS

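Let the vector that results from acting on $\mathbf{x}_n$ by this transformation be denoted by $\mathbf{s}(\mathbf{x}_n, \xi)$, which is defined so that $\mathbf{s}(\mathbf{x}, 0) = \mathbf{x}$. Then the tangent to the curve $\mathcal{M}$ is given by the directional derivative $\boldsymbol{\tau} = \partial\mathbf{s}/\partial\xi$, and the tangent vector at the point $\mathbf{x}_n$ is given by

$$\boldsymbol{\tau}_n = \left.\frac{\partial \mathbf{s}(\mathbf{x}_n, \xi)}{\partial \xi}\right|_{\xi=0}. \tag{5.125}$$

Under a transformation of the input vector, the network output vector will, in general, change.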

The derivative of output $k$ with respect to $\xi$ is given by

$$\left.\frac{\partial y_k}{\partial \xi}\right|_{\xi=0} = \sum_{i=1}^{D} \left.\frac{\partial y_k}{\partial x_i}\frac{\partial x_i}{\partial \xi}\right|_{\xi=0} = \sum_{i=1}^{D} J_{ki}\tau_i \tag{5.126}$$

where $J_{ki}$ is the $(k, i)$ element of the Jacobian matrix $\mathbf{J}$, as discussed in Section 5.3.4. The result (5.126) can be used to modify the standard error function, so as to encourage local invariance in the neighbourhood of the data points, by the addition to the original error function $E$ of a regularization function $\Omega$ to give a total error function of the form

$$\widetilde{E} = E + \lambda\Omega \tag{5.127}$$

where $\lambda$ is a regularization coefficient and

$$\Omega = \frac{1}{2}\sum_{n}\sum_{k}\left(\left.\frac{\partial y_{nk}}{\partial \xi}\right|_{\xi=0}\right)^{2} = \frac{1}{2}\sum_{n}\sum_{k}\left(\sum_{i=1}^{D} J_{nki}\tau_{ni}\right)^{2}. \tag{5.128}$$

[Margin note: Exercise 5.26]

The regularization function will be zero when the network mapping function is invariant under the transformation in the neighbourhood of each pattern vector, and the value of the parameter $\lambda$ determines the balance between fitting the training data and learning the invariance property.

In a practical implementation, the tangent vector $\boldsymbol{\tau}_n$ can be approximated using finite differences, by subtracting the original vector $\mathbf{x}_n$ from the corresponding vector after transformation using a small value of $\xi$, and then dividing by $\xi$.
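The finite-difference tangent vector and the identity (5.126) can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes a two-dimensional input, uses rotation about the origin as the transformation $\mathbf{s}(\mathbf{x}, \xi)$, and introduces a hypothetical one-output toy network `net` with made-up weights `W` and `B`. All function names are illustrative.

```python
import math

def rotate(x, xi):
    """Transformation s(x, xi): rotate the 2-D point x by the angle xi (radians)."""
    c, s = math.cos(xi), math.sin(xi)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def tangent_fd(transform, x, eps=1e-6):
    """Finite-difference tangent vector: (s(x, eps) - x) / eps."""
    moved = transform(x, eps)
    return [(m - o) / eps for m, o in zip(moved, x)]

# Hypothetical toy network with a single output: y = tanh(w . x + b).
W, B = [0.5, -1.0], 0.1

def net(x):
    return math.tanh(W[0] * x[0] + W[1] * x[1] + B)

def jacobian(x):
    """Analytic derivatives dy/dx_i (one row of J, since there is one output)."""
    g = 1.0 - net(x) ** 2  # derivative of tanh evaluated at the pre-activation
    return [g * W[0], g * W[1]]

x_n = [1.0, 2.0]
tau_n = tangent_fd(rotate, x_n)  # for rotation this approximates [-x2, x1]

# Right-hand side of (5.126): sum_i J_i * tau_i.
dy_dxi = sum(j * t for j, t in zip(jacobian(x_n), tau_n))

# Left-hand side of (5.126), estimated directly by a central difference in xi.
h = 1e-6
dy_dxi_direct = (net(rotate(x_n, h)) - net(rotate(x_n, -h))) / (2 * h)

# Contribution of this single pattern to the regularizer (5.128).
omega_n = 0.5 * dy_dxi ** 2
```

Under these assumptions `dy_dxi` and `dy_dxi_direct` agree up to finite-difference error, and `omega_n` is the term that tangent propagation would penalize for this pattern.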

The finite-difference approximation of the tangent vector is illustrated in Figure 5.16.

The regularization function depends on the network weights through the Jacobian $\mathbf{J}$. A backpropagation formalism for computing the derivatives of the regularizer with respect to the network weights is easily obtained by extension of the techniques introduced in Section 5.3.

If the transformation is governed by $L$ parameters (e.g., $L = 3$ for the case of translations combined with in-plane rotations in a two-dimensional image), then the manifold $\mathcal{M}$ will have dimensionality $L$, and the corresponding regularizer is given by the sum of terms of the form (5.128), one for each transformation.
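The sum over transformations can be illustrated with a small sketch. It assumes a two-dimensional input space rather than an image, three transformations (rotation about the origin and unit translations along each axis), and a hypothetical model $y(\mathbf{x}) = x_1^2 + x_2^2$ chosen because it is exactly invariant to rotation. The helper names `omega_single` and `omega_multi` are illustrative only.

```python
def omega_single(jacobians, tangents):
    """Regularizer (5.128) for one transformation.

    jacobians[n][k][i] holds J_nki and tangents[n][i] holds tau_ni.
    """
    total = 0.0
    for J_n, tau_n in zip(jacobians, tangents):
        for J_nk in J_n:
            total += sum(j * t for j, t in zip(J_nk, tau_n)) ** 2
    return 0.5 * total

def omega_multi(jacobians, tangent_sets):
    """Sum of terms of the form (5.128), one for each transformation."""
    return sum(omega_single(jacobians, tangents) for tangents in tangent_sets)

# Two input patterns and the Jacobian of the toy model y(x) = x1^2 + x2^2.
points = [[1.0, 2.0], [-0.5, 3.0]]
jacobians = [[[2.0 * x[0], 2.0 * x[1]]] for x in points]  # one output per pattern

# Tangent vectors at each pattern for the three transformations.
rotation = [[-x[1], x[0]] for x in points]   # d/dxi of a rotation at xi = 0
shift_x1 = [[1.0, 0.0] for _ in points]      # translation along x1
shift_x2 = [[0.0, 1.0] for _ in points]      # translation along x2

omega_rotation = omega_single(jacobians, rotation)
omega_total = omega_multi(jacobians, [rotation, shift_x1, shift_x2])
```

Because the toy model depends only on the distance from the origin, the rotation term vanishes, while the two translation terms remain positive. This matches the statement that the regularizer is zero when the mapping is invariant under the transformation.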

If several transformations are considered at the same time, and the network mapping is made invariant to each separately, then it will be (locally) invariant to combinations of the transformations (Simard et al., 1992).

Figure 5.16 Illustration showing (a) the original image $\mathbf{x}$ of a handwritten digit, (b) the tangent vector $\boldsymbol{\tau}$ corresponding to an infinitesimal clockwise rotation, (c) the result of adding a small contribution from the tangent vector to the original image giving $\mathbf{x} + \epsilon\boldsymbol{\tau}$ with $\epsilon = 15$ degrees, and (d) the true image rotated for comparison.

A related technique, called tangent distance, can be used to build invariance properties into distance-based methods such as nearest-neighbour classifiers (Simard et al., 1993).

5.5.5 Training with transformed data

We have seen that one way to encourage invariance of a model to a set of transformations is to expand the training set using transformed versions of the original input patterns.

Here we show that this approach is closely related to the technique of tangent propagation (Bishop, 1995b; Leen, 1995).

As in Section 5.5.4, we shall consider a transformation governed by a single parameter $\xi$ and described by the function $\mathbf{s}(\mathbf{x}, \xi)$, with $\mathbf{s}(\mathbf{x}, 0) = \mathbf{x}$. We shall also consider a sum-of-squares error function. The error function for untransformed inputs can be written (in the infinite data set limit) in the form

$$E = \frac{1}{2}\iint \{y(\mathbf{x}) - t\}^2\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t \tag{5.129}$$

as discussed in Section 1.5.5. Here we have considered a network having a single output, in order to keep the notation uncluttered.

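If we now consider an infinite number of copies of each data point, each of which is perturbed by the transformation in which the parameter $\xi$ is drawn from a distribution $p(\xi)$, then the error function defined over this expanded data set can be written as

$$\widetilde{E} = \frac{1}{2}\iiint \{y(\mathbf{s}(\mathbf{x}, \xi)) - t\}^2\, p(t|\mathbf{x})\,p(\mathbf{x})\,p(\xi)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t\,\mathrm{d}\xi. \tag{5.130}$$

We now assume that the distribution $p(\xi)$ has zero mean with small variance, so that we are only considering small transformations of the original input vectors. We can then expand the transformation function as a Taylor series in powers of $\xi$ to give

$$\begin{aligned}
\mathbf{s}(\mathbf{x}, \xi) &= \mathbf{s}(\mathbf{x}, 0) + \xi\left.\frac{\partial}{\partial \xi}\mathbf{s}(\mathbf{x}, \xi)\right|_{\xi=0} + \frac{\xi^2}{2}\left.\frac{\partial^2}{\partial \xi^2}\mathbf{s}(\mathbf{x}, \xi)\right|_{\xi=0} + O(\xi^3) \\
&= \mathbf{x} + \xi\boldsymbol{\tau} + \frac{1}{2}\xi^2\boldsymbol{\tau}' + O(\xi^3)
\end{aligned}$$

where $\boldsymbol{\tau}'$ denotes the second derivative of $\mathbf{s}(\mathbf{x}, \xi)$ with respect to $\xi$ evaluated at $\xi = 0$. This allows us to expand the model function to give

$$y(\mathbf{s}(\mathbf{x}, \xi)) = y(\mathbf{x}) + \xi\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x}) + \frac{\xi^2}{2}\left[(\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\boldsymbol{\tau}\right] + O(\xi^3).$$

Substituting into the mean error function (5.130) and expanding, we then have

$$\begin{aligned}
\widetilde{E} ={}& \frac{1}{2}\iint \{y(\mathbf{x}) - t\}^2\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t \\
&+ \mathbb{E}[\xi]\iint \{y(\mathbf{x}) - t\}\,\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x})\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t \\
&+ \mathbb{E}[\xi^2]\,\frac{1}{2}\iint \left[\{y(\mathbf{x}) - t\}\left\{(\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\boldsymbol{\tau}\right\} + \left(\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x})\right)^2\right] p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t + O(\xi^3).
\end{aligned}$$

Because the distribution of transformations has zero mean we have $\mathbb{E}[\xi] = 0$.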
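The second-order expansion of $y(\mathbf{s}(\mathbf{x}, \xi))$ can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: the transformation is a rotation in two dimensions (so $\boldsymbol{\tau} = (-x_2, x_1)$ and $\boldsymbol{\tau}' = -\mathbf{x}$), and the model is a hypothetical function $y(\mathbf{x}) = x_1^2 + 3x_2$ whose gradient and Hessian are known in closed form.

```python
import math

def s(x, xi):
    """Rotation of the 2-D point x by the angle xi, so that s(x, 0) = x."""
    c, sn = math.cos(xi), math.sin(xi)
    return [c * x[0] - sn * x[1], sn * x[0] + c * x[1]]

# Hypothetical model with closed-form gradient and Hessian.
def y(x):
    return x[0] ** 2 + 3.0 * x[1]

def grad_y(x):
    return [2.0 * x[0], 3.0]

def hess_y(x):
    return [[2.0, 0.0], [0.0, 0.0]]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def second_order(x, xi):
    """y(x) + xi tau^T grad y + (xi^2 / 2) [tau'^T grad y + tau^T H tau]."""
    tau = [-x[1], x[0]]        # first derivative of s with respect to xi at 0
    tau2 = [-x[0], -x[1]]      # second derivative of s with respect to xi at 0
    g, H = grad_y(x), hess_y(x)
    quad = sum(tau[i] * H[i][j] * tau[j] for i in range(2) for j in range(2))
    return y(x) + xi * dot(tau, g) + 0.5 * xi ** 2 * (dot(tau2, g) + quad)

def error(x, xi):
    """Absolute gap between the exact value and the truncated expansion."""
    return abs(y(s(x, xi)) - second_order(x, xi))

x = [1.0, 2.0]
err_small = error(x, 0.01)
err_large = error(x, 0.02)
ratio = err_large / err_small  # close to 2**3 if the remainder is O(xi^3)
```

Doubling $\xi$ multiplies the remainder by roughly eight here, which is the behaviour expected of an $O(\xi^3)$ term.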

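We shall also denote $\mathbb{E}[\xi^2]$ by $\lambda$. Omitting terms of $O(\xi^3)$, the average error function then becomes

$$\widetilde{E} = E + \lambda\Omega \tag{5.131}$$

where $E$ is the original sum-of-squares error, and the regularization term $\Omega$ takes the form

$$\Omega = \frac{1}{2}\int \left[\{y(\mathbf{x}) - \mathbb{E}[t|\mathbf{x}]\}\left\{(\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\boldsymbol{\tau}\right\} + \left(\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x})\right)^2\right] p(\mathbf{x})\,\mathrm{d}\mathbf{x} \tag{5.132}$$

in which we have performed the integration over $t$.

We can further simplify this regularization term as follows. In Section 1.5.5 we saw that the function that minimizes the sum-of-squares error is given by the conditional average $\mathbb{E}[t|\mathbf{x}]$ of the target values $t$.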

From (5.131) we see that the regularized error will equal the unregularized sum-of-squares plus terms which are $O(\xi)$, and so the network function that minimizes the total error will have the form

$$y(\mathbf{x}) = \mathbb{E}[t|\mathbf{x}] + O(\xi). \tag{5.133}$$

Thus, to leading order in $\xi$, the first term in the regularizer vanishes and we are left with

$$\Omega = \frac{1}{2}\int \left(\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x})\right)^2 p(\mathbf{x})\,\mathrm{d}\mathbf{x} \tag{5.134}$$

which is equivalent to the tangent propagation regularizer (5.128).

[Margin note: Exercise 5.27]

If we consider the special case in which the transformation of the inputs simply consists of the addition of random noise, so that $\mathbf{x} \to \mathbf{x} + \boldsymbol{\xi}$, then the regularizer takes the form

$$\Omega = \frac{1}{2}\int \left\|\nabla y(\mathbf{x})\right\|^2 p(\mathbf{x})\,\mathrm{d}\mathbf{x} \tag{5.135}$$

which is known as Tikhonov regularization (Tikhonov and Arsenin, 1977; Bishop, 1995b).
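The link between input noise and the Tikhonov regularizer (5.135) can be made concrete in a case where the expansion is exact. The sketch below assumes a linear model $y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x}$, for which $\nabla y = \mathbf{w}$, and a hypothetical noise distribution in which each component of $\boldsymbol{\xi}$ is independently $+\sigma$ or $-\sigma$ with equal probability, so that it has zero mean and variance $\lambda = \sigma^2$. The integral over $p(\mathbf{x})$ is replaced by an average over a small made-up data set.

```python
from itertools import product

def predict(w, x):
    """Linear model y(x) = w . x, so grad y = w everywhere."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sum_of_squares(w, data):
    """Sum-of-squares error (1/2) E[(y - t)^2], averaged over the data set."""
    return 0.5 * sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)

def noisy_error(w, data, sigma):
    """Error averaged exactly over all +/- sigma perturbations of each input."""
    dim = len(w)
    patterns = list(product([-sigma, sigma], repeat=dim))
    total = 0.0
    for x, t in data:
        for noise in patterns:
            x_noisy = [xi + ni for xi, ni in zip(x, noise)]
            total += 0.5 * (predict(w, x_noisy) - t) ** 2
    return total / (len(data) * len(patterns))

def tikhonov(w):
    """Regularizer (5.135) for the linear model: (1/2) ||grad y||^2 = (1/2) ||w||^2."""
    return 0.5 * sum(wi ** 2 for wi in w)

w = [0.5, -1.0, 2.0]
data = [([1.0, 0.0, 2.0], 1.0), ([0.5, -1.0, 1.0], 0.0)]
sigma = 0.1

lhs = noisy_error(w, data, sigma)                            # error on noise-augmented data
rhs = sum_of_squares(w, data) + sigma ** 2 * tikhonov(w)     # E + lambda * Omega
```

For this linear model the two quantities coincide up to rounding, because the higher-order terms in the expansion vanish. For a nonlinear network the agreement would hold only approximately, for small noise amplitudes.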

Derivatives of the Tikhonov regularizer (5.135) with respect to the network weights can be found using an extended backpropagation algorithm (Bishop, 1993). We see that, for small noise amplitudes, Tikhonov regularization is related to the addition of random noise to the inputs, which has been shown to improve generalization in appropriate circumstances (Sietsma and Dow, 1991).

5.5.6 Convolutional networks

Another approach to creating models that are invariant to certain transformations of the inputs is to build the invariance properties into the structure of a neural network.
