Bishop C.M. Pattern Recognition and Machine Learning (2006), page 63
This is the basis for the convolutional neural network (Le Cun et al., 1989; LeCun et al., 1998), which has been widely applied to image data. Consider the specific task of recognizing handwritten digits. Each input image comprises a set of pixel intensity values, and the desired output is a posterior probability distribution over the ten digit classes. We know that the identity of the digit is invariant under translations and scaling as well as (small) rotations. Furthermore, the network must also exhibit invariance to more subtle transformations such as elastic deformations of the kind illustrated in Figure 5.14. One simple approach would be to treat the image as the input to a fully connected network, such as the kind shown in Figure 5.1.

Given a sufficiently large training set, such a network could in principle yield a good solution to this problem and would learn the appropriate invariances by example. However, this approach ignores a key property of images, which is that nearby pixels are more strongly correlated than more distant pixels. Many of the modern approaches to computer vision exploit this property by extracting local features that depend only on small subregions of the image. Information from such features can then be merged in later stages of processing in order to detect higher-order features and ultimately to yield information about the image as a whole. Also, local features that are useful in one region of the image are likely to be useful in other regions of the image, for instance if the object of interest is translated.

Figure 5.17: Diagram illustrating part of a convolutional neural network, showing a layer of convolutional units followed by a layer of subsampling units. Several successive pairs of such layers may be used. (The figure labels the input image, the convolutional layer, and the subsampling layer.)

These notions are incorporated into convolutional neural networks through three mechanisms: (i) local receptive fields, (ii) weight sharing, and (iii) subsampling. The structure of a convolutional network is illustrated in Figure 5.17. In the convolutional layer the units are organized into planes, each of which is called a feature map.

Units in a feature map each take inputs only from a small subregion of the image, and all of the units in a feature map are constrained to share the same weight values. For instance, a feature map might consist of 100 units arranged in a 10 × 10 grid, with each unit taking inputs from a 5 × 5 pixel patch of the image. The whole feature map therefore has 25 adjustable weight parameters plus one adjustable bias parameter. Input values from a patch are linearly combined using the weights and the bias, and the result is transformed by a sigmoidal nonlinearity using (5.1). If we think of the units as feature detectors, then all of the units in a feature map detect the same pattern but at different locations in the input image.

Due to the weight sharing, the evaluation of the activations of these units is equivalent to a convolution of the image pixel intensities with a 'kernel' comprising the weight parameters. If the input image is shifted, the activations of the feature map will be shifted by the same amount but will otherwise be unchanged. This provides the basis for the (approximate) invariance of the network outputs to translations and distortions of the input image. Because we will typically need to detect multiple features in order to build an effective model, there will generally be multiple feature maps in the convolutional layer, each having its own set of weight and bias parameters. The outputs of the convolutional units form the inputs to the subsampling layer of the network.
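To make the weight-sharing picture concrete, here is a minimal NumPy sketch (not from the book; the function and variable names are illustrative) that evaluates one feature map: a single shared 5 × 5 kernel and one bias applied at every image position, followed by a logistic sigmoid. Evaluating all the units in the map is then just a valid-mode convolution (strictly a cross-correlation, since the kernel is not flipped).

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid nonlinearity."""
    return 1.0 / (1.0 + np.exp(-a))

def feature_map(image, kernel, bias):
    """Activations of one convolutional feature map: every unit applies the
    same kernel and bias to its own patch of the image (weight sharing),
    which amounts to a valid cross-correlation followed by a sigmoid."""
    H, W = image.shape
    k = kernel.shape[0]                       # e.g. a 5x5 kernel
    out = np.empty((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + k, c:c + k]
            out[r, c] = np.sum(patch * kernel) + bias   # shared weights + bias
    return sigmoid(out)

# A 14x14 image with a shared 5x5 kernel yields a 10x10 feature map,
# i.e. 100 units controlled by only 25 weights plus one bias.
rng = np.random.default_rng(0)
image = rng.random((14, 14))
kernel = 0.1 * rng.standard_normal((5, 5))
print(feature_map(image, kernel, bias=0.0).shape)       # (10, 10)
```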

For each feature map in the convolutional layer, there is a plane of units in the subsampling layer, and each unit takes inputs from a small receptive field in the corresponding feature map of the convolutional layer. These units perform subsampling. For instance, each subsampling unit might take inputs from a 2 × 2 unit region in the corresponding feature map and would compute the average of those inputs, multiplied by an adaptive weight with the addition of an adaptive bias parameter, and then transformed using a sigmoidal nonlinear activation function. The receptive fields are chosen to be contiguous and nonoverlapping so that there are half the number of rows and columns in the subsampling layer compared with the convolutional layer.
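The subsampling step described above can be sketched in the same style; the names are again illustrative assumptions, and, as in the text, only the single scale weight and the single bias of the plane are adaptive.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def subsample(feature_map, weight, bias):
    """Subsampling plane: average each non-overlapping 2x2 region of the
    feature map, multiply by one adaptive weight, add one adaptive bias,
    and pass the result through the sigmoid."""
    H, W = feature_map.shape
    H2, W2 = H // 2, W // 2
    pooled = feature_map[:2 * H2, :2 * W2].reshape(H2, 2, W2, 2).mean(axis=(1, 3))
    return sigmoid(weight * pooled + bias)

# A 10x10 feature map gives a 5x5 subsampled plane: half the rows and columns.
fm = np.random.default_rng(1).random((10, 10))
print(subsample(fm, weight=1.0, bias=0.0).shape)        # (5, 5)
```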

In this way, the response of a unit in the subsampling layer will be relatively insensitive to small shifts of the image in the corresponding regions of the input space. In a practical architecture, there may be several pairs of convolutional and subsampling layers. At each stage there is a larger degree of invariance to input transformations compared to the previous layer. There may be several feature maps in a given convolutional layer for each plane of units in the previous subsampling layer, so that the gradual reduction in spatial resolution is then compensated by an increasing number of features. The final layer of the network would typically be a fully connected, fully adaptive layer, with a softmax output nonlinearity in the case of multiclass classification. The whole network can be trained by error minimization using backpropagation to evaluate the gradient of the error function.

This involves a slight modification of the usual backpropagation algorithm to ensure that the shared-weight constraints are satisfied. Due to the use of local receptive fields, the number of weights in the network is smaller than if the network were fully connected. Furthermore, the number of independent parameters to be learned from the data is much smaller still, due to the substantial numbers of constraints on the weights.

5.5.7 Soft weight sharing

One way to reduce the effective complexity of a network with a large number of weights is to constrain weights within certain groups to be equal.

This is the technique of weight sharing that was discussed in Section 5.5.6 as a way of building translation invariance into networks used for image interpretation. It is only applicable, however, to particular problems in which the form of the constraints can be specified in advance. Here we consider a form of soft weight sharing (Nowlan and Hinton, 1992) in which the hard constraint of equal weights is replaced by a form of regularization in which groups of weights are encouraged to have similar values. Furthermore, the division of weights into groups, the mean weight value for each group, and the spread of values within the groups are all determined as part of the learning process.

Recall that the simple weight decay regularizer, given in (5.112), can be viewed as the negative log of a Gaussian prior distribution over the weights. We can encourage the weight values to form several groups, rather than just one group, by considering instead a probability distribution that is a mixture of Gaussians. The centres and variances of the Gaussian components, as well as the mixing coefficients, will be considered as adjustable parameters to be determined as part of the learning process. Thus, we have a probability density of the form

$$ p(\mathbf{w}) = \prod_i p(w_i) \qquad (5.136) $$

where

$$ p(w_i) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}(w_i \mid \mu_j, \sigma_j^2) \qquad (5.137) $$

and the πj are the mixing coefficients. Taking the negative logarithm then leads to a regularization function of the form

$$ \Omega(\mathbf{w}) = -\sum_i \ln \left( \sum_{j=1}^{M} \pi_j \, \mathcal{N}(w_i \mid \mu_j, \sigma_j^2) \right). \qquad (5.138) $$

The total error function is then given by

$$ \widetilde{E}(\mathbf{w}) = E(\mathbf{w}) + \lambda \Omega(\mathbf{w}) \qquad (5.139) $$

where λ is the regularization coefficient.
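As a hedged numerical sketch of the regularizer (5.138), the following code evaluates Ω(w) for a handful of weights and a two-component Gaussian mixture; all names and values are illustrative, not taken from the book.

```python
import numpy as np

def gauss(w, mu, sigma2):
    """Univariate Gaussian density N(w | mu, sigma^2)."""
    return np.exp(-0.5 * (w - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def omega(w, pi, mu, sigma2):
    """Soft weight-sharing regularizer (5.138):
    Omega(w) = -sum_i ln sum_j pi_j N(w_i | mu_j, sigma_j^2)."""
    p = np.array([np.sum(pi * gauss(wi, mu, sigma2)) for wi in w])
    return -np.sum(np.log(p))

# Two components encourage the weights to cluster around 0 and around 1.
w = np.array([0.05, -0.02, 0.98, 1.10])
pi = np.array([0.5, 0.5])
mu = np.array([0.0, 1.0])
sigma2 = np.array([0.1, 0.1])
print(omega(w, pi, mu, sigma2))   # the total error (5.139) adds lambda * this to E(w)
```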

This error is minimized both with respect to the weights wi and with respect to the parameters {πj, µj, σj} of the mixture model. If the weights were constant, then the parameters of the mixture model could be determined by using the EM algorithm discussed in Chapter 9. However, the distribution of weights is itself evolving during the learning process, and so to avoid numerical instability, a joint optimization is performed simultaneously over the weights and the mixture-model parameters. This can be done using a standard optimization algorithm such as conjugate gradients or quasi-Newton methods. In order to minimize the total error function, it is necessary to be able to evaluate its derivatives with respect to the various adjustable parameters.
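One possible way to realize this joint optimization, purely as an illustrative sketch and not Bishop's implementation, is to pack the weights and the mixture parameters into a single vector and hand the total error (5.139) to a generic conjugate-gradient routine; here a toy quadratic stands in for the data error E(w), and SciPy's minimize performs the optimization with numerical gradients.

```python
import numpy as np
from scipy.optimize import minimize

K, M = 4, 2          # number of weights, number of mixture components
lam = 0.1            # regularization coefficient lambda

def gauss(w, mu, sigma2):
    return np.exp(-0.5 * (w - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def total_error(theta):
    """E~(theta) = E(w) + lambda * Omega(w), theta = (w, mu, log sigma^2, eta)."""
    w, mu = theta[:K], theta[K:K + M]
    sigma2 = np.exp(theta[K + M:K + 2 * M])       # keeps variances positive, cf. (5.144)
    eta = theta[K + 2 * M:]
    pi = np.exp(eta) / np.sum(np.exp(eta))        # softmax, cf. (5.146)
    E = 0.5 * np.sum((w - np.array([0.1, 0.0, 1.0, 0.9])) ** 2)   # toy stand-in for E(w)
    p = np.array([np.sum(pi * gauss(wi, mu, sigma2)) for wi in w])
    return E + lam * (-np.sum(np.log(p)))

theta0 = np.concatenate([np.zeros(K),             # weights
                         [0.0, 1.0],              # component centres
                         np.log([0.1, 0.1]),      # log variances
                         np.zeros(M)])            # softmax parameters
res = minimize(total_error, theta0, method="CG")  # conjugate gradients, numerical jac
print(res.x[:K])                                  # jointly optimized weights
```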

To do this it is convenient to regard the {πj} as prior probabilities and to introduce the corresponding posterior probabilities which, following (2.192), are given by Bayes' theorem in the form

$$ \gamma_j(w) = \frac{\pi_j \, \mathcal{N}(w \mid \mu_j, \sigma_j^2)}{\sum_k \pi_k \, \mathcal{N}(w \mid \mu_k, \sigma_k^2)}. \qquad (5.140) $$

The derivatives of the total error function with respect to the weights are then given by

$$ \frac{\partial \widetilde{E}}{\partial w_i} = \frac{\partial E}{\partial w_i} + \lambda \sum_j \gamma_j(w_i) \, \frac{(w_i - \mu_j)}{\sigma_j^2}. \qquad (5.141) $$

The effect of the regularization term is therefore to pull each weight towards the centre of the jth Gaussian, with a force proportional to the posterior probability of that Gaussian for the given weight.
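A small sketch of (5.140) and of the regularization part of (5.141), assuming the same illustrative two-component mixture as above:

```python
import numpy as np

def gauss(w, mu, sigma2):
    return np.exp(-0.5 * (w - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def responsibilities(w, pi, mu, sigma2):
    """Posterior probabilities gamma_j(w) of (5.140) for a single weight w."""
    num = pi * gauss(w, mu, sigma2)
    return num / np.sum(num)

def d_omega_d_w(w, pi, mu, sigma2):
    """Regularization part of (5.141): for each weight w_i this returns
    sum_j gamma_j(w_i) (w_i - mu_j) / sigma_j^2; the data term dE/dw_i
    and the factor lambda are added separately."""
    return np.array([np.sum(responsibilities(wi, pi, mu, sigma2) * (wi - mu) / sigma2)
                     for wi in w])

w = np.array([0.05, 0.98])
pi, mu, sigma2 = np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([0.1, 0.1])
print(responsibilities(0.05, pi, mu, sigma2))   # almost all mass on the component at 0
print(d_omega_d_w(w, pi, mu, sigma2))           # pulls each weight towards its centre
```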

This is precisely the kind of effect that we are seeking. Derivatives of the error with respect to the centres of the Gaussians are also easily computed to give

$$ \frac{\partial \widetilde{E}}{\partial \mu_j} = \lambda \sum_i \gamma_j(w_i) \, \frac{(\mu_j - w_i)}{\sigma_j^2} \qquad (5.142) $$

which has a simple intuitive interpretation, because it pushes µj towards an average of the weight values, weighted by the posterior probabilities that the respective weight parameters were generated by component j. Similarly, the derivatives with respect to the variances are given by

$$ \frac{\partial \widetilde{E}}{\partial \sigma_j} = \lambda \sum_i \gamma_j(w_i) \left( \frac{1}{\sigma_j} - \frac{(w_i - \mu_j)^2}{\sigma_j^3} \right) \qquad (5.143) $$

which drives σj towards the weighted average of the squared deviations of the weights around the corresponding centre µj, where the weighting coefficients are again given by the posterior probability that each weight is generated by component j.
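The corresponding sketch for (5.142) and (5.143), again with illustrative names and values:

```python
import numpy as np

def gauss(w, mu, sigma2):
    return np.exp(-0.5 * (w - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def gamma(w, pi, mu, sigma2):
    num = pi * gauss(w, mu, sigma2)               # responsibilities (5.140), one weight
    return num / np.sum(num)

def grads_mu_sigma(w, pi, mu, sigma2, lam):
    """Gradients (5.142) and (5.143) of the regularized error with respect
    to the component centres mu_j and standard deviations sigma_j."""
    sigma = np.sqrt(sigma2)
    d_mu = np.zeros_like(mu)
    d_sigma = np.zeros_like(sigma)
    for wi in w:
        g = gamma(wi, pi, mu, sigma2)
        d_mu += lam * g * (mu - wi) / sigma2                              # (5.142)
        d_sigma += lam * g * (1.0 / sigma - (wi - mu) ** 2 / sigma ** 3)  # (5.143)
    return d_mu, d_sigma

w = np.array([0.05, -0.02, 0.98, 1.10])
pi, mu, sigma2 = np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([0.1, 0.1])
print(grads_mu_sigma(w, pi, mu, sigma2, lam=0.1))
```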

Note that in a practical implementation, new variables ηj defined by

$$ \sigma_j^2 = \exp(\eta_j) \qquad (5.144) $$

are introduced, and the minimization is performed with respect to the ηj. This ensures that the parameters σj remain positive. It also has the effect of discouraging pathological solutions in which one or more of the σj goes to zero, corresponding to a Gaussian component collapsing onto one of the weight parameter values. Such solutions are discussed in more detail in the context of Gaussian mixture models in Section 9.2.1.

For the derivatives with respect to the mixing coefficients πj, we need to take account of the constraints

$$ \sum_j \pi_j = 1, \qquad 0 \leqslant \pi_j \leqslant 1 \qquad (5.145) $$

which follow from the interpretation of the πj as prior probabilities. This can be done by expressing the mixing coefficients in terms of a set of auxiliary variables {ηj} using the softmax function given by

$$ \pi_j = \frac{\exp(\eta_j)}{\sum_{k=1}^{M} \exp(\eta_k)}. \qquad (5.146) $$

The derivatives of the regularized error function with respect to the {ηj} then take the form

$$ \frac{\partial \widetilde{E}}{\partial \eta_j} = \sum_i \left\{ \pi_j - \gamma_j(w_i) \right\}. \qquad (5.147) $$

We see that πj is therefore driven towards the average posterior probability for component j.

Figure 5.18: The left figure shows a two-link robot arm, in which the Cartesian coordinates (x1, x2) of the end effector are determined uniquely by the two joint angles θ1 and θ2 and the (fixed) lengths L1 and L2 of the arms. This is known as the forward kinematics of the arm. In practice, we have to find the joint angles that will give rise to a desired end effector position and, as shown in the right figure, this inverse kinematics has two solutions, corresponding to 'elbow up' and 'elbow down'.
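Finally, a sketch of (5.147) under the softmax parametrization (5.146); as in the previous sketches, the setup is illustrative rather than taken from the book.

```python
import numpy as np

def gauss(w, mu, sigma2):
    return np.exp(-0.5 * (w - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def d_omega_d_eta(w, eta, mu, sigma2):
    """Derivative (5.147) with respect to the softmax parameters eta_j of the
    mixing coefficients: sum_i (pi_j - gamma_j(w_i))."""
    pi = np.exp(eta) / np.sum(np.exp(eta))        # softmax (5.146)
    d_eta = np.zeros_like(eta)
    for wi in w:
        num = pi * gauss(wi, mu, sigma2)
        g = num / np.sum(num)                     # responsibilities (5.140)
        d_eta += pi - g
    return d_eta

w = np.array([0.05, -0.02, 0.98, 1.10])
eta = np.zeros(2)                                 # equal mixing coefficients initially
mu, sigma2 = np.array([0.0, 1.0]), np.array([0.1, 0.1])
print(d_omega_d_eta(w, eta, mu, sigma2))          # drives pi_j towards the mean gamma_j
```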
