Bishop C.M., Pattern Recognition and Machine Learning (2006), page 57
This update is repeated by cycling through the data either in sequence or by selecting points at random with replacement. There are of course intermediate scenarios in which the updates are based on batches of data points.

One advantage of on-line methods compared to batch methods is that the former handle redundancy in the data much more efficiently. To see this, consider an extreme example in which we take a data set and double its size by duplicating every data point.
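The contrast between batch and on-line updates, and the effect of duplicating every data point, can be sketched numerically. This is a minimal illustration, not taken from the text: it assumes a linear model with a sum-of-squares error, and the data, sizes, and learning rate eta are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # input patterns
T = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def batch_grad(w, X, T):
    # Gradient of the sum-of-squares error over the whole data set.
    return X.T @ (X @ w - T)

w = np.zeros(3)

# Double the data set by duplicating every data point.
X2, T2 = np.vstack([X, X]), np.concatenate([T, T])
g, g2 = batch_grad(w, X, T), batch_grad(w, X2, T2)
assert np.allclose(g2, 2 * g)   # batch gradient (and batch cost) doubles

# On-line updates touch one point at a time, so each update costs the same
# whether or not the data set has been duplicated.
eta = 0.01                       # learning rate (arbitrary for this sketch)
for n in rng.permutation(len(X))[:10]:
    w -= eta * (X[n] @ w - T[n]) * X[n]
```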
Note that this simply multiplies the error function by a factor of 2 and so is equivalent to using the original error function. Batch methods will require double the computational effort to evaluate the batch error function gradient, whereas on-line methods will be unaffected. Another property of on-line gradient descent is the possibility of escaping from local minima, since a stationary point with respect to the error function for the whole data set will generally not be a stationary point for each data point individually.

Nonlinear optimization algorithms, and their practical application to neural network training, are discussed in detail in Bishop and Nabney (2008).

5.3. Error Backpropagation

Our goal in this section is to find an efficient technique for evaluating the gradient of an error function E(w) for a feed-forward neural network.
We shall see that this can be achieved using a local message passing scheme in which information is sent alternately forwards and backwards through the network and is known as error backpropagation, or sometimes simply as backprop.

It should be noted that the term backpropagation is used in the neural computing literature to mean a variety of different things. For instance, the multilayer perceptron architecture is sometimes called a backpropagation network.
The term backpropagation is also used to describe the training of a multilayer perceptron using gradient descent applied to a sum-of-squares error function. In order to clarify the terminology, it is useful to consider the nature of the training process more carefully. Most training algorithms involve an iterative procedure for minimization of an error function, with adjustments to the weights being made in a sequence of steps. At each such step, we can distinguish between two distinct stages. In the first stage, the derivatives of the error function with respect to the weights must be evaluated. As we shall see, the important contribution of the backpropagation technique is in providing a computationally efficient method for evaluating such derivatives.
Because it is at this stage that errors are propagated backwards through the network, we shall use the term backpropagation specifically to describe the evaluation of derivatives. In the second stage, the derivatives are then used to compute the adjustments to be made to the weights. The simplest such technique, and the one originally considered by Rumelhart et al. (1986), involves gradient descent.
It is important to recognize that the two stages are distinct. Thus, the first stage, namely the propagation of errors backwards through the network in order to evaluate derivatives, can be applied to many other kinds of network and not just the multilayer perceptron. It can also be applied to error functions other than just the simple sum-of-squares, and to the evaluation of other derivatives such as the Jacobian and Hessian matrices, as we shall see later in this chapter.
Similarly, the second stage of weight adjustment using the calculated derivatives can be tackled using a variety of optimization schemes, many of which are substantially more powerful than simple gradient descent.

5.3.1 Evaluation of error-function derivatives

We now derive the backpropagation algorithm for a general network having arbitrary feed-forward topology, arbitrary differentiable nonlinear activation functions, and a broad class of error function. The resulting formulae will then be illustrated using a simple layered network structure having a single layer of sigmoidal hidden units together with a sum-of-squares error.

Many error functions of practical interest, for instance those defined by maximum likelihood for a set of i.i.d. data, comprise a sum of terms, one for each data point in the training set, so that

    E(w) = \sum_{n=1}^{N} E_n(w).    (5.44)

Here we shall consider the problem of evaluating ∇En(w) for one such term in the error function.
This may be used directly for sequential optimization, or the results can be accumulated over the training set in the case of batch methods.

Consider first a simple linear model in which the outputs yk are linear combinations of the input variables xi so that

    y_k = \sum_i w_{ki} x_i    (5.45)

together with an error function that, for a particular input pattern n, takes the form

    E_n = \frac{1}{2} \sum_k (y_{nk} - t_{nk})^2    (5.46)

where ynk = yk(xn, w). The gradient of this error function with respect to a weight wji is given by

    \frac{\partial E_n}{\partial w_{ji}} = (y_{nj} - t_{nj}) x_{ni}    (5.47)

which can be interpreted as a 'local' computation involving the product of an 'error signal' ynj − tnj associated with the output end of the link wji and the variable xni associated with the input end of the link.
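Equation (5.47) can be checked directly against a finite-difference estimate of the error. A small sketch, assuming illustrative sizes and random data (the names W, x, t here are hypothetical, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 4))   # weights w_ki of the linear model y_k = sum_i w_ki x_i
x = rng.normal(size=4)        # one input pattern
t = rng.normal(size=2)        # its target

y = W @ x
grad = np.outer(y - t, x)     # (5.47): error signal at the output end times
                              # the input-end variable, for every weight at once

def En(W):
    # Per-pattern sum-of-squares error (5.46).
    return 0.5 * np.sum((W @ x - t) ** 2)

# Finite-difference check of a single weight derivative.
eps = 1e-6
Wp = W.copy()
Wp[1, 2] += eps
assert abs((En(Wp) - En(W)) / eps - grad[1, 2]) < 1e-4
```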
In Section 4.3.2, we saw how a similar formula arises with the logistic sigmoid activation function together with the cross-entropy error function, and similarly for the softmax activation function together with its matching cross-entropy error function. We shall now see how this simple result extends to the more complex setting of multilayer feed-forward networks.

In a general feed-forward network, each unit computes a weighted sum of its inputs of the form

    a_j = \sum_i w_{ji} z_i    (5.48)
where zi is the activation of a unit, or input, that sends a connection to unit j, and wji is the weight associated with that connection. In Section 5.1, we saw that biases can be included in this sum by introducing an extra unit, or input, with activation fixed at +1. We therefore do not need to deal with biases explicitly. The sum in (5.48) is transformed by a nonlinear activation function h(·) to give the activation zj of unit j in the form

    z_j = h(a_j).    (5.49)

Note that one or more of the variables zi in the sum in (5.48) could be an input, and similarly, the unit j in (5.49) could be an output.

For each pattern in the training set, we shall suppose that we have supplied the corresponding input vector to the network and calculated the activations of all of the hidden and output units in the network by successive application of (5.48) and (5.49).
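The successive application of (5.48) and (5.49) can be sketched as follows. This is an illustrative fragment, not the book's code: tanh stands in for h(·), the layer sizes are arbitrary, and biases are absorbed by appending a fixed +1 activation as described above.

```python
import numpy as np

def forward(x, weights, h=np.tanh):
    """Apply a_j = sum_i w_ji z_i and z_j = h(a_j) layer by layer.

    `weights` is a list of weight matrices, one per layer; appending a fixed
    activation of +1 at each layer absorbs the biases, so they need no
    special treatment.
    """
    z = x
    for W in weights:
        a = W @ np.append(z, 1.0)   # weighted sum (5.48), with the +1 bias input
        z = h(a)                    # nonlinear activation (5.49)
    return z

# Example: 3 inputs -> 4 hidden units -> 2 outputs (sizes are arbitrary).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3 + 1)), rng.normal(size=(2, 4 + 1))]
out = forward(rng.normal(size=3), weights)
```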
This process is often called forward propagation because it can be regarded as a forward flow of information through the network.

Now consider the evaluation of the derivative of En with respect to a weight wji. The outputs of the various units will depend on the particular input pattern n. However, in order to keep the notation uncluttered, we shall omit the subscript n from the network variables. First we note that En depends on the weight wji only via the summed input aj to unit j. We can therefore apply the chain rule for partial derivatives to give

    \frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}}.    (5.50)

We now introduce a useful notation

    \delta_j \equiv \frac{\partial E_n}{\partial a_j}    (5.51)

where the δ's are often referred to as errors for reasons we shall see shortly.
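The meaning of δj as ∂En/∂aj can be illustrated with a one-unit finite-difference check. A hypothetical sketch, with made-up values: unit j is taken to be a linear output unit, so that En depends on the weights only through aj.

```python
import numpy as np

z = np.array([0.5, -1.2, 2.0])   # activations feeding unit j (made up)
w = np.array([0.3, 0.8, -0.5])   # the weights w_ji into unit j (made up)
t = 1.0                          # target

def En_of_a(a):
    # Unit j is taken to be a linear output unit, so y = a_j and
    # En = 0.5 * (y - t)^2 depends on the weights only through a_j.
    return 0.5 * (a - t) ** 2

a = w @ z                        # summed input a_j, as in (5.48)
eps = 1e-6
delta_j = (En_of_a(a + eps) - En_of_a(a)) / eps   # numerical dEn/da_j
assert abs(delta_j - (a - t)) < 1e-4              # analytically, y - t
```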
Using (5.48), we can write

    \frac{\partial a_j}{\partial w_{ji}} = z_i.    (5.52)

Substituting (5.51) and (5.52) into (5.50), we then obtain

    \frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i.    (5.53)

Equation (5.53) tells us that the required derivative is obtained simply by multiplying the value of δ for the unit at the output end of the weight by the value of z for the unit at the input end of the weight (where z = 1 in the case of a bias). Note that this takes the same form as for the simple linear model considered at the start of this section. Thus, in order to evaluate the derivatives, we need only to calculate the value of δj for each hidden and output unit in the network, and then apply (5.53).

As we have seen already, for the output units, we have

    \delta_k = y_k - t_k    (5.54)

provided we are using the canonical link as the output-unit activation function. To evaluate the δ's for hidden units, we again make use of the chain rule for partial derivatives,

    \delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}    (5.55)

where the sum runs over all units k to which unit j sends connections.

Figure 5.7  Illustration of the calculation of δj for hidden unit j by backpropagation of the δ's from those units k to which unit j sends connections. The blue arrow denotes the direction of information flow during forward propagation, and the red arrows indicate the backward propagation of error information.
The arrangement of units and weights is illustrated in Figure 5.7. Note that the units labelled k could include other hidden units and/or output units. In writing down (5.55), we are making use of the fact that variations in aj give rise to variations in the error function only through variations in the variables ak. If we now substitute the definition of δ given by (5.51) into (5.55), and make use of (5.48) and (5.49), we obtain the following backpropagation formula

    \delta_j = h'(a_j) \sum_k w_{kj} \delta_k    (5.56)

which tells us that the value of δ for a particular hidden unit can be obtained by propagating the δ's backwards from units higher up in the network, as illustrated in Figure 5.7.
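Equations (5.53), (5.54), and (5.56) together give the complete procedure for a two-layer network. The following sketch assumes tanh hidden units, linear (canonical-link) outputs, and a sum-of-squares error; the sizes and data are arbitrary, and the result is verified against finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-layer network: D inputs, M tanh hidden units, K linear outputs,
# sum-of-squares error. Sizes and data are arbitrary; biases are handled
# by an extra input clamped at +1, as in the text.
D, M, K = 3, 5, 2
W1 = rng.normal(size=(M, D + 1))
W2 = rng.normal(size=(K, M + 1))
x = rng.normal(size=D)
t = rng.normal(size=K)

def forward(W1, W2, x):
    a = W1 @ np.append(x, 1.0)           # (5.48) for the hidden layer
    z = np.tanh(a)                       # (5.49)
    y = W2 @ np.append(z, 1.0)           # linear (canonical-link) outputs
    return a, z, y

a, z, y = forward(W1, W2, x)

delta_k = y - t                          # (5.54): output-unit errors
# (5.56): propagate the deltas backwards; h'(a) = 1 - tanh(a)^2 = 1 - z^2.
# The bias column of W2 sends nothing back, hence the [:, :M] slice.
delta_j = (1 - z ** 2) * (W2[:, :M].T @ delta_k)

dE_dW2 = np.outer(delta_k, np.append(z, 1.0))   # (5.53): delta times z
dE_dW1 = np.outer(delta_j, np.append(x, 1.0))

def En(W1, W2):
    return 0.5 * np.sum((forward(W1, W2, x)[2] - t) ** 2)

# Finite-difference check of one weight in each layer.
eps = 1e-6
W1p = W1.copy(); W1p[0, 1] += eps
W2p = W2.copy(); W2p[1, 3] += eps
assert abs((En(W1p, W2) - En(W1, W2)) / eps - dE_dW1[0, 1]) < 1e-3
assert abs((En(W1, W2p) - En(W1, W2)) / eps - dE_dW2[1, 3]) < 1e-3
```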