Bishop C.M., Pattern Recognition and Machine Learning (2006), Section 5.3 Error Backpropagation
Note that the summation in (5.56) is taken over the first index on w_kj (corresponding to backward propagation of information through the network), whereas in the forward propagation equation (5.10) it is taken over the second index. Because we already know the values of the δ's for the output units, it follows that by recursively applying (5.56) we can evaluate the δ's for all of the hidden units in a feed-forward network, regardless of its topology.

The backpropagation procedure can therefore be summarized as follows.

Error Backpropagation

1. Apply an input vector x_n to the network and forward propagate through the network using (5.48) and (5.49) to find the activations of all the hidden and output units.
2. Evaluate the δ_k for all the output units using (5.54).

3. Backpropagate the δ's using (5.56) to obtain δ_j for each hidden unit in the network.

4. Use (5.53) to evaluate the required derivatives.

For batch methods, the derivative of the total error E can then be obtained by repeating the above steps for each pattern in the training set and then summing over all patterns:

$$\frac{\partial E}{\partial w_{ji}} = \sum_n \frac{\partial E_n}{\partial w_{ji}}. \qquad (5.57)$$
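As an illustration of the batch accumulation in (5.57), the following Python sketch sums the per-pattern derivatives over a training set. It assumes a per-pattern routine forward_backward (given with the sketch for the example of Section 5.3.2 below) returning E_n together with its weight derivatives; the names X, T, W1, W2 are illustrative conventions, not notation from the text.

```python
# Minimal sketch of the batch gradient (5.57): sum per-pattern derivatives.
import numpy as np

def batch_gradient(X, T, W1, W2):
    """X: (N, D) inputs, T: (N, K) targets; returns total error and summed gradients."""
    E, grad_W1, grad_W2 = 0.0, np.zeros_like(W1), np.zeros_like(W2)
    for x_n, t_n in zip(X, T):
        E_n, g1, g2 = forward_backward(x_n, t_n, W1, W2)  # per-pattern backprop
        E += E_n
        grad_W1 += g1          # dE/dw = sum_n dE_n/dw, equation (5.57)
        grad_W2 += g2
    return E, grad_W1, grad_W2
```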
In the above derivation we have implicitly assumed that each hidden or output unit in the network has the same activation function h(·). The derivation is easily generalized, however, to allow different units to have individual activation functions, simply by keeping track of which form of h(·) goes with which unit.

5.3.2 A simple example

The above derivation of the backpropagation procedure allowed for general forms for the error function, the activation functions, and the network topology.
In order to illustrate the application of this algorithm, we shall consider a particular example. This is chosen both for its simplicity and for its practical importance, because many applications of neural networks reported in the literature make use of this type of network. Specifically, we shall consider a two-layer network of the form illustrated in Figure 5.1, together with a sum-of-squares error, in which the output units have linear activation functions, so that y_k = a_k, while the hidden units have logistic sigmoid activation functions given by

$$h(a) \equiv \tanh(a) \qquad (5.58)$$

where

$$\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}. \qquad (5.59)$$

A useful feature of this function is that its derivative can be expressed in a particularly simple form:

$$h'(a) = 1 - h(a)^2. \qquad (5.60)$$

We also consider a standard sum-of-squares error function, so that for pattern n the error is given by

$$E_n = \frac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2 \qquad (5.61)$$

where y_k is the activation of output unit k, and t_k is the corresponding target, for a particular input pattern x_n.

For each pattern in the training set in turn, we first perform a forward propagation using

$$a_j = \sum_{i=0}^{D} w_{ji}^{(1)} x_i \qquad (5.62)$$

$$z_j = \tanh(a_j) \qquad (5.63)$$

$$y_k = \sum_{j=0}^{M} w_{kj}^{(2)} z_j. \qquad (5.64)$$
Next we compute the δ's for each output unit using

$$\delta_k = y_k - t_k. \qquad (5.65)$$

Then we backpropagate these to obtain δ's for the hidden units using

$$\delta_j = (1 - z_j^2) \sum_{k=1}^{K} w_{kj} \delta_k. \qquad (5.66)$$

Finally, the derivatives with respect to the first-layer and second-layer weights are given by

$$\frac{\partial E_n}{\partial w_{ji}^{(1)}} = \delta_j x_i, \qquad \frac{\partial E_n}{\partial w_{kj}^{(2)}} = \delta_k z_j. \qquad (5.67)$$
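The following NumPy sketch implements the forward pass (5.62)-(5.64) and the backward pass (5.65)-(5.67) for this two-layer example. The weight matrices W1 and W2, and the convention of absorbing the biases by appending fixed units x_0 = 1 and z_0 = 1, are illustrative choices rather than anything prescribed by the text.

```python
# Forward and backward propagation for a two-layer network with tanh hidden units,
# linear outputs, and a sum-of-squares error, following equations (5.62)-(5.67).
import numpy as np

def forward_backward(x, t, W1, W2):
    """x: input of shape (D,), t: target of shape (K,),
    W1: (M, D+1) first-layer weights, W2: (K, M+1) second-layer weights."""
    # Forward propagation, equations (5.62)-(5.64).
    x_tilde = np.concatenate(([1.0], x))          # prepend bias unit x_0 = 1
    a = W1 @ x_tilde                              # a_j = sum_i w_ji^(1) x_i
    z = np.tanh(a)                                # z_j = tanh(a_j)
    z_tilde = np.concatenate(([1.0], z))          # prepend bias unit z_0 = 1
    y = W2 @ z_tilde                              # y_k = sum_j w_kj^(2) z_j (linear outputs)

    # Backward propagation, equations (5.65)-(5.67).
    delta_k = y - t                                     # output-unit deltas (5.65)
    delta_j = (1.0 - z**2) * (W2[:, 1:].T @ delta_k)    # hidden-unit deltas (5.66)
    grad_W1 = np.outer(delta_j, x_tilde)                # dE_n/dw_ji^(1) = delta_j x_i (5.67)
    grad_W2 = np.outer(delta_k, z_tilde)                # dE_n/dw_kj^(2) = delta_k z_j (5.67)
    E_n = 0.5 * np.sum((y - t)**2)                      # sum-of-squares error (5.61)
    return E_n, grad_W1, grad_W2
```

Like the forward pass, the backward pass is dominated by operations whose count is proportional to the number of weights, consistent with the O(W) scaling discussed in the next subsection.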
5.3.3 Efficiency of backpropagation

One of the most important aspects of backpropagation is its computational efficiency. To understand this, let us examine how the number of computer operations required to evaluate the derivatives of the error function scales with the total number W of weights and biases in the network. A single evaluation of the error function (for a given input pattern) would require O(W) operations, for sufficiently large W. This follows from the fact that, except for a network with very sparse connections, the number of weights is typically much greater than the number of units, and so the bulk of the computational effort in forward propagation is concerned with evaluating the sums in (5.48), with the evaluation of the activation functions representing a small overhead.
Each term in the sum in (5.48) requires one multiplication and one addition, leading to an overall computational cost that is O(W).

An alternative approach to backpropagation for computing the derivatives of the error function is to use finite differences. This can be done by perturbing each weight in turn, and approximating the derivatives by the expression

$$\frac{\partial E_n}{\partial w_{ji}} = \frac{E_n(w_{ji} + \epsilon) - E_n(w_{ji})}{\epsilon} + O(\epsilon) \qquad (5.68)$$

where ε ≪ 1. In a software simulation, the accuracy of the approximation to the derivatives can be improved by making ε smaller, until numerical roundoff problems arise.
The accuracy of the finite differences method can be improved significantly by using symmetrical central differences of the form

$$\frac{\partial E_n}{\partial w_{ji}} = \frac{E_n(w_{ji} + \epsilon) - E_n(w_{ji} - \epsilon)}{2\epsilon} + O(\epsilon^2). \qquad (5.69)$$

In this case, the O(ε) corrections cancel, as can be verified by Taylor expansion on the right-hand side of (5.69) (Exercise 5.14), and so the residual corrections are O(ε²). The number of computational steps is, however, roughly doubled compared with (5.68).

The main problem with numerical differentiation is that the highly desirable O(W) scaling has been lost. Each forward propagation requires O(W) steps, and
there are W weights in the network, each of which must be perturbed individually, so that the overall scaling is O(W²).

Figure 5.8 Illustration of a modular pattern recognition system in which the Jacobian matrix can be used to backpropagate error signals from the outputs through to earlier modules in the system.

However, numerical differentiation plays an important role in practice, because a comparison of the derivatives calculated by backpropagation with those obtained using central differences provides a powerful check on the correctness of any software implementation of the backpropagation algorithm. When training networks in practice, derivatives should be evaluated using backpropagation, because this gives the greatest accuracy and numerical efficiency. However, the results should be compared with numerical differentiation using (5.69) for some test cases in order to check the correctness of the implementation.
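The following sketch illustrates such a check, assuming the forward_backward routine from the earlier example: every weight is perturbed in turn and the central-difference estimate (5.69) is compared with the backpropagated derivative. The value of ε and the way the comparison is reported are illustrative choices.

```python
# Gradient check: compare backpropagated derivatives with central differences (5.69).
import numpy as np

def check_gradients(x, t, W1, W2, eps=1e-6):
    """Return the largest discrepancy between backprop and numerical derivatives."""
    _, grad_W1, grad_W2 = forward_backward(x, t, W1, W2)
    worst = 0.0
    for W, grad in ((W1, grad_W1), (W2, grad_W2)):
        for idx in np.ndindex(W.shape):
            orig = W[idx]
            W[idx] = orig + eps
            E_plus, _, _ = forward_backward(x, t, W1, W2)
            W[idx] = orig - eps
            E_minus, _, _ = forward_backward(x, t, W1, W2)
            W[idx] = orig                               # restore the weight
            numeric = (E_plus - E_minus) / (2.0 * eps)  # central difference (5.69)
            worst = max(worst, abs(numeric - grad[idx]))
    return worst                                        # should be very small
```

Because each of the W weights requires two O(W) forward propagations, this check costs O(W²), which is why it is applied only to a few small test cases rather than during training.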
5.3.4 The Jacobian matrix

We have seen how the derivatives of an error function with respect to the weights can be obtained by the propagation of errors backwards through the network. The technique of backpropagation can also be applied to the calculation of other derivatives. Here we consider the evaluation of the Jacobian matrix, whose elements are given by the derivatives of the network outputs with respect to the inputs

$$J_{ki} \equiv \frac{\partial y_k}{\partial x_i} \qquad (5.70)$$

where each such derivative is evaluated with all other inputs held fixed.
Jacobian matrices play a useful role in systems built from a number of distinct modules, as illustrated in Figure 5.8. Each module can comprise a fixed or adaptive function, which can be linear or nonlinear, so long as it is differentiable. Suppose we wish to minimize an error function E with respect to the parameter w in Figure 5.8. The derivative of the error function is given by

$$\frac{\partial E}{\partial w} = \sum_{k,j} \frac{\partial E}{\partial y_k} \frac{\partial y_k}{\partial z_j} \frac{\partial z_j}{\partial w} \qquad (5.71)$$

in which the Jacobian matrix for the red module in Figure 5.8 appears in the middle term.

Because the Jacobian matrix provides a measure of the local sensitivity of the outputs to changes in each of the input variables, it also allows any known errors Δx_i
associated with the inputs to be propagated through the trained network in order to estimate their contribution Δy_k to the errors at the outputs, through the relation

$$\Delta y_k \simeq \sum_i \frac{\partial y_k}{\partial x_i} \Delta x_i \qquad (5.72)$$

which is valid provided the |Δx_i| are small. In general, the network mapping represented by a trained neural network will be nonlinear, and so the elements of the Jacobian matrix will not be constants but will depend on the particular input vector used. Thus (5.72) is valid only for small perturbations of the inputs, and the Jacobian itself must be re-evaluated for each new input vector.

The Jacobian matrix can be evaluated using a backpropagation procedure that is similar to the one derived earlier for evaluating the derivatives of an error function with respect to the weights.
We start by writing the element J_ki in the form

$$J_{ki} = \frac{\partial y_k}{\partial x_i} = \sum_j \frac{\partial y_k}{\partial a_j} \frac{\partial a_j}{\partial x_i} = \sum_j w_{ji} \frac{\partial y_k}{\partial a_j} \qquad (5.73)$$

where we have made use of (5.48). The sum in (5.73) runs over all units j to which the input unit i sends connections (for example, over all units in the first hidden layer in the layered topology considered earlier). We now write down a recursive backpropagation formula to determine the derivatives ∂y_k/∂a_j

$$\frac{\partial y_k}{\partial a_j} = \sum_l \frac{\partial y_k}{\partial a_l} \frac{\partial a_l}{\partial a_j} = h'(a_j) \sum_l w_{lj} \frac{\partial y_k}{\partial a_l} \qquad (5.74)$$

where the sum runs over all units l to which unit j sends connections (corresponding to the first index of w_lj). Again, we have made use of (5.48) and (5.49). This backpropagation starts at the output units, for which the required derivatives can be found directly from the functional form of the output-unit activation function.
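As a concrete instance, the following sketch specializes the recursion (5.73)-(5.74) to the two-layer network of Section 5.3.2, where the linear output units make the starting derivatives of y_k with respect to the output-unit activations simply the identity. The names W1 and W2 follow the earlier sketches and are illustrative assumptions.

```python
# Jacobian J_ki = dy_k/dx_i for the two-layer network (tanh hidden units, linear outputs),
# evaluated by the backpropagation recursion (5.73)-(5.74).
import numpy as np

def jacobian(x, W1, W2):
    """Return J of shape (K, D) with J[k, i] = dy_k / dx_i at the input x."""
    x_tilde = np.concatenate(([1.0], x))     # bias unit x_0 = 1
    a = W1 @ x_tilde                         # hidden pre-activations
    h_prime = 1.0 - np.tanh(a)**2            # h'(a_j) = 1 - tanh(a_j)^2, cf. (5.60)
    # (5.74): dy_k/da_j = h'(a_j) * sum_l w_lj dy_k/da_l, where dy_k/da_l is the
    # identity for the linear output units, giving dy_k/da_j = h'(a_j) * w_kj^(2).
    dy_da = W2[:, 1:] * h_prime              # shape (K, M); bias column excluded
    # (5.73): J_ki = sum_j w_ji^(1) dy_k/da_j  (bias terms carry no input dependence).
    return dy_da @ W1[:, 1:]                 # shape (K, D)

# The relation (5.72) then gives the first-order effect of input errors:
# delta_y ≈ jacobian(x, W1, W2) @ delta_x.
```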