Bishop C.M. Pattern Recognition and Machine Learning (2006): Chapter 4 exercises and the opening of Chapter 5 (pp. 222–227)

Suppose we are given a training data set {φ_n, t_n} where n = 1, ..., N, and t_n is a binary target vector of length K that uses the 1-of-K coding scheme, so that it has components t_{nj} = I_{jk} if pattern n is from class C_k. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by

$$\pi_k = \frac{N_k}{N} \qquad (4.159)$$

where N_k is the number of data points assigned to class C_k.

4.10 ( ) Consider the classification model of Exercise 4.9 and now suppose that the class-conditional densities are given by Gaussian distributions with a shared covariance matrix, so that

$$p(\boldsymbol{\phi}|\mathcal{C}_k) = \mathcal{N}(\boldsymbol{\phi}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}). \qquad (4.160)$$

Show that the maximum likelihood solution for the mean of the Gaussian distribution for class C_k is given by

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} t_{nk} \boldsymbol{\phi}_n \qquad (4.161)$$

which represents the mean of those feature vectors assigned to class C_k. Similarly, show that the maximum likelihood solution for the shared covariance matrix is given by

$$\boldsymbol{\Sigma} = \sum_{k=1}^{K} \frac{N_k}{N} \mathbf{S}_k \qquad (4.162)$$

where

$$\mathbf{S}_k = \frac{1}{N_k} \sum_{n=1}^{N} t_{nk} (\boldsymbol{\phi}_n - \boldsymbol{\mu}_k)(\boldsymbol{\phi}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}. \qquad (4.163)$$

Thus Σ is given by a weighted average of the covariances of the data associated with each class, in which the weighting coefficients are given by the prior probabilities of the classes.
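For readers who want a concrete check on these maximum-likelihood formulas, the short NumPy sketch below computes (4.159), (4.161), (4.162) and (4.163) directly from one-hot targets. It is an editorial illustration rather than code from the book; the names `Phi` and `T` are arbitrary placeholders for the matrix of feature vectors and the matrix of 1-of-K targets.

```python
import numpy as np

def gaussian_shared_cov_ml(Phi, T):
    """ML estimates for the model of Exercises 4.9-4.10.

    Phi: (N, D) array of feature vectors phi_n.
    T:   (N, K) array of one-hot (1-of-K) targets t_n.
    Returns the priors pi_k, class means mu_k and shared covariance Sigma.
    """
    N, K = T.shape
    Nk = T.sum(axis=0)                        # number of points assigned to each class
    pi = Nk / N                               # (4.159)
    mu = (T.T @ Phi) / Nk[:, None]            # (4.161): per-class means
    Sigma = np.zeros((Phi.shape[1], Phi.shape[1]))
    for k in range(K):
        diff = Phi - mu[k]                    # deviations from the class-k mean
        Sk = (T[:, k, None] * diff).T @ diff / Nk[k]   # (4.163)
        Sigma += (Nk[k] / N) * Sk             # (4.162): weighted average of the S_k
    return pi, mu, Sigma
```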

4.11 ( ) Consider a classification problem with K classes for which the feature vector φ has M components each of which can take L discrete states. Let the values of the components be represented by a 1-of-L binary coding scheme. Further suppose that, conditioned on the class C_k, the M components of φ are independent, so that the class-conditional density factorizes with respect to the feature vector components. Show that the quantities a_k given by (4.63), which appear in the argument to the softmax function describing the posterior class probabilities, are linear functions of the components of φ. Note that this represents an example of the naive Bayes model which is discussed in Section 8.2.2.

4.12 () www Verify the relation (4.88) for the derivative of the logistic sigmoid function defined by (4.59).

4.13 () www By making use of the result (4.88) for the derivative of the logistic sigmoid, show that the derivative of the error function (4.90) for the logistic regression model is given by (4.91).
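The identity being verified in Exercises 4.12 and 4.13 is the familiar result dσ/da = σ(a)(1 − σ(a)). The minimal Python check below (an editorial sketch, not part of the book) compares that expression with a central finite difference.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Compare the analytic derivative sigma*(1 - sigma) with a central difference.
eps = 1e-6
for a in (-2.0, 0.0, 1.5):
    analytic = sigmoid(a) * (1.0 - sigmoid(a))
    numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
    assert abs(analytic - numeric) < 1e-8, (a, analytic, numeric)
```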

4.14 () Show that for a linearly separable data set, the maximum likelihood solution for the logistic regression model is obtained by finding a vector w whose decision boundary w^T φ(x) = 0 separates the classes and then taking the magnitude of w to infinity.

4.15 ( ) Show that the Hessian matrix H for the logistic regression model, given by (4.97), is positive definite. Here R is a diagonal matrix with elements y_n(1 − y_n), and y_n is the output of the logistic regression model for input vector x_n. Hence show that the error function is a convex function of w and that it has a unique minimum.

4.16 () Consider a binary classification problem in which each observation x_n is known to belong to one of two classes, corresponding to t = 0 and t = 1, and suppose that the procedure for collecting training data is imperfect, so that training points are sometimes mislabelled. For every data point x_n, instead of having a value t for the class label, we have instead a value π_n representing the probability that t_n = 1. Given a probabilistic model p(t = 1|φ), write down the log likelihood function appropriate to such a data set.

4.17 () www Show that the derivatives of the softmax activation function (4.104), where the a_k are defined by (4.105), are given by (4.106).

4.18 () Using the result (4.106) for the derivatives of the softmax activation function, show that the gradients of the cross-entropy error (4.108) are given by (4.109).
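Exercise 4.17 concerns the standard softmax derivative ∂y_k/∂a_j = y_k(I_{kj} − y_j). Assuming that form, the NumPy sketch below compares it with finite differences; it is an editorial illustration, not code from the book.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())      # subtract the maximum for numerical stability
    return e / e.sum()

a = np.array([0.3, -1.2, 2.0])
y = softmax(a)
analytic = np.diag(y) - np.outer(y, y)   # dy_k/da_j = y_k (I_kj - y_j)

eps = 1e-6
numeric = np.zeros_like(analytic)
for j in range(a.size):
    da = np.zeros_like(a)
    da[j] = eps
    numeric[:, j] = (softmax(a + da) - softmax(a - da)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-8)
```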

4.19 () www Write down expressions for the gradient of the log likelihood, as well as the corresponding Hessian matrix, for the probit regression model defined in Section 4.3.5. These are the quantities that would be required to train such a model using IRLS.
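One way to approach Exercise 4.19 numerically is to check a candidate gradient against finite differences. The sketch below assumes the error function is the usual negative log likelihood with y_n = Φ(w^T φ_n), and that the gradient takes the form Σ_n (y_n − t_n) N(a_n|0, 1) φ_n / (y_n(1 − y_n)); both are editorial assumptions stated for the example, not expressions quoted from the book.

```python
import numpy as np
from scipy.stats import norm

def error(w, Phi, t):
    # Negative log likelihood for probit regression, y_n = Phi_cdf(w^T phi_n).
    y = norm.cdf(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def grad(w, Phi, t):
    a = Phi @ w
    y = norm.cdf(a)
    # Candidate gradient: sum_n (y_n - t_n) N(a_n|0,1) / (y_n (1 - y_n)) phi_n
    return Phi.T @ ((y - t) * norm.pdf(a) / (y * (1 - y)))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))
t = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3) * 0.5

eps = 1e-6
numeric = np.array([(error(w + eps * np.eye(3)[i], Phi, t)
                     - error(w - eps * np.eye(3)[i], Phi, t)) / (2 * eps)
                    for i in range(3)])
assert np.allclose(grad(w, Phi, t), numeric, atol=1e-5)
```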

4.20 ( ) Show that the Hessian matrix for the multiclass logistic regression problem, defined by (4.110), is positive semidefinite. Note that the full Hessian matrix for this problem is of size MK × MK, where M is the number of parameters and K is the number of classes. To prove the positive semidefinite property, consider the product u^T H u where u is an arbitrary vector of length MK, and then apply Jensen's inequality.

4.21 () Show that the probit function (4.114) and the erf function (4.115) are related by (4.116).

4.22 () Using the result (4.135), derive the expression (4.137) for the log model evidence under the Laplace approximation.

4.23 ( ) www In this exercise, we derive the BIC result (4.139) starting from the Laplace approximation to the model evidence given by (4.137). Show that if the prior over parameters is Gaussian of the form p(θ) = N(θ|m, V_0), the log model evidence under the Laplace approximation takes the form

$$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}|\boldsymbol{\theta}_{\mathrm{MAP}}) - \frac{1}{2}(\boldsymbol{\theta}_{\mathrm{MAP}} - \mathbf{m})^{\mathrm{T}} \mathbf{V}_0^{-1} (\boldsymbol{\theta}_{\mathrm{MAP}} - \mathbf{m}) - \frac{1}{2}\ln|\mathbf{H}| + \mathrm{const}$$

where H is the matrix of second derivatives of the log likelihood ln p(D|θ) evaluated at θ_MAP. Now assume that the prior is broad so that V_0^{-1} is small and the second term on the right-hand side above can be neglected. Furthermore, consider the case of independent, identically distributed data so that H is the sum of terms, one for each data point. Show that the log model evidence can then be written approximately in the form of the BIC expression (4.139).

4.24 ( ) Use the results from Section 2.3.2 to derive the result (4.151) for the marginalization of the logistic regression model with respect to a Gaussian posterior distribution over the parameters w.

4.25 ( ) Suppose we wish to approximate the logistic sigmoid σ(a) defined by (4.59) by a scaled probit function Φ(λa), where Φ(a) is defined by (4.114). Show that if λ is chosen so that the derivatives of the two functions are equal at a = 0, then λ² = π/8.
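For Exercise 4.25, note that σ'(0) = 1/4 while the derivative of Φ(λa) at a = 0 is λ/√(2π), so matching the two gives λ = √(2π)/4 and hence λ² = π/8. The short Python check below (an editorial sketch, not from the book) confirms the derivative match at the origin.

```python
import math

lam = math.sqrt(math.pi / 8.0)

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def probit(a):
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

eps = 1e-6
d_sigmoid = (sigmoid(eps) - sigmoid(-eps)) / (2 * eps)                 # -> 0.25
d_scaled_probit = (probit(lam * eps) - probit(-lam * eps)) / (2 * eps)  # -> lam / sqrt(2*pi)
assert abs(d_sigmoid - d_scaled_probit) < 1e-9
```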

To do this, show that the derivative of the left-hand side with respect to µ is equal to the derivative of the right-hand side, then integrate both sides with respect to µ and show that the constant of integration vanishes. Note that before differentiating the left-hand side, it is convenient first to introduce a change of variable given by a = µ + σz so that the integral over a is replaced by an integral over z. When we differentiate the left-hand side of the relation (4.152), we will then obtain a Gaussian integral over z that can be evaluated analytically.

5. Neural Networks

In Chapters 3 and 4 we considered models for regression and classification that comprised linear combinations of fixed basis functions. We saw that such models have useful analytical and computational properties but that their practical applicability was limited by the curse of dimensionality. In order to apply such models to large-scale problems, it is necessary to adapt the basis functions to the data.

Support vector machines (SVMs), discussed in Chapter 7, address this by first defining basis functions that are centred on the training data points and then selecting a subset of these during training. One advantage of SVMs is that, although the training involves nonlinear optimization, the objective function is convex, and so the solution of the optimization problem is relatively straightforward. The number of basis functions in the resulting models is generally much smaller than the number of training points, although it is often still relatively large and typically increases with the size of the training set. The relevance vector machine, discussed in Section 7.2, also chooses a subset from a fixed set of basis functions and typically results in much sparser models. Unlike the SVM, it also produces probabilistic outputs, although this is at the expense of a nonconvex optimization during training.

An alternative approach is to fix the number of basis functions in advance but allow them to be adaptive, in other words to use parametric forms for the basis functions in which the parameter values are adapted during training. The most successful model of this type in the context of pattern recognition is the feed-forward neural network, also known as the multilayer perceptron, discussed in this chapter. In fact, 'multilayer perceptron' is really a misnomer, because the model comprises multiple layers of logistic regression models (with continuous nonlinearities) rather than multiple perceptrons (with discontinuous nonlinearities). For many applications, the resulting model can be significantly more compact, and hence faster to evaluate, than a support vector machine having the same generalization performance. The price to be paid for this compactness, as with the relevance vector machine, is that the likelihood function, which forms the basis for network training, is no longer a convex function of the model parameters. In practice, however, it is often worth investing substantial computational resources during the training phase in order to obtain a compact model that is fast at processing new data.

The term 'neural network' has its origins in attempts to find mathematical representations of information processing in biological systems (McCulloch and Pitts, 1943; Widrow and Hoff, 1960; Rosenblatt, 1962; Rumelhart et al., 1986). Indeed, it has been used very broadly to cover a wide range of different models, many of which have been the subject of exaggerated claims regarding their biological plausibility. From the perspective of practical applications of pattern recognition, however, biological realism would impose entirely unnecessary constraints. Our focus in this chapter is therefore on neural networks as efficient models for statistical pattern recognition. In particular, we shall restrict our attention to the specific class of neural networks that have proven to be of greatest practical value, namely the multilayer perceptron.

We begin by considering the functional form of the network model, including the specific parameterization of the basis functions, and we then discuss the problem of determining the network parameters within a maximum likelihood framework, which involves the solution of a nonlinear optimization problem. This requires the evaluation of derivatives of the log likelihood function with respect to the network parameters, and we shall see how these can be obtained efficiently using the technique of error backpropagation. We shall also show how the backpropagation framework can be extended to allow other derivatives to be evaluated, such as the Jacobian and Hessian matrices. Next we discuss various approaches to regularization of neural network training and the relationships between them. We also consider some extensions to the neural network model, and in particular we describe a general framework for modelling conditional probability distributions known as mixture density networks. Finally, we discuss the use of Bayesian treatments of neural networks. Additional background on neural network models can be found in Bishop (1995a).

5.1. Feed-forward Network Functions

The linear models for regression and classification discussed in Chapters 3 and 4, respectively, are based on linear combinations of fixed nonlinear basis functions φ_j(x) and take the form

$$y(\mathbf{x}, \mathbf{w}) = f\!\left( \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}) \right) \qquad (5.1)$$

where f(·) is a nonlinear activation function in the case of classification and is the identity in the case of regression.
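As a concrete illustration of the form (5.1), the Python sketch below evaluates such a model with a small set of fixed Gaussian basis functions. The choice of Gaussian basis functions, and the particular centres, width and weights, are arbitrary assumptions made for this example; only the overall structure y(x, w) = f(Σ_j w_j φ_j(x)) comes from the text.

```python
import numpy as np

def gaussian_basis(x, centres, s=1.0):
    """Fixed nonlinear basis functions phi_j(x) = exp(-||x - c_j||^2 / (2 s^2))."""
    return np.exp(-np.sum((x - centres) ** 2, axis=1) / (2 * s ** 2))

def linear_model(x, w, centres, f=lambda a: a):
    """y(x, w) = f( sum_j w_j phi_j(x) ), the form (5.1); f is the identity for regression."""
    return f(w @ gaussian_basis(x, centres))

# Example: M = 3 basis functions on a 2-dimensional input.
centres = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
w = np.array([0.5, -1.0, 2.0])
x = np.array([0.2, 0.7])
print(linear_model(x, w, centres))                                     # regression output
print(linear_model(x, w, centres, f=lambda a: 1 / (1 + np.exp(-a))))   # with a sigmoid f
```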
