Suppose we are given a training data set {φ_n, t_n} where n = 1, ..., N, and t_n is a binary target vector of length K that uses the 1-of-K coding scheme, so that it has components t_{nj} = I_{jk} if pattern n is from class C_k. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by

\[
\pi_k = \frac{N_k}{N}
\tag{4.159}
\]

where N_k is the number of data points assigned to class C_k.

4.10 ( ) Consider the classification model of Exercise 4.9 and now suppose that the class-conditional densities are given by Gaussian distributions with a shared covariance matrix, so that

\[
p(\boldsymbol{\phi}|\mathcal{C}_k) = \mathcal{N}(\boldsymbol{\phi}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}).
\tag{4.160}
\]

Show that the maximum likelihood solution for the mean of the Gaussian distribution for class C_k is given by

\[
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} t_{nk} \boldsymbol{\phi}_n
\tag{4.161}
\]

which represents the mean of those feature vectors assigned to class C_k.
Similarly, show that the maximum likelihood solution for the shared covariance matrix is given by

\[
\boldsymbol{\Sigma} = \sum_{k=1}^{K} \frac{N_k}{N} \mathbf{S}_k
\tag{4.162}
\]

where

\[
\mathbf{S}_k = \frac{1}{N_k} \sum_{n=1}^{N} t_{nk} (\boldsymbol{\phi}_n - \boldsymbol{\mu}_k)(\boldsymbol{\phi}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}.
\tag{4.163}
\]

Thus Σ is given by a weighted average of the covariances of the data associated with each class, in which the weighting coefficients are given by the prior probabilities of the classes.
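The estimators in Exercises 4.9 and 4.10 are straightforward to compute in practice. The following is a minimal NumPy sketch, using synthetic data and illustrative variable names (none of them taken from the text), that evaluates (4.159), (4.161), and (4.162)-(4.163) from a 1-of-K target matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N points, D features, K classes with 1-of-K targets.
N, D, K = 300, 2, 3
labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]                            # N x K one-hot target matrix
Phi = rng.normal(size=(N, D)) + labels[:, None]  # crude class-dependent shift

Nk = T.sum(axis=0)                   # N_k, counts per class
pi = Nk / N                          # (4.159) ML class priors
mu = (T.T @ Phi) / Nk[:, None]       # (4.161) ML class means, K x D

# (4.162)-(4.163): shared covariance as a prior-weighted average of the S_k
Sigma = np.zeros((D, D))
for k in range(K):
    diff = Phi - mu[k]               # deviations from the class-k mean
    Sk = (T[:, k, None] * diff).T @ diff / Nk[k]
    Sigma += (Nk[k] / N) * Sk

print(pi, mu, Sigma, sep="\n")
```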
4.11 ( ) Consider a classification problem with K classes for which the feature vector φ has M components, each of which can take L discrete states. Let the values of the components be represented by a 1-of-L binary coding scheme. Further suppose that, conditioned on the class C_k, the M components of φ are independent, so that the class-conditional density factorizes with respect to the feature vector components. Show that the quantities a_k given by (4.63), which appear in the argument to the softmax function describing the posterior class probabilities, are linear functions of the components of φ. Note that this represents an example of the naive Bayes model, which is discussed in Section 8.2.2.

4.12 () www Verify the relation (4.88) for the derivative of the logistic sigmoid function defined by (4.59).

4.13 () www By making use of the result (4.88) for the derivative of the logistic sigmoid, show that the derivative of the error function (4.90) for the logistic regression model is given by (4.91).

4.14 () Show that for a linearly separable data set, the maximum likelihood solution for the logistic regression model is obtained by finding a vector w whose decision boundary w^T φ(x) = 0 separates the classes and then taking the magnitude of w to infinity.
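Exercises 4.12 and 4.13 can be sanity-checked numerically. The sketch below, again a minimal NumPy illustration with made-up data, compares σ'(a) = σ(a)(1 − σ(a)) with a finite-difference estimate, and checks that the gradient of the logistic regression cross-entropy error matches the form Σ_n (y_n − t_n)φ_n referred to as (4.91).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Derivative check: sigma'(a) = sigma(a) * (1 - sigma(a))
a, eps = 0.7, 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
analytic = sigmoid(a) * (1 - sigmoid(a))
print(abs(numeric - analytic))       # should be close to zero

# Gradient check for logistic regression: grad E = sum_n (y_n - t_n) phi_n
rng = np.random.default_rng(1)
N, M = 50, 3
Phi = rng.normal(size=(N, M))
t = rng.integers(0, 2, size=N).astype(float)
w = rng.normal(size=M)

def error(w):
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

y = sigmoid(Phi @ w)
analytic_grad = Phi.T @ (y - t)      # the claimed form of (4.91)
numeric_grad = np.array([
    (error(w + eps * e) - error(w - eps * e)) / (2 * eps)
    for e in np.eye(M)
])
print(np.max(np.abs(analytic_grad - numeric_grad)))   # should be close to zero
```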
4.15 ( ) Show that the Hessian matrix H for the logistic regression model, given by (4.97), is positive definite. Here R is a diagonal matrix with elements y_n(1 − y_n), and y_n is the output of the logistic regression model for input vector x_n. Hence show that the error function is a convex function of w and that it has a unique minimum.

4.16 () Consider a binary classification problem in which each observation x_n is known to belong to one of two classes, corresponding to t = 0 and t = 1, and suppose that the procedure for collecting training data is imperfect, so that training points are sometimes mislabelled. For every data point x_n, instead of having a value t_n for the class label, we have instead a value π_n representing the probability that t_n = 1. Given a probabilistic model p(t = 1|φ), write down the log likelihood function appropriate to such a data set.

4.17 () www Show that the derivatives of the softmax activation function (4.104), where the a_k are defined by (4.105), are given by (4.106).

4.18 () Using the result (4.106) for the derivatives of the softmax activation function, show that the gradients of the cross-entropy error (4.108) are given by (4.109).
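The softmax derivative of Exercise 4.17 admits the same kind of numerical check. A short sketch, assuming NumPy and using the identity ∂y_k/∂a_j = y_k(I_kj − y_j) as the analytic reference, is given below.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())           # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
a = rng.normal(size=4)
y = softmax(a)

# Analytic Jacobian: dy_k / da_j = y_k (I_kj - y_j)
analytic = np.diag(y) - np.outer(y, y)

# Finite-difference Jacobian, one column per input a_j
eps = 1e-6
numeric = np.column_stack([
    (softmax(a + eps * e) - softmax(a - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
print(np.max(np.abs(analytic - numeric)))   # should be close to zero
```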
4.19 () www Write down expressions for the gradient of the log likelihood, as well as the corresponding Hessian matrix, for the probit regression model defined in Section 4.3.5. These are the quantities that would be required to train such a model using IRLS.
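For Exercise 4.19, a candidate gradient expression can be validated against finite differences before attempting the Hessian. The sketch below assumes SciPy for the standard normal CDF and PDF; the gradient formula used here is obtained by a routine chain-rule calculation and is offered only as a numerical check, not quoted from the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
N, M = 40, 3
Phi = rng.normal(size=(N, M))
t = rng.integers(0, 2, size=N).astype(float)
w = rng.normal(size=M) * 0.1

def log_lik(w):
    y = norm.cdf(Phi @ w)            # y_n = Phi(a_n), probit outputs
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

a = Phi @ w
y = norm.cdf(a)
# Candidate gradient from the chain rule: sum_n (t_n - y_n) N(a_n|0,1) / {y_n (1 - y_n)} phi_n
grad = Phi.T @ ((t - y) * norm.pdf(a) / (y * (1 - y)))

eps = 1e-6
numeric = np.array([
    (log_lik(w + eps * e) - log_lik(w - eps * e)) / (2 * eps)
    for e in np.eye(M)
])
print(np.max(np.abs(grad - numeric)))   # should be close to zero
```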
4.20 ( ) Show that the Hessian matrix for the multiclass logistic regression problem, defined by (4.110), is positive semidefinite. Note that the full Hessian matrix for this problem is of size MK × MK, where M is the number of parameters and K is the number of classes. To prove the positive semidefinite property, consider the product u^T H u where u is an arbitrary vector of length MK, and then apply Jensen's inequality.

4.21 () Show that the probit function (4.114) and the erf function (4.115) are related by (4.116).

4.22 () Using the result (4.135), derive the expression (4.137) for the log model evidence under the Laplace approximation.

4.23 ( ) www In this exercise, we derive the BIC result (4.139) starting from the Laplace approximation to the model evidence given by (4.137). Show that if the prior over parameters is Gaussian of the form p(θ) = N(θ|m, V_0), the log model evidence under the Laplace approximation takes the form

\[
\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}|\boldsymbol{\theta}_{\mathrm{MAP}}) - \frac{1}{2}(\boldsymbol{\theta}_{\mathrm{MAP}} - \mathbf{m})^{\mathrm{T}} \mathbf{V}_0^{-1} (\boldsymbol{\theta}_{\mathrm{MAP}} - \mathbf{m}) - \frac{1}{2} \ln |\mathbf{H}| + \mathrm{const}
\]

where H is the matrix of second derivatives of the log likelihood ln p(D|θ) evaluated at θ_MAP. Now assume that the prior is broad so that V_0^{-1} is small and the second term on the right-hand side above can be neglected. Furthermore, consider the case of independent, identically distributed data so that H is the sum of terms, one for each data point. Show that the log model evidence can then be written approximately in the form of the BIC expression (4.139).
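Since the BIC expression (4.139) itself lies outside this excerpt, it is worth recalling the form the derivation is aiming at; in the usual convention (constants aside) it reads

\[
\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}|\boldsymbol{\theta}_{\mathrm{MAP}}) - \frac{1}{2} M \ln N
\]

where M is the number of adaptive parameters and N is the number of data points.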
4.24 ( ) Use the results from Section 2.3.2 to derive the result (4.151) for the marginalization of the logistic regression model with respect to a Gaussian posterior distribution over the parameters w.

4.25 ( ) Suppose we wish to approximate the logistic sigmoid σ(a) defined by (4.59) by a scaled probit function Φ(λa), where Φ(a) is defined by (4.114). Show that if λ is chosen so that the derivatives of the two functions are equal at a = 0, then λ² = π/8.
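The accuracy of the approximation in Exercise 4.25 is easy to quantify numerically. A small sketch, assuming NumPy and SciPy, compares σ(a) with Φ(λa) for λ = (π/8)^{1/2} and confirms that the two slopes agree at a = 0.

```python
import numpy as np
from scipy.stats import norm

lam = np.sqrt(np.pi / 8.0)           # lambda chosen so the slopes match at a = 0
a = np.linspace(-6, 6, 201)
sigma = 1.0 / (1.0 + np.exp(-a))     # logistic sigmoid
probit = norm.cdf(lam * a)           # scaled probit Phi(lambda * a)

print(np.max(np.abs(sigma - probit)))   # worst-case gap, on the order of 0.02
print(0.25, lam * norm.pdf(0.0))        # both derivatives at a = 0 equal 1/4
```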
4.26 ( ) In this exercise, we prove the relation (4.152) for the convolution of a probit function with a Gaussian distribution. To do this, show that the derivative of the left-hand side with respect to µ is equal to the derivative of the right-hand side, then integrate both sides with respect to µ and show that the constant of integration vanishes. Note that before differentiating the left-hand side, it is convenient first to introduce a change of variable given by a = µ + σz so that the integral over a is replaced by an integral over z. When we differentiate the left-hand side of the relation (4.152), we will then obtain a Gaussian integral over z that can be evaluated analytically.

5. Neural Networks

In Chapters 3 and 4 we considered models for regression and classification that comprised linear combinations of fixed basis functions. We saw that such models have useful analytical and computational properties but that their practical applicability was limited by the curse of dimensionality.
In order to apply such models to large-scale problems, it is necessary to adapt the basis functions to the data.

Support vector machines (SVMs), discussed in Chapter 7, address this by first defining basis functions that are centred on the training data points and then selecting a subset of these during training. One advantage of SVMs is that, although the training involves nonlinear optimization, the objective function is convex, and so the solution of the optimization problem is relatively straightforward.
The number of basis functions in the resulting models is generally much smaller than the number of training points, although it is often still relatively large and typically increases with the size of the training set. The relevance vector machine, discussed in Section 7.2, also chooses a subset from a fixed set of basis functions and typically results in much sparser models. Unlike the SVM, it also produces probabilistic outputs, although this is at the expense of a nonconvex optimization during training.

An alternative approach is to fix the number of basis functions in advance but allow them to be adaptive, in other words to use parametric forms for the basis functions in which the parameter values are adapted during training.
The most successful model of this type in the context of pattern recognition is the feed-forward neural network, also known as the multilayer perceptron, discussed in this chapter. In fact, ‘multilayer perceptron’ is really a misnomer, because the model comprises multiple layers of logistic regression models (with continuous nonlinearities) rather than multiple perceptrons (with discontinuous nonlinearities). For many applications, the resulting model can be significantly more compact, and hence faster to evaluate, than a support vector machine having the same generalization performance.
The price to be paid for this compactness, as with the relevance vector machine, is that the likelihood function, which forms the basis for network training, is no longer a convex function of the model parameters. In practice, however, it is often worth investing substantial computational resources during the training phase in order to obtain a compact model that is fast at processing new data.

The term ‘neural network’ has its origins in attempts to find mathematical representations of information processing in biological systems (McCulloch and Pitts, 1943; Widrow and Hoff, 1960; Rosenblatt, 1962; Rumelhart et al., 1986). Indeed, it has been used very broadly to cover a wide range of different models, many of which have been the subject of exaggerated claims regarding their biological plausibility. From the perspective of practical applications of pattern recognition, however, biological realism would impose entirely unnecessary constraints.
Our focus in this chapter is therefore on neural networks as efficient models for statistical pattern recognition. In particular, we shall restrict our attention to the specific class of neural networks that have proven to be of greatest practical value, namely the multilayer perceptron.

We begin by considering the functional form of the network model, including the specific parameterization of the basis functions, and we then discuss the problem of determining the network parameters within a maximum likelihood framework, which involves the solution of a nonlinear optimization problem. This requires the evaluation of derivatives of the log likelihood function with respect to the network parameters, and we shall see how these can be obtained efficiently using the technique of error backpropagation.
We shall also show how the backpropagation framework can be extended to allow other derivatives to be evaluated, such as the Jacobian and Hessian matrices. Next we discuss various approaches to regularization of neural network training and the relationships between them. We also consider some extensions to the neural network model, and in particular we describe a general framework for modelling conditional probability distributions known as mixture density networks. Finally, we discuss the use of Bayesian treatments of neural networks. Additional background on neural network models can be found in Bishop (1995a).
5.1. Feed-forward Network Functions

The linear models for regression and classification discussed in Chapters 3 and 4, respectively, are based on linear combinations of fixed nonlinear basis functions φ_j(x) and take the form

\[
y(\mathbf{x}, \mathbf{w}) = f\!\left( \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}) \right)
\tag{5.1}
\]

where f(·) is a nonlinear activation function in the case of classification and is the identity in the case of regression.
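As a concrete, if simplified, illustration of (5.1), the following sketch evaluates such a model with fixed Gaussian basis functions; the centres, width, and weights are arbitrary values chosen for the example, not taken from the text.

```python
import numpy as np

def gaussian_basis(x, centres, s=1.0):
    """Fixed nonlinear basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))

def y(x, w, centres, f=lambda z: z):
    """y(x, w) = f( sum_j w_j phi_j(x) ); f is the identity for regression."""
    return f(gaussian_basis(x, centres) @ w)

centres = np.linspace(-2, 2, 5)              # M = 5 fixed basis centres
w = np.array([0.2, -0.5, 1.0, 0.3, -0.1])    # one weight per basis function
x = np.linspace(-3, 3, 7)
print(y(x, w, centres))                      # regression case
```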