. . . , N. Show that the maximum likelihood solution W_ML for the parameter matrix W has the property that each column is given by an expression of the form (3.15), which was the solution for an isotropic noise distribution. Note that this is independent of the covariance matrix Σ. Show that the maximum likelihood solution for Σ is given by

$$\boldsymbol{\Sigma} = \frac{1}{N}\sum_{n=1}^{N}\bigl(\mathbf{t}_n - \mathbf{W}_{\mathrm{ML}}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)\bigl(\mathbf{t}_n - \mathbf{W}_{\mathrm{ML}}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)^{\mathrm{T}}.\tag{3.109}$$
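As a quick numerical illustration of this result, the sketch below fits a multi-output linear model by least squares and evaluates (3.109). It assumes that (3.15) amounts to applying the Moore–Penrose pseudo-inverse of the design matrix to each target column; the data and basis values are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 50, 4, 3                     # data points, basis functions, target dimensions

Phi = rng.normal(size=(N, M))          # design matrix with rows phi(x_n)^T (placeholder basis)
T = rng.normal(size=(N, K))            # target matrix with rows t_n^T

# Maximum likelihood weights: each column solves an independent least-squares problem,
# i.e. the pseudo-inverse applied column by column (the form assumed for 3.15).
W_ml = np.linalg.pinv(Phi) @ T
col_by_col = np.column_stack([np.linalg.pinv(Phi) @ T[:, k] for k in range(K)])
assert np.allclose(W_ml, col_by_col)   # the columns do not interact

# Maximum likelihood covariance, eq. (3.109).
resid = T - Phi @ W_ml                 # rows are t_n - W_ML^T phi(x_n)
Sigma_ml = resid.T @ resid / N
print(Sigma_ml)
```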
3.7 ( ) By using the technique of completing the square, verify the result (3.49) for the posterior distribution of the parameters w in the linear basis function model in which m_N and S_N are defined by (3.50) and (3.51) respectively.

3.8 ( ) www Consider the linear basis function model in Section 3.1, and suppose that we have already observed N data points, so that the posterior distribution over w is given by (3.49). This posterior can be regarded as the prior for the next observation. By considering an additional data point (x_{N+1}, t_{N+1}), and by completing the square in the exponential, show that the resulting posterior distribution is again given by (3.49) but with S_N replaced by S_{N+1} and m_N replaced by m_{N+1}.
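The sequential view taken in Exercise 3.8 can be illustrated numerically: updating the Gaussian posterior one point at a time gives the same (m_N, S_N) as processing all points in a single batch. The sketch assumes the updates m_N = S_N(S_0^{-1}m_0 + βΦ^T t) and S_N^{-1} = S_0^{-1} + βΦ^TΦ corresponding to (3.50) and (3.51); the basis functions and data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 25.0                                   # assumed known noise precision
M = 3                                         # number of basis functions

def phi(x):
    """Placeholder polynomial basis phi(x) = (1, x, x^2)^T."""
    return np.array([1.0, x, x**2])

x = rng.uniform(-1, 1, size=20)
t = np.sin(np.pi * x) + rng.normal(scale=beta**-0.5, size=x.size)
Phi = np.stack([phi(xn) for xn in x])

m0, S0 = np.zeros(M), np.eye(M)               # prior N(w | m0, S0)

def posterior(m_prior, S_prior, Phi_new, t_new):
    """Gaussian posterior update (assumed forms of 3.50-3.51), previous posterior acting as prior."""
    S_inv = np.linalg.inv(S_prior) + beta * Phi_new.T @ Phi_new
    S = np.linalg.inv(S_inv)
    m = S @ (np.linalg.inv(S_prior) @ m_prior + beta * Phi_new.T @ t_new)
    return m, S

# Batch update using all N points at once.
m_batch, S_batch = posterior(m0, S0, Phi, t)

# Sequential update, one point at a time, each posterior becoming the next prior.
m_seq, S_seq = m0, S0
for phi_n, t_n in zip(Phi, t):
    m_seq, S_seq = posterior(m_seq, S_seq, phi_n[None, :], np.array([t_n]))

assert np.allclose(m_batch, m_seq) and np.allclose(S_batch, S_seq)
```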
3.9 ( ) Repeat the previous exercise but instead of completing the square by hand, make use of the general result for linear-Gaussian models given by (2.116).

3.10 ( ) www By making use of the result (2.115) to evaluate the integral in (3.57), verify that the predictive distribution for the Bayesian linear regression model is given by (3.58) in which the input-dependent variance is given by (3.59).

3.11 ( ) We have seen that, as the size of a data set increases, the uncertainty associated with the posterior distribution over model parameters decreases. Make use of the matrix identity (Appendix C)

$$\bigl(\mathbf{M} + \mathbf{v}\mathbf{v}^{\mathrm{T}}\bigr)^{-1} = \mathbf{M}^{-1} - \frac{\bigl(\mathbf{M}^{-1}\mathbf{v}\bigr)\bigl(\mathbf{v}^{\mathrm{T}}\mathbf{M}^{-1}\bigr)}{1 + \mathbf{v}^{\mathrm{T}}\mathbf{M}^{-1}\mathbf{v}}\tag{3.110}$$

to show that the uncertainty σ_N^2(x) associated with the linear regression function given by (3.59) satisfies

$$\sigma_{N+1}^2(\mathbf{x}) \leqslant \sigma_N^2(\mathbf{x}).\tag{3.111}$$
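A minimal numerical check of (3.111), assuming the predictive variance takes the form σ_N^2(x) = 1/β + φ(x)^T S_N φ(x) from (3.59) with the zero-mean isotropic prior, so that S_N^{-1} = αI + βΦ^TΦ as in (3.54); the basis and data below are placeholders. Adding one more observation never increases the predictive variance at any test point.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0                      # prior and noise precisions
M = 3

def phi(x):
    """Placeholder polynomial basis phi(x) = (1, x, x^2)^T."""
    return np.array([1.0, x, x**2])

def pred_var(Phi, x_star):
    """Predictive variance sigma_N^2(x), assumed form of (3.59), with S_N^{-1} = alpha I + beta Phi^T Phi."""
    S_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_inv)
    p = phi(x_star)
    return 1.0 / beta + p @ S_N @ p

x = rng.uniform(-1, 1, size=30)
Phi_N = np.stack([phi(xn) for xn in x[:-1]])     # first N points
Phi_N1 = np.stack([phi(xn) for xn in x])         # all N+1 points

for x_star in np.linspace(-1, 1, 7):
    assert pred_var(Phi_N1, x_star) <= pred_var(Phi_N, x_star) + 1e-12
```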
3.12 ( ) We saw in Section 2.3.6 that the conjugate prior for a Gaussian distribution with unknown mean and unknown precision (inverse variance) is a normal-gamma distribution. This property also holds for the case of the conditional Gaussian distribution p(t|x, w, β) of the linear regression model. If we consider the likelihood function (3.10), then the conjugate prior for w and β is given by

$$p(\mathbf{w}, \beta) = \mathcal{N}(\mathbf{w}|\mathbf{m}_0, \beta^{-1}\mathbf{S}_0)\,\mathrm{Gam}(\beta|a_0, b_0).\tag{3.112}$$

Show that the corresponding posterior distribution takes the same functional form, so that

$$p(\mathbf{w}, \beta|\mathbf{t}) = \mathcal{N}(\mathbf{w}|\mathbf{m}_N, \beta^{-1}\mathbf{S}_N)\,\mathrm{Gam}(\beta|a_N, b_N)\tag{3.113}$$

and find expressions for the posterior parameters m_N, S_N, a_N, and b_N.
3.13 ( ) Show that the predictive distribution p(t|x, t) for the model discussed in Exercise 3.12 is given by a Student's t-distribution of the form

$$p(t|\mathbf{x}, \mathbf{t}) = \mathrm{St}(t|\mu, \lambda, \nu)\tag{3.114}$$

and obtain expressions for µ, λ and ν.

3.14 ( ) In this exercise, we explore in more detail the properties of the equivalent kernel defined by (3.62), where S_N is defined by (3.54). Suppose that the basis functions φ_j(x) are linearly independent and that the number N of data points is greater than the number M of basis functions. Furthermore, let one of the basis functions be constant, say φ_0(x) = 1. By taking suitable linear combinations of these basis functions, we can construct a new basis set ψ_j(x) spanning the same space but that are orthonormal, so that

$$\sum_{n=1}^{N}\psi_j(\mathbf{x}_n)\psi_k(\mathbf{x}_n) = I_{jk}\tag{3.115}$$

where I_jk is defined to be 1 if j = k and 0 otherwise, and we take ψ_0(x) = 1. Show that for α = 0, the equivalent kernel can be written as k(x, x') = ψ(x)^T ψ(x') where ψ = (ψ_1, . . . , ψ_M)^T. Use this result to show that the kernel satisfies the summation constraint

$$\sum_{n=1}^{N}k(\mathbf{x}, \mathbf{x}_n) = 1.\tag{3.116}$$
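A small numerical illustration of the summation constraint (3.116), assuming the equivalent kernel has the form k(x, x') = β φ(x)^T S_N φ(x') from (3.62) with S_N^{-1} = αI + βΦ^TΦ from (3.54); the basis and inputs are placeholders. With α close to zero, more data points than basis functions, and one constant basis function, the kernel values at the training inputs sum to one at any test point.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 10.0
alpha = 1e-8                                  # alpha -> 0, as in the exercise

def phi(x):
    """Placeholder basis with a constant component: phi(x) = (1, x, x^2, x^3)^T."""
    return np.array([1.0, x, x**2, x**3])

x_train = rng.uniform(-1, 1, size=25)         # N > M, linearly independent basis functions
Phi = np.stack([phi(xn) for xn in x_train])
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)   # assumed form of (3.54)

def k(x, x_prime):
    """Equivalent kernel, assumed form of (3.62): k(x, x') = beta * phi(x)^T S_N phi(x')."""
    return beta * phi(x) @ S_N @ phi(x_prime)

for x_star in np.linspace(-1, 1, 5):
    total = sum(k(x_star, xn) for xn in x_train)
    assert np.isclose(total, 1.0, atol=1e-6)  # summation constraint (3.116)
```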
3.15 ( ) www Consider a linear basis function model for regression in which the parameters α and β are set using the evidence framework. Show that the function E(m_N) defined by (3.82) satisfies the relation 2E(m_N) = N.

3.16 ( ) Derive the result (3.86) for the log evidence function p(t|α, β) of the linear regression model by making use of (2.115) to evaluate the integral (3.77) directly.

3.17 ( ) Show that the evidence function for the Bayesian linear regression model can be written in the form (3.78) in which E(w) is defined by (3.79).

3.18 ( ) www By completing the square over w, show that the error function (3.79) in Bayesian linear regression can be written in the form (3.80).

3.19 ( ) Show that the integration over w in the Bayesian linear regression model gives the result (3.85). Hence show that the log marginal likelihood is given by (3.86).
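The identity asked for in Exercises 3.16–3.19 can also be checked numerically: under the prior w ~ N(0, α^{-1}I), the marginal distribution of t is Gaussian with zero mean and covariance β^{-1}I + α^{-1}ΦΦ^T, and its log density should agree with the closed form. The sketch below assumes that (3.86) reads ln p(t|α, β) = (M/2) ln α + (N/2) ln β − E(m_N) − (1/2) ln|A| − (N/2) ln 2π with A = αI + βΦ^TΦ, m_N = βA^{-1}Φ^T t, and E(m_N) = (β/2)‖t − Φm_N‖² + (α/2) m_N^T m_N; the data and basis are placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
alpha, beta = 2.0, 25.0
N, M = 30, 4

Phi = rng.normal(size=(N, M))                      # placeholder design matrix
t = rng.normal(size=N)                             # placeholder targets

# Closed-form log evidence (assumed form of 3.86).
A = alpha * np.eye(M) + beta * Phi.T @ Phi
m_N = beta * np.linalg.solve(A, Phi.T @ t)
E_mN = 0.5 * beta * np.sum((t - Phi @ m_N) ** 2) + 0.5 * alpha * m_N @ m_N
log_ev = (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta) - E_mN
          - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2 * np.pi))

# Direct evaluation: t is marginally Gaussian with zero mean and
# covariance beta^{-1} I + alpha^{-1} Phi Phi^T.
cov = np.eye(N) / beta + Phi @ Phi.T / alpha
log_ev_direct = multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(t)

assert np.isclose(log_ev, log_ev_direct)
```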
3.20 ( ) www Starting from (3.86) verify all of the steps needed to show that maximization of the log marginal likelihood function (3.86) with respect to α leads to the re-estimation equation (3.92).

3.21 ( ) An alternative way to derive the result (3.92) for the optimal value of α in the evidence framework is to make use of the identity

$$\frac{d}{d\alpha}\ln|\mathbf{A}| = \mathrm{Tr}\!\left(\mathbf{A}^{-1}\frac{d}{d\alpha}\mathbf{A}\right).\tag{3.117}$$

Prove this identity by considering the eigenvalue expansion of a real, symmetric matrix A, and making use of the standard results for the determinant and trace of A expressed in terms of its eigenvalues (Appendix C). Then make use of (3.117) to derive (3.92) starting from (3.86).
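As an illustration of the re-estimation equation derived in Exercises 3.20 and 3.21, the sketch below iterates what is assumed to be the update (3.92), namely α ← γ / (m_N^T m_N) with γ = Σ_i λ_i / (α + λ_i), where λ_i are the eigenvalues of βΦ^TΦ, and checks that a self-consistent fixed point is reached. The data are placeholders and β is held fixed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = 25.0                                     # beta held fixed for simplicity
N, M = 40, 5

Phi = rng.normal(size=(N, M))                   # placeholder design matrix
w_true = rng.normal(size=M)
t = Phi @ w_true + rng.normal(scale=beta**-0.5, size=N)

lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)    # eigenvalues of beta * Phi^T Phi

def m_and_gamma(alpha):
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)  # posterior mean (assumed form of 3.84)
    gamma = np.sum(lam / (alpha + lam))         # effective number of parameters (assumed form of 3.91)
    return m_N, gamma

alpha = 1.0                                     # initial guess
for _ in range(100):
    m_N, gamma = m_and_gamma(alpha)
    alpha = gamma / (m_N @ m_N)                 # re-estimation step (assumed form of 3.92)

# At convergence the fixed-point relation alpha = gamma / (m_N^T m_N) is self-consistent.
m_N, gamma = m_and_gamma(alpha)
assert np.isclose(alpha, gamma / (m_N @ m_N), rtol=1e-3)
print("converged alpha:", alpha)
```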
3.22 ( ) Starting from (3.86) verify all of the steps needed to show that maximization of the log marginal likelihood function (3.86) with respect to β leads to the re-estimation equation (3.95).

3.23 ( ) www Show that the marginal probability of the data, in other words the model evidence, for the model described in Exercise 3.12 is given by

$$p(\mathbf{t}) = \frac{1}{(2\pi)^{N/2}}\,\frac{b_0^{a_0}}{b_N^{a_N}}\,\frac{\Gamma(a_N)}{\Gamma(a_0)}\,\frac{|\mathbf{S}_N|^{1/2}}{|\mathbf{S}_0|^{1/2}}\tag{3.118}$$

by first marginalizing with respect to w and then with respect to β.

3.24 ( ) Repeat the previous exercise but now use Bayes' theorem in the form

$$p(\mathbf{t}) = \frac{p(\mathbf{t}|\mathbf{w}, \beta)\,p(\mathbf{w}, \beta)}{p(\mathbf{w}, \beta|\mathbf{t})}\tag{3.119}$$

and then substitute for the prior and posterior distributions and the likelihood function in order to derive the result (3.118).

4. Linear Models for Classification

In the previous chapter, we explored a class of regression models having particularly simple analytical and computational properties. We now discuss an analogous class of models for solving classification problems.
The goal in classification is to take an input vector x and to assign it to one of K discrete classes C_k where k = 1, . . . , K. In the most common scenario, the classes are taken to be disjoint, so that each input is assigned to one and only one class. The input space is thereby divided into decision regions whose boundaries are called decision boundaries or decision surfaces. In this chapter, we consider linear models for classification, by which we mean that the decision surfaces are linear functions of the input vector x and hence are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space. Data sets whose classes can be separated exactly by linear decision surfaces are said to be linearly separable.

For regression problems, the target variable t was simply the vector of real numbers whose values we wish to predict.
In the case of classification, there are various ways of using target values to represent class labels. For probabilistic models, the most convenient, in the case of two-class problems, is the binary representation in which there is a single target variable t ∈ {0, 1} such that t = 1 represents class C_1 and t = 0 represents class C_2. We can interpret the value of t as the probability that the class is C_1, with the values of probability taking only the extreme values of 0 and 1. For K > 2 classes, it is convenient to use a 1-of-K coding scheme in which t is a vector of length K such that if the class is C_j, then all elements t_k of t are zero except element t_j, which takes the value 1.
For instance, if we have K = 5 classes, then a pattern from class 2 would be given the target vector

$$\mathbf{t} = (0, 1, 0, 0, 0)^{\mathrm{T}}.\tag{4.1}$$

Again, we can interpret the value of t_k as the probability that the class is C_k. For nonprobabilistic models, alternative choices of target variable representation will sometimes prove convenient.
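A minimal sketch of the 1-of-K coding scheme, using NumPy purely for illustration (the class labels and the value of K are placeholders):

```python
import numpy as np

K = 5
labels = np.array([1, 0, 3, 1, 4])        # class indices C_1..C_5 stored as 0..4

# 1-of-K (one-hot) targets: row n has a single 1 in the column of the class of pattern n.
T = np.eye(K)[labels]
print(T[0])                                # pattern from class 2 -> [0. 1. 0. 0. 0.], cf. (4.1)
```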
In Chapter 1, we identified three distinct approaches to the classification problem. The simplest involves constructing a discriminant function that directly assigns each vector x to a specific class. A more powerful approach, however, models the conditional probability distribution p(C_k|x) in an inference stage, and then subsequently uses this distribution to make optimal decisions. By separating inference and decision, we gain numerous benefits, as discussed in Section 1.5.4. There are two different approaches to determining the conditional probabilities p(C_k|x).
One technique is to model them directly, for example by representing them as parametric models and then optimizing the parameters using a training set. Alternatively, we can adopt a generative approach in which we model the class-conditional densities given by p(x|C_k), together with the prior probabilities p(C_k) for the classes, and then we compute the required posterior probabilities using Bayes' theorem

$$p(C_k|\mathbf{x}) = \frac{p(\mathbf{x}|C_k)\,p(C_k)}{p(\mathbf{x})}.\tag{4.2}$$

We shall discuss examples of all three approaches in this chapter.
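The generative route in (4.2) can be sketched concretely as follows. The class-conditional densities are taken, purely as an illustrative assumption, to be one-dimensional Gaussians with placeholder parameters, and the posterior class probabilities are obtained by normalizing the products of densities and priors.

```python
import numpy as np
from scipy.stats import norm

# Illustrative class-conditional densities p(x|C_k) and priors p(C_k) for K = 2 classes.
class_conditionals = [norm(loc=-1.0, scale=1.0), norm(loc=2.0, scale=1.5)]
priors = np.array([0.6, 0.4])

def posterior(x):
    """Posterior class probabilities p(C_k|x) via Bayes' theorem, eq. (4.2)."""
    joint = np.array([p.pdf(x) for p in class_conditionals]) * priors   # p(x|C_k) p(C_k)
    return joint / joint.sum()                                          # divide by p(x)

print(posterior(0.0))    # probabilities sum to one; the larger entry gives the predicted class
```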
In the linear regression models considered in Chapter 3, the model prediction y(x, w) was given by a linear function of the parameters w. In the simplest case, the model is also linear in the input variables and therefore takes the form y(x) = w^T x + w_0, so that y is a real number. For classification problems, however, we wish to predict discrete class labels, or more generally posterior probabilities that lie in the range (0, 1). To achieve this, we consider a generalization of this model in which we transform the linear function of w using a nonlinear function f(·) so that

$$y(\mathbf{x}) = f\!\left(\mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0\right).\tag{4.3}$$

In the machine learning literature f(·) is known as an activation function, whereas its inverse is called a link function in the statistics literature. The decision surfaces correspond to y(x) = constant, so that w^T x + w_0 = constant and hence the decision surfaces are linear functions of x, even if the function f(·) is nonlinear. For this reason, the class of models described by (4.3) are called generalized linear models
(McCullagh and Nelder, 1989). Note, however, that in contrast to the models used for regression, they are no longer linear in the parameters due to the presence of the nonlinear function f(·). This will lead to more complex analytical and computational properties than for linear regression models. Nevertheless, these models are still relatively simple compared to the more general nonlinear models that will be studied in subsequent chapters.

The algorithms discussed in this chapter will be equally applicable if we first make a fixed nonlinear transformation of the input variables using a vector of basis functions φ(x) as we did for regression models in Chapter 3. We begin by considering classification directly in the original input space x, while in Section 4.3 we shall find it convenient to switch to a notation involving basis functions for consistency with later chapters.

4.1. Discriminant Functions

A discriminant is a function that takes an input vector x and assigns it to one of K classes, denoted C_k.
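To make the connection between (4.3) and a discriminant concrete, the following sketch uses the logistic sigmoid as an illustrative choice of activation function f(·) for a two-class problem; the weights are placeholders, and the class assignment simply thresholds y(x) at 1/2, which corresponds to the linear decision surface w^T x + w_0 = 0.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, an illustrative activation function f(.)."""
    return 1.0 / (1.0 + np.exp(-a))

# Placeholder parameters of the generalized linear model y(x) = f(w^T x + w0), eq. (4.3).
w = np.array([2.0, -1.0])
w0 = 0.5

def y(x):
    return sigmoid(w @ x + w0)

def classify(x):
    """Assign x to class C_1 if y(x) > 1/2, i.e. if w^T x + w0 > 0, otherwise to class C_2."""
    return "C1" if y(x) > 0.5 else "C2"

print(classify(np.array([1.0, 0.0])), classify(np.array([-1.0, 0.0])))
```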