Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 85


Text from the file (page 85)

We can represent this problem using a graphical model of the form shown in Figure 8.8. The graphical model captures the causal process (Pearl, 1988) by which the observed data was generated. For this reason, such models are often called generative models. By contrast, the polynomial regression model described by Figure 8.5 is not generative because there is no probability distribution associated with the input variable $x$, and so it is not possible to generate synthetic data points from this model. We could make it generative by introducing a suitable prior distribution $p(x)$, at the expense of a more complex model.

The hidden variables in a probabilistic model need not, however, have any explicit physical interpretation but may be introduced simply to allow a more complex joint distribution to be constructed from simpler components.

In either case, the technique of ancestral sampling applied to a generative model mimics the creation of the observed data and would therefore give rise to ‘fantasy’ data whose probability distribution (if the model were a perfect representation of reality) would be the same as that of the observed data.
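As a rough illustration of ancestral sampling, the sketch below draws samples from a small discrete directed graph by visiting nodes in an order where parents come before children. The two-node network `a -> b`, its probability tables, and all function names are hypothetical and chosen only for this example; they do not come from the text.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off

def ancestral_sample(order, parents, cpts, rng):
    """Sample every node once, visiting parents before children.

    order   : node names listed so that parents precede children
    parents : dict mapping node -> tuple of parent names
    cpts    : dict mapping node -> {tuple of parent states: list of probabilities}
    """
    sample = {}
    for node in order:
        parent_states = tuple(sample[p] for p in parents[node])
        sample[node] = sample_categorical(cpts[node][parent_states], rng)
    return sample

# Hypothetical two-node network a -> b with binary states (assumed values).
order = ["a", "b"]
parents = {"a": (), "b": ("a",)}
cpts = {
    "a": {(): [0.7, 0.3]},
    "b": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
}

rng = random.Random(0)
samples = [ancestral_sample(order, parents, cpts, rng) for _ in range(20000)]
freq_b1 = sum(s["b"] for s in samples) / len(samples)
print(f"empirical p(b=1) = {freq_b1:.3f}")
```

For these assumed tables the exact marginal is p(b=1) = 0.7 × 0.1 + 0.3 × 0.8 = 0.31, so the empirical frequency of the synthetic data should land close to that value.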

In practice, producing synthetic observations from a generative model can prove informative in understanding the form of the probability distribution represented by that model.

8.1.3 Discrete variables

We have discussed the importance of probability distributions that are members of the exponential family (Section 2.4), and we have seen that this family includes many well-known distributions as particular cases. Although such distributions are relatively simple, they form useful building blocks for constructing more complex probability distributions, and the framework of graphical models is very useful in expressing the way in which these building blocks are linked together.

Figure 8.9: (a) This fully-connected graph describes a general distribution over two $K$-state discrete variables having a total of $K^2 - 1$ parameters. (b) By dropping the link between the nodes, the number of parameters is reduced to $2(K - 1)$.

Such models have particularly nice properties if we choose the relationship between each parent-child pair in a directed graph to be conjugate, and we shall explore several examples of this shortly.

Two cases are particularly worthy of note, namely when the parent and child node each correspond to discrete variables and when they each correspond to Gaussian variables, because in these two cases the relationship can be extended hierarchically to construct arbitrarily complex directed acyclic graphs. We begin by examining the discrete case.

The probability distribution $p(\mathbf{x}|\boldsymbol{\mu})$ for a single discrete variable $\mathbf{x}$ having $K$ possible states (using the 1-of-$K$ representation) is given by

$$p(\mathbf{x}|\boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k} \qquad (8.9)$$

and is governed by the parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)^{\mathrm{T}}$. Due to the constraint $\sum_k \mu_k = 1$, only $K - 1$ values for $\mu_k$ need to be specified in order to define the distribution.

Now suppose that we have two discrete variables, $\mathbf{x}_1$ and $\mathbf{x}_2$, each of which has $K$ states, and we wish to model their joint distribution. We denote the probability of observing both $x_{1k} = 1$ and $x_{2l} = 1$ by the parameter $\mu_{kl}$, where $x_{1k}$ denotes the $k$th component of $\mathbf{x}_1$, and similarly for $x_{2l}$. The joint distribution can be written

$$p(\mathbf{x}_1, \mathbf{x}_2|\boldsymbol{\mu}) = \prod_{k=1}^{K} \prod_{l=1}^{K} \mu_{kl}^{x_{1k} x_{2l}}.$$

Because the parameters $\mu_{kl}$ are subject to the constraint $\sum_k \sum_l \mu_{kl} = 1$, this distribution is governed by $K^2 - 1$ parameters. It is easily seen that the total number of parameters that must be specified for an arbitrary joint distribution over $M$ variables is $K^M - 1$ and therefore grows exponentially with the number $M$ of variables.

Using the product rule, we can factor the joint distribution $p(\mathbf{x}_1, \mathbf{x}_2)$ in the form $p(\mathbf{x}_2|\mathbf{x}_1)p(\mathbf{x}_1)$, which corresponds to a two-node graph with a link going from the $\mathbf{x}_1$ node to the $\mathbf{x}_2$ node as shown in Figure 8.9(a).
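The product-rule factorization can be checked numerically on a small joint table. The sketch below assumes a hypothetical $K = 3$ table of joint probabilities (the numbers are made up for illustration), evaluates the joint distribution in its 1-of-$K$ product form, and confirms that it equals the conditional times the marginal for every pair of states.

```python
# Hypothetical joint table: mu[k][l] = p(x1 in state k, x2 in state l), K = 3.
mu = [
    [0.10, 0.05, 0.05],
    [0.20, 0.10, 0.10],
    [0.05, 0.15, 0.20],
]
K = len(mu)

def one_hot(k, K):
    """1-of-K encoding: component k is 1, all others are 0."""
    return [1 if j == k else 0 for j in range(K)]

def joint_prob(x1, x2, mu):
    """Evaluate prod_k prod_l mu[k][l] ** (x1[k] * x2[l]) for 1-of-K vectors."""
    p = 1.0
    for k, x1k in enumerate(x1):
        for l, x2l in enumerate(x2):
            p *= mu[k][l] ** (x1k * x2l)
    return p

# Marginal p(x1): sum the joint table over the states of x2.
p_x1 = [sum(row) for row in mu]

# Conditional p(x2 | x1): normalize each row of the joint table.
p_x2_given_x1 = [[mu[k][l] / p_x1[k] for l in range(K)] for k in range(K)]

# The product rule p(x1, x2) = p(x2 | x1) p(x1) holds for every state pair.
for k in range(K):
    for l in range(K):
        direct = joint_prob(one_hot(k, K), one_hot(l, K), mu)
        factored = p_x2_given_x1[k][l] * p_x1[k]
        assert abs(direct - factored) < 1e-12

# Free parameters: K - 1 for the marginal, K - 1 for each of the K rows.
n_marginal = K - 1
n_conditional = K * (K - 1)
print(n_marginal + n_conditional, K ** 2 - 1)
```

Both forms describe the same distribution, so the factored parameterization needs the same number of free parameters as the full table.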

The marginal distribution $p(\mathbf{x}_1)$ is governed by $K - 1$ parameters, as before. Similarly, the conditional distribution $p(\mathbf{x}_2|\mathbf{x}_1)$ requires the specification of $K - 1$ parameters for each of the $K$ possible values of $\mathbf{x}_1$. The total number of parameters that must be specified in the joint distribution is therefore $(K - 1) + K(K - 1) = K^2 - 1$ as before.

Now suppose that the variables $\mathbf{x}_1$ and $\mathbf{x}_2$ were independent, corresponding to the graphical model shown in Figure 8.9(b). Each variable is then described by a separate multinomial distribution, and the total number of parameters would be $2(K - 1)$. For a distribution over $M$ independent discrete variables, each having $K$ states, the total number of parameters would be $M(K - 1)$, which therefore grows linearly with the number of variables. From a graphical perspective, we have reduced the number of parameters by dropping links in the graph, at the expense of having a restricted class of distributions.

Figure 8.10: This chain of $M$ discrete nodes, each having $K$ states, requires the specification of $K - 1 + (M - 1)K(K - 1)$ parameters, which grows linearly with the length $M$ of the chain. In contrast, a fully connected graph of $M$ nodes would have $K^M - 1$ parameters, which grows exponentially with $M$.

More generally, if we have $M$ discrete variables $\mathbf{x}_1, \ldots, \mathbf{x}_M$, we can model the joint distribution using a directed graph with one variable corresponding to each node. The conditional distribution at each node is given by a set of nonnegative parameters subject to the usual normalization constraint. If the graph is fully connected then we have a completely general distribution having $K^M - 1$ parameters, whereas if there are no links in the graph the joint distribution factorizes into the product of the marginals, and the total number of parameters is $M(K - 1)$. Graphs having intermediate levels of connectivity allow for more general distributions than the fully factorized one while requiring fewer parameters than the general joint distribution.

As an illustration, consider the chain of nodes shown in Figure 8.10.

The marginal distribution $p(\mathbf{x}_1)$ requires $K - 1$ parameters, whereas each of the $M - 1$ conditional distributions $p(\mathbf{x}_i|\mathbf{x}_{i-1})$, for $i = 2, \ldots, M$, requires $K(K - 1)$ parameters. This gives a total parameter count of $K - 1 + (M - 1)K(K - 1)$, which is quadratic in $K$ and which grows linearly (rather than exponentially) with the length $M$ of the chain.

An alternative way to reduce the number of independent parameters in a model is by sharing parameters (also known as tying of parameters).

For instance, in the chain example of Figure 8.10, we can arrange that all of the conditional distributions $p(\mathbf{x}_i|\mathbf{x}_{i-1})$, for $i = 2, \ldots, M$, are governed by the same set of $K(K - 1)$ parameters. Together with the $K - 1$ parameters governing the distribution of $\mathbf{x}_1$, this gives a total of $K^2 - 1$ parameters that must be specified in order to define the joint distribution.

We can turn a graph over discrete variables into a Bayesian model by introducing Dirichlet priors for the parameters. From a graphical point of view, each node then acquires an additional parent representing the Dirichlet distribution over the parameters associated with the corresponding discrete node. This is illustrated for the chain model in Figure 8.11.
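The parameter counts quoted in this section for the fully connected, fully factorized, chain, and tied-chain models can be tabulated side by side. The following is a minimal sketch; the function names are hypothetical, and the formulas are taken directly from the text.

```python
def count_full(K, M):
    """Fully connected graph: general joint distribution over M K-state variables."""
    return K ** M - 1

def count_independent(K, M):
    """No links: product of M separate marginals."""
    return M * (K - 1)

def count_chain(K, M):
    """Chain: one marginal plus M - 1 separate conditional tables."""
    return (K - 1) + (M - 1) * K * (K - 1)

def count_tied_chain(K, M):
    """Chain with one conditional table shared by all links (needs M >= 2)."""
    return (K - 1) + K * (K - 1)

K = 10
for M in (2, 5, 10):
    print(M, count_full(K, M), count_independent(K, M),
          count_chain(K, M), count_tied_chain(K, M))
```

For $M = 2$ the chain is the fully connected two-node graph, so the chain count and the full count coincide at $K^2 - 1$; for longer chains the full count grows exponentially while the tied count stays fixed.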

The corresponding model in which we tie the parameters governing the conditional distributions $p(\mathbf{x}_i|\mathbf{x}_{i-1})$, for $i = 2, \ldots, M$, is shown in Figure 8.12.

Another way of controlling the exponential growth in the number of parameters in models of discrete variables is to use parameterized models for the conditional distributions instead of complete tables of conditional probability values.

To illustrate this idea, consider the graph in Figure 8.13 in which all of the nodes represent binary variables. Each of the parent variables $x_i$ is governed by a single parameter $\mu_i$ representing the probability $p(x_i = 1)$, giving $M$ parameters in total for the parent nodes. The conditional distribution $p(y|x_1, \ldots, x_M)$, however, would require $2^M$ parameters representing the probability $p(y = 1)$ for each of the $2^M$ possible settings of the parent variables. Thus in general the number of parameters required to specify this conditional distribution will grow exponentially with $M$.

Figure 8.11: An extension of the model of Figure 8.10 to include Dirichlet priors over the parameters governing the discrete distributions.

Figure 8.12: As in Figure 8.11 but with a single set of parameters $\boldsymbol{\mu}$ shared amongst all of the conditional distributions $p(\mathbf{x}_i|\mathbf{x}_{i-1})$.

We can obtain a more parsimonious form for the conditional distribution by using a logistic sigmoid function acting on a linear combination of the parent variables, giving

$$p(y = 1|x_1, \ldots, x_M) = \sigma\left(w_0 + \sum_{i=1}^{M} w_i x_i\right) = \sigma(\mathbf{w}^{\mathrm{T}}\mathbf{x}) \qquad (8.10)$$

where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the logistic sigmoid, $\mathbf{x} = (x_0, x_1, \ldots, x_M)^{\mathrm{T}}$ is an $(M + 1)$-dimensional vector of parent states augmented with an additional variable $x_0$ whose value is clamped to 1, and $\mathbf{w} = (w_0, w_1, \ldots, w_M)^{\mathrm{T}}$ is a vector of $M + 1$ parameters. This is a more restricted form of conditional distribution than the general case but is now governed by a number of parameters that grows linearly with $M$. In this sense, it is analogous to the choice of a restrictive form of covariance matrix (for example, a diagonal matrix) in a multivariate Gaussian distribution. The motivation for the logistic sigmoid representation was discussed in Section 4.2.

Figure 8.13: A graph comprising $M$ parents $x_1$, .
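To make the saving in equation (8.10) concrete, the sketch below builds the full conditional probability table for $M = 3$ binary parents from only $M + 1$ weights. The weight values and function names are hypothetical, chosen purely for illustration.

```python
import math
from itertools import product

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def p_y_given_parents(x, w):
    """p(y = 1 | x_1..x_M) = sigma(w_0 + sum_i w_i x_i); w has length M + 1."""
    a = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return sigmoid(a)

# Hypothetical weights (w_0, w_1, w_2, w_3) for M = 3 binary parents.
w = [-1.0, 2.0, 0.5, -1.5]
M = len(w) - 1

# All 2**M table entries follow from the M + 1 weights.
table = {x: p_y_given_parents(x, w) for x in product((0, 1), repeat=M)}

for x, p in sorted(table.items()):
    print(x, round(p, 4))
print(len(table), "table entries from", len(w), "parameters")
```

The table has $2^M = 8$ entries but only 4 free parameters, so not every table of conditional probabilities can be represented this way; that is the restriction the text describes.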

Characteristics

File type: PDF file
Size: 9.37 Mb
Material: Bishop C.M. Pattern Recognition and Machine Learning (2006).pdf
Material type: Book
Subject: (ММО) Machine Learning Methods
University: Lomonosov Moscow State University
