Bishop C.M. Pattern Recognition and Machine Learning (2006), page 88


Any such path is said to be blocked if it includes a node such that either

(a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set $C$, or

(b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set $C$.

If all paths are blocked, then $A$ is said to be d-separated from $B$ by $C$, and the joint distribution over all of the variables in the graph will satisfy $A \perp\!\!\!\perp B \mid C$.

The concept of d-separation is illustrated in Figure 8.22. In graph (a), the path from $a$ to $b$ is not blocked by node $f$, because $f$ is a tail-to-tail node for this path and is not observed. Nor is it blocked by node $e$: although $e$ is a head-to-head node, it has a descendant $c$ which is in the conditioning set. Thus the conditional independence statement $a \perp\!\!\!\perp b \mid c$ does not follow from this graph. In graph (b), the path from $a$ to $b$ is blocked by node $f$, because this is a tail-to-tail node that is observed. The conditional independence property $a \perp\!\!\!\perp b \mid f$ will therefore be satisfied by any distribution that factorizes according to this graph. Note that this path is also blocked by node $e$, because $e$ is a head-to-head node and neither it nor its descendant is in the conditioning set.

Figure 8.22 Illustration of the concept of d-separation. See the text for details. [Panels (a) and (b) show directed graphs over the nodes a, b, c, e and f.]

For the purposes of d-separation, parameters such as $\alpha$ and $\sigma^2$ in Figure 8.5, indicated by small filled circles, behave in the same way as observed nodes. However, there are no marginal distributions associated with such nodes.
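The blocking rules (a) and (b) can be turned into a small brute-force checker. This is a minimal sketch, not an efficient algorithm. It assumes the graph is given as a set of directed edges, enumerates every simple path in the undirected skeleton, and tests each interior node of a path against the two rules. Because it enumerates paths, it is only practical for small graphs. The function names are hypothetical. The edge set used for Figure 8.22 (a→e, f→e, f→b, e→c) is an assumption inferred from the text, which says that $e$ is head-to-head and $f$ is tail-to-tail on the path from $a$ to $b$, and that $c$ is a descendant of $e$.

```python
def descendants(node, children):
    """All nodes reachable from `node` by following arrows forwards."""
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(path, goal, neighbours):
    """Yield every simple path extending `path` to `goal`, ignoring arrow directions."""
    if path[-1] == goal:
        yield path
        return
    for nxt in neighbours.get(path[-1], ()):
        if nxt not in path:
            yield from simple_paths(path + [nxt], goal, neighbours)


def is_blocked(path, edges, C, children):
    """Apply rules (a) and (b) to each interior node of the path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        head_to_head = (prev, node) in edges and (nxt, node) in edges
        if head_to_head:
            # Rule (b): blocked unless the node or one of its descendants is in C.
            if node not in C and not (descendants(node, children) & C):
                return True
        elif node in C:
            # Rule (a): an observed head-to-tail or tail-to-tail node.
            return True
    return False


def d_separated(A, B, C, edges):
    """True if every path from a node in A to a node in B is blocked by C."""
    edges, C = set(edges), set(C)
    children, neighbours = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return all(
        is_blocked(path, edges, C, children)
        for a in A
        for b in B
        for path in simple_paths([a], b, neighbours)
    )


# Edges inferred from the description of Figure 8.22 (an assumption).
FIG_8_22 = {("a", "e"), ("f", "e"), ("f", "b"), ("e", "c")}

print(d_separated({"a"}, {"b"}, {"c"}, FIG_8_22))  # graph (a), conditioning on c: False
print(d_separated({"a"}, {"b"}, {"f"}, FIG_8_22))  # graph (b), conditioning on f: True
```

With an empty conditioning set the same call returns `True`. The only path runs through the unobserved head-to-head node $e$, so $a$ and $b$ are marginally independent in this graph.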

Since parameter nodes have no associated marginal distributions, they never themselves have parents. All paths through these nodes are therefore tail-to-tail and hence blocked, and consequently these nodes play no role in d-separation.

Another example of conditional independence and d-separation is provided by the concept of i.i.d. (independent identically distributed) data introduced in Section 1.2.4. Consider the problem of finding the posterior distribution for the mean of a univariate Gaussian distribution (Section 2.3).

This can be represented by the directed graph shown in Figure 8.23, in which the joint distribution is defined by a prior $p(\mu)$ together with a set of conditional distributions $p(x_n \mid \mu)$ for $n = 1, \ldots, N$. In practice, we observe $\mathcal{D} = \{x_1, \ldots, x_N\}$ and our goal is to infer $\mu$. Suppose, for a moment, that we condition on $\mu$ and consider the joint distribution of the observations.

Using d-separation, we note that there is a unique path from any $x_i$ to any other $x_{j \neq i}$, and that this path is tail-to-tail with respect to the observed node $\mu$. Every such path is blocked, and so the observations $\mathcal{D} = \{x_1, \ldots, x_N\}$ are independent given $\mu$, so that

$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu). \tag{8.34}$$

Figure 8.23 (a) Directed graph corresponding to the problem of inferring the mean $\mu$ of a univariate Gaussian distribution from observations $x_1, \ldots, x_N$. (b) The same graph drawn using the plate notation.

However, if we integrate over $\mu$, the observations are in general no longer independent:

$$p(\mathcal{D}) = \int_{-\infty}^{\infty} p(\mathcal{D} \mid \mu)\, p(\mu)\, \mathrm{d}\mu \neq \prod_{n=1}^{N} p(x_n). \tag{8.35}$$

Here $\mu$ is a latent variable, because its value is not observed.

Figure 8.24 A graphical representation of the 'naive Bayes' model for classification. Conditioned on the class label $\mathbf{z}$, the components of the observed vector $\mathbf{x} = (x_1, \ldots, x_D)^{\mathrm{T}}$ are assumed to be independent.

Another example of a model representing i.i.d. data is the graph in Figure 8.7 corresponding to Bayesian polynomial regression.
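Before turning to that example, the contrast between (8.34) and (8.35) for the Gaussian-mean model can be checked numerically. This sketch assumes, purely for illustration, a known observation variance $\sigma^2$ and a Gaussian prior $p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)$. The names `mu0`, `var0`, `var` and both helper functions are hypothetical. The code evaluates the integral in (8.35) with the trapezoidal rule, using the factorized likelihood (8.34) inside the integrand. It then compares the result with the product of single-observation marginals, which for this model are $\mathcal{N}(x_n \mid \mu_0, \sigma^2 + \sigma_0^2)$.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def evidence(xs, mu0, var0, var, n_grid=20001, width=12.0):
    """p(D) = integral of prod_n p(x_n | mu) p(mu) over mu, by the trapezoidal rule
    on [mu0 - width*sd0, mu0 + width*sd0]; the prior mass outside is negligible."""
    sd0 = math.sqrt(var0)
    lo, hi = mu0 - width * sd0, mu0 + width * sd0
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        mu = lo + i * step
        integrand = normal_pdf(mu, mu0, var0)          # prior p(mu)
        for x in xs:
            integrand *= normal_pdf(x, mu, var)        # factorized likelihood, as in (8.34)
        total += integrand * (0.5 if i in (0, n_grid - 1) else 1.0)
    return total * step


mu0, var0, var = 0.0, 1.0, 1.0   # hypothetical prior mean/variance and noise variance
xs = [1.0, 1.0]                  # two hypothetical observations
joint = evidence(xs, mu0, var0, var)
product = math.prod(normal_pdf(x, mu0, var + var0) for x in xs)
print(f"p(x1, x2) = {joint:.5f}   p(x1) p(x2) = {product:.5f}")
```

In this setting the integral reproduces the closed-form bivariate Gaussian density $e^{-1/3}/(2\pi\sqrt{3}) \approx 0.066$, whereas the product of marginals is about $0.048$. Once $\mu$ is integrated out, observing $x_1$ shifts our belief about $\mu$ and therefore about $x_2$.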

In Figure 8.7 the stochastic nodes correspond to $\{t_n\}$, $\mathbf{w}$ and $\hat{t}$. We see that the node for $\mathbf{w}$ is tail-to-tail with respect to the path from $\hat{t}$ to any one of the nodes $t_n$, and so we have the following conditional independence property:

$$\hat{t} \perp\!\!\!\perp t_n \mid \mathbf{w}. \tag{8.36}$$

Thus, conditioned on the polynomial coefficients $\mathbf{w}$, the predictive distribution for $\hat{t}$ is independent of the training data $\{t_1, \ldots, t_N\}$. We can therefore first use the training data to determine the posterior distribution over the coefficients $\mathbf{w}$ (Section 3.3). We can then discard the training data and use the posterior distribution for $\mathbf{w}$ to make predictions of $\hat{t}$ for new input observations $\hat{x}$.

A related graphical structure arises in an approach to classification called the naive Bayes model, in which we use conditional independence assumptions to simplify the model structure. Suppose our observed variable consists of a $D$-dimensional vector $\mathbf{x} = (x_1, \ldots, x_D)^{\mathrm{T}}$, and we wish to assign observed values of $\mathbf{x}$ to one of $K$ classes. Using the 1-of-$K$ encoding scheme, we can represent these classes by a $K$-dimensional binary vector $\mathbf{z}$. We can then define a generative model by introducing a multinomial prior $p(\mathbf{z} \mid \boldsymbol{\mu})$ over the class labels, where the $k$th component $\mu_k$ of $\boldsymbol{\mu}$ is the prior probability of class $\mathcal{C}_k$, together with a conditional distribution $p(\mathbf{x} \mid \mathbf{z})$ for the observed vector $\mathbf{x}$. The key assumption of the naive Bayes model is that, conditioned on the class $\mathbf{z}$, the distributions of the input variables $x_1, \ldots, x_D$ are independent. The graphical representation of this model is shown in Figure 8.24. We see that observation of $\mathbf{z}$ blocks the path between $x_i$ and $x_j$ for $j \neq i$, because such paths are tail-to-tail at the node $\mathbf{z}$. Hence $x_i$ and $x_j$ are conditionally independent given $\mathbf{z}$. If, however, we marginalize out $\mathbf{z}$ (so that $\mathbf{z}$ is unobserved), the tail-to-tail path from $x_i$ to $x_j$ is no longer blocked. This tells us that in general the marginal density $p(\mathbf{x})$ will not factorize with respect to the components of $\mathbf{x}$.
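A small exact example illustrates the last point. The sketch below uses a hypothetical two-class model over two binary features and computes with Python's `fractions` module so that equalities are exact. The class-conditional distribution is a product of Bernoulli factors, as the naive Bayes assumption requires. Once $\mathbf{z}$ is summed out, however, the marginal $p(\mathbf{x})$ no longer equals the product of its one-dimensional marginals. The parameter values and helper names are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

prior = [F(1, 2), F(1, 2)]        # p(z = k), i.e. mu_k (hypothetical values)
theta = [[F(9, 10), F(9, 10)],    # p(x_d = 1 | z = 0)
         [F(1, 10), F(1, 10)]]    # p(x_d = 1 | z = 1)


def bern(v, q):
    """Bernoulli probability of value v in {0, 1} with p(1) = q."""
    return q if v == 1 else 1 - q


def p_x_given_z(x, k):
    """Naive Bayes assumption: features are independent given the class."""
    out = F(1)
    for d, xd in enumerate(x):
        out *= bern(xd, theta[k][d])
    return out


def p_x(x):
    """Marginal p(x): z is summed out, giving a mixture over classes."""
    return sum(prior[k] * p_x_given_z(x, k) for k in range(len(prior)))


def p_xd(d, value):
    """One-dimensional marginal p(x_d = value)."""
    return sum(p_x(x) for x in product((0, 1), repeat=2) if x[d] == value)


joint = p_x((1, 1))
factorized = p_xd(0, 1) * p_xd(1, 1)
print(joint, factorized)   # 41/100 1/4
```

Conditioned on the class, the two features are independent by construction. Marginally, observing $x_1 = 1$ makes class 0 more probable and hence makes $x_2 = 1$ more probable as well.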

We encountered a simple application of the naive Bayes model in the context of fusing data from different sources for medical diagnosis in Section 1.5.

If we are given a labelled training set, comprising inputs $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ together with their class labels, then we can fit the naive Bayes model to the training data using maximum likelihood, assuming that the data are drawn independently from the model. The solution is obtained by fitting the model for each class separately using the correspondingly labelled data.
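For binary features, fitting each class separately reduces to counting. The sketch below assumes Bernoulli class-conditional factors (one of the choices the text mentions for binary observations) and assumes that every class occurs in the data. Under maximum likelihood, the prior for class $k$ is the fraction $N_k/N$ of labelled points in that class. Each Bernoulli parameter is the fraction of class-$k$ points with that feature switched on. The function name and toy data are hypothetical.

```python
from fractions import Fraction as F


def fit_bernoulli_naive_bayes(X, labels, K):
    """Maximum-likelihood fit, done separately for each class.

    X: list of binary feature vectors; labels: class index (0..K-1) per row.
    Returns (priors, theta) with theta[k][d] = p(x_d = 1 | class k).
    Assumes every class occurs at least once in `labels`.
    """
    N, D = len(X), len(X[0])
    priors, theta = [], []
    for k in range(K):
        rows = [x for x, z in zip(X, labels) if z == k]   # only class-k data
        priors.append(F(len(rows), N))                    # N_k / N
        theta.append([F(sum(r[d] for r in rows), len(rows)) for d in range(D)])
    return priors, theta


# Hypothetical toy data: six points, two binary features, two classes.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [1, 0]]
labels = [0, 0, 0, 1, 1, 1]
priors, theta = fit_bernoulli_naive_bayes(X, labels, K=2)
print(priors, theta)
```

Each estimate uses only its own class's counts, because the likelihood factorizes over classes and, within a class, over features. A feature that is constant within a class receives an estimate of exactly 0 or 1. This is one reason smoothed (pseudo-count) estimates are often used in practice.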

As an example, suppose that the probability density within each class is chosen to be Gaussian. In this case, the naive Bayes assumption implies that the covariance matrix for each Gaussian is diagonal, and the contours of constant density within each class will be axis-aligned ellipsoids. The marginal density, however, is given by a superposition of diagonal Gaussians (with weighting coefficients given by the class priors) and so will no longer factorize with respect to its components.

The naive Bayes assumption is helpful when the dimensionality $D$ of the input space is high, making density estimation in the full $D$-dimensional space more challenging. It is also useful if the input vector contains both discrete and continuous variables, since each can be represented separately using appropriate models (e.g., Bernoulli distributions for binary observations or Gaussians for real-valued variables).
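The mixed-variable case can be sketched with one real-valued and one binary feature. Because the features are independent given the class, the class-conditional log density is a sum of per-feature terms. For the Gaussian part, this is exactly what a diagonal covariance amounts to. The class posterior then follows from Bayes' theorem. All parameter values and names below are hypothetical.

```python
import math


def log_normal(x, mean, var):
    """log N(x | mean, var) for a real-valued feature."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_bernoulli(x, p):
    """log Bernoulli(x | p) for a binary feature."""
    return math.log(p if x == 1 else 1.0 - p)


# Hypothetical per-class parameters: feature 0 is Gaussian, feature 1 is Bernoulli.
classes = [
    {"prior": 0.5, "mean": 0.0, "var": 1.0, "p": 0.2},
    {"prior": 0.5, "mean": 2.0, "var": 1.0, "p": 0.8},
]


def posterior(x_real, x_bin):
    """p(class | x), with per-feature log terms summed (naive Bayes)."""
    logs = [
        math.log(c["prior"])
        + log_normal(x_real, c["mean"], c["var"])
        + log_bernoulli(x_bin, c["p"])
        for c in classes
    ]
    m = max(logs)                          # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [wi / s for wi in w]


print(posterior(1.0, 1))
```

At $x_{\text{real}} = 1$ the two Gaussian terms are equal, because the point is equidistant from both class means. The posterior is therefore decided by the Bernoulli feature alone, giving probabilities of about 0.2 and 0.8.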

The conditional independence assumption of this model is clearly a strong one that may lead to rather poor representations of the class-conditional densities. Nevertheless, even if this assumption is not precisely satisfied, the model may still give good classification performance in practice, because the decision boundaries can be insensitive to some of the details in the class-conditional densities, as illustrated in Figure 1.27.

We have seen that a particular directed graph represents a specific decomposition of a joint probability distribution into a product of conditional probabilities.

The graph also expresses a set of conditional independence statements obtained through the d-separation criterion, and the d-separation theorem is really an expression of the equivalence of these two properties. In order to make this clear, it is helpful to think of a directed graph as a filter. Suppose we consider a particular joint probability distribution $p(\mathbf{x})$ over the variables $\mathbf{x}$ corresponding to the (nonobserved) nodes of the graph. The filter will allow this distribution to pass through if, and only if, it can be expressed in terms of the factorization (8.5) implied by the graph.
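For small discrete models the filter can be written down directly. The sketch below assumes that (8.5) is the general directed factorization $p(\mathbf{x}) = \prod_k p(x_k \mid \mathrm{pa}_k)$. It takes a joint table and a parent list, computes each $p(x_k \mid \mathrm{pa}_k)$ from the table itself, and checks whether the product of these conditionals reproduces the table exactly. If a distribution factorizes according to the graph at all, those recomputed conditionals must be its factors, so the test is exact. The code also assumes a strictly positive distribution. The three-node chain $a \to b \to c$, the numbers and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def marginal(p, keep):
    """Sum the joint table p (assignment tuple -> probability) onto the variables in `keep`."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0) + px
    return out


def passes_filter(p, parents):
    """True iff p(x) == prod_k p(x_k | pa_k) for every x, with each conditional
    computed from p itself. Assumes p is strictly positive."""
    factors = [(k, pa, marginal(p, (k, *pa)), marginal(p, pa))
               for k, pa in enumerate(parents)]
    for x, px in p.items():
        q = F(1)
        for k, pa, joint_k, joint_pa in factors:
            pa_vals = tuple(x[i] for i in pa)
            q *= joint_k[(x[k], *pa_vals)] / joint_pa[pa_vals]   # p(x_k | pa_k)
        if q != px:
            return False
    return True


def bern(v, q):
    return q if v == 1 else 1 - q


def sign(v):
    return 1 if v == 1 else -1


parents = [(), (0,), (1,)]   # chain a -> b -> c, variables indexed 0, 1, 2

# Built from the chain's own conditionals, so it should pass.
p_a, p_b, p_c = F(3, 10), [F(1, 5), F(7, 10)], [F(1, 2), F(9, 10)]
chain = {(a, b, c): bern(a, p_a) * bern(b, p_b[a]) * bern(c, p_c[b])
         for a, b, c in product((0, 1), repeat=3)}

# a and c interact directly, violating a ⊥⊥ c | b, so it should be stopped.
coupled = {x: F(1, 8) * (1 + F(1, 2) * sign(x[0]) * sign(x[2]))
           for x in product((0, 1), repeat=3)}

print(passes_filter(chain, parents), passes_filter(coupled, parents))   # True False
```

In the second table, $a$ and $b$ are independent and uniform, and so are $b$ and $c$. The product of conditionals is therefore $1/8$ everywhere, which does not match the table. This is why the distribution fails to pass the filter.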
