Bishop C.M., Pattern Recognition and Machine Learning (2006)
During the 18th century, issues regarding probability arose in connection with gambling and with the new concept of insurance. One particularly important problem concerned so-called inverse probability. A solution was proposed by Thomas Bayes in his paper ‘Essay towards solving a problem in the doctrine of chances’, which was published in 1764, some three years after his death, in the Philosophical Transactions of the Royal Society.
In fact, Bayes only formulated his theory for the case of a uniform prior, and it was Pierre-Simon Laplace who independently rediscovered the theory in general form and who demonstrated its broad applicability.

Consider the example of polynomial curve fitting discussed in Section 1.1. It seems reasonable to apply the frequentist notion of probability to the random values of the observed variables tn. However, we would like to address and quantify the uncertainty that surrounds the appropriate choice for the model parameters w.
We shall see that, from a Bayesian perspective, we can use the machinery of probability theory to describe the uncertainty in model parameters such as w, or indeed in the choice of model itself.

Bayes’ theorem now acquires a new significance. Recall that in the boxes of fruit example, the observation of the identity of the fruit provided relevant information that altered the probability that the chosen box was the red one. In that example, Bayes’ theorem was used to convert a prior probability into a posterior probability by incorporating the evidence provided by the observed data.
As we shall see in detail later, we can adopt a similar approach when making inferences about quantities such as the parameters w in the polynomial curve fitting example. We capture our assumptions about w, before observing the data, in the form of a prior probability distribution p(w). The effect of the observed data D = {t1, . . . , tN} is expressed through the conditional probability p(D|w), and we shall see later, in Section 1.2.5, how this can be represented explicitly. Bayes’ theorem, which takes the form

    p(w|D) = p(D|w) p(w) / p(D)        (1.43)

then allows us to evaluate the uncertainty in w after we have observed D in the form of the posterior probability p(w|D). The quantity p(D|w) on the right-hand side of Bayes’ theorem is evaluated for the observed data set D and can be viewed as a function of the parameter vector w, in which case it is called the likelihood function.
It expresses how probable the observed data set is for different settings of the parameter vector w. Note that the likelihood is not a probability distribution over w, and its integral with respect to w does not (necessarily) equal one. Given this definition of likelihood, we can state Bayes’ theorem in words

    posterior ∝ likelihood × prior        (1.44)

where all of these quantities are viewed as functions of w. The denominator in (1.43) is the normalization constant, which ensures that the posterior distribution on the left-hand side is a valid probability density and integrates to one. Indeed, integrating both sides of (1.43) with respect to w, we can express the denominator in Bayes’ theorem in terms of the prior distribution and the likelihood function

    p(D) = ∫ p(D|w) p(w) dw.        (1.45)

In both the Bayesian and frequentist paradigms, the likelihood function p(D|w) plays a central role.
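The relationship posterior ∝ likelihood × prior, and the normalization (1.45), can be illustrated with a simple grid approximation. The following sketch uses a hypothetical data set (6 heads in 9 tosses of a coin with unknown bias w) and a Bernoulli likelihood; these specifics are assumptions for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical setting: infer the bias w of a coin from D = 6 heads in 9
# tosses, by discretizing w and applying posterior ∝ likelihood × prior (1.44).
w = np.linspace(0.0, 1.0, 1001)       # grid over the parameter space
dw = w[1] - w[0]
prior = np.ones_like(w)               # uniform prior p(w)
likelihood = w**6 * (1.0 - w)**3      # p(D|w) for a Bernoulli model

unnormalized = likelihood * prior
evidence = (unnormalized * dw).sum()  # p(D) = ∫ p(D|w) p(w) dw, cf. (1.45)
posterior = unnormalized / evidence   # p(w|D) now integrates to one
```

Dividing by the evidence is exactly the role of the denominator in (1.43): it turns the unnormalized product into a valid density over w.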
However, the manner in which it is used is fundamentally different in the two approaches. In a frequentist setting, w is considered to be a fixed parameter, whose value is determined by some form of ‘estimator’, and error bars on this estimate are obtained by considering the distribution of possible data sets D. By contrast, from the Bayesian viewpoint there is only a single data set D (namely the one that is actually observed), and the uncertainty in the parameters is expressed through a probability distribution over w.

A widely used frequentist estimator is maximum likelihood, in which w is set to the value that maximizes the likelihood function p(D|w). This corresponds to choosing the value of w for which the probability of the observed data set is maximized. In the machine learning literature, the negative log of the likelihood function is called an error function.
Because the negative logarithm is a monotonically decreasing function, maximizing the likelihood is equivalent to minimizing the error. One approach to determining frequentist error bars is the bootstrap (Efron, 1979; Hastie et al., 2001), in which multiple data sets are created as follows.
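This equivalence can be checked directly on a grid; the Bernoulli likelihood below is an assumed example, not from the text:

```python
import numpy as np

# Because -log is monotonically decreasing, the w that maximizes the
# likelihood is exactly the w that minimizes the negative log-likelihood
# (the "error function").
w = np.linspace(0.01, 0.99, 99)
likelihood = w**6 * (1.0 - w)**3      # hypothetical Bernoulli likelihood p(D|w)
error = -np.log(likelihood)           # error function E(w) = -ln p(D|w)
```

Working with the log also replaces products of many small probabilities by sums, which is numerically much better behaved.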
Suppose our original data set consists of N data points X = {x1, . . . , xN}. We can create a new data set XB by drawing N points at random from X, with replacement, so that some points in X may be replicated in XB, whereas other points in X may be absent from XB. This process can be repeated L times to generate L data sets, each of size N and each obtained by sampling from the original data set X. The statistical accuracy of parameter estimates can then be evaluated by looking at the variability of predictions between the different bootstrap data sets.

One advantage of the Bayesian viewpoint is that the inclusion of prior knowledge arises naturally.
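The bootstrap procedure just described can be sketched as follows; the data set, the estimator (the sample mean), and the values of N and L are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original data set of N points.
N = 50
X = rng.normal(loc=2.0, scale=1.0, size=N)

# Generate L bootstrap data sets, each of size N, by sampling from X
# with replacement, and record the estimate computed from each.
L = 1000
boot_means = np.empty(L)
for i in range(L):
    X_b = rng.choice(X, size=N, replace=True)  # some points repeat, some are absent
    boot_means[i] = X_b.mean()

# The variability between bootstrap data sets serves as an error bar.
std_error = boot_means.std()
```

For the mean of N roughly unit-variance points, this spread should come out near 1/√N, matching the classical standard-error formula.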
Suppose, for instance, that a fair-looking coin is tossed three times and lands heads each time. A classical maximum likelihood estimate of the probability of landing heads would give 1, implying that all future tosses will land heads! By contrast, a Bayesian approach with any reasonable prior will lead to a much less extreme conclusion.

There has been much controversy and debate associated with the relative merits of the frequentist and Bayesian paradigms, which have not been helped by the fact that there is no unique frequentist, or even Bayesian, viewpoint.
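The coin example above can be made concrete. The Beta(2, 2) prior used here is an assumed choice of "reasonable prior" (conjugacy of the Beta prior with the Bernoulli likelihood is covered in Chapter 2):

```python
# Three tosses, three heads: maximum likelihood says heads is certain,
# while a Beta(2, 2) prior (an assumption favouring fair coins) does not.
heads, tosses = 3, 3
ml_estimate = heads / tosses                     # = 1.0

# For a Beta(a, b) prior the posterior is Beta(a + heads, b + tails),
# whose mean is (a + heads) / (a + b + tosses).
a, b = 2, 2
posterior_mean = (a + heads) / (a + b + tosses)  # = 5/7, far from certainty
```

As more data arrive, the posterior mean moves toward the maximum likelihood estimate, so the influence of the prior fades.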
For instance, one common criticism of the Bayesian approach is that the prior distribution is often selected on the basis of mathematical convenience rather than as a reflection of any prior beliefs. Even the subjective nature of the conclusions through their dependence on the choice of prior is seen by some as a source of difficulty. Reducing the dependence on the prior is one motivation for so-called noninformative priors. However, these lead to difficulties when comparing different models, and indeed Bayesian methods based on poor choices of prior can give poor results with high confidence.
Frequentist evaluation methods offer some protection from such problems, and techniques such as cross-validation remain useful in areas such as model comparison.

This book places a strong emphasis on the Bayesian viewpoint, reflecting the huge growth in the practical importance of Bayesian methods in the past few years, while also discussing useful frequentist concepts as required.

Although the Bayesian framework has its origins in the 18th century, the practical application of Bayesian methods was for a long time severely limited by the difficulties in carrying through the full Bayesian procedure, particularly the need to marginalize (sum or integrate) over the whole of parameter space, which, as we shall see, is required in order to make predictions or to compare different models.
The development of sampling methods, such as Markov chain Monte Carlo (discussed in Chapter 11), along with dramatic improvements in the speed and memory capacity of computers, opened the door to the practical use of Bayesian techniques in an impressive range of problem domains. Monte Carlo methods are very flexible and can be applied to a wide range of models. However, they are computationally intensive and have mainly been used for small-scale problems.

More recently, highly efficient deterministic approximation schemes such as variational Bayes and expectation propagation (discussed in Chapter 10) have been developed. These offer a complementary alternative to sampling methods and have allowed Bayesian techniques to be used in large-scale applications (Blei et al., 2003).

1.2.4 The Gaussian distribution

We shall devote the whole of Chapter 2 to a study of various probability distributions and their key properties. It is convenient, however, to introduce here one of the most important probability distributions for continuous variables, called the normal or Gaussian distribution.
We shall make extensive use of this distribution in the remainder of this chapter and indeed throughout much of the book. For the case of a single real-valued variable x, the Gaussian distribution is defined by

    N(x|µ, σ²) = (1 / (2πσ²)^(1/2)) exp{ −(1/(2σ²)) (x − µ)² }        (1.46)

which is governed by two parameters: µ, called the mean, and σ², called the variance. The square root of the variance, given by σ, is called the standard deviation, and the reciprocal of the variance, written as β = 1/σ², is called the precision.
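Definition (1.46) translates directly into code; the function name below is of course an arbitrary choice:

```python
import math

def gaussian(x, mu, sigma2):
    """Evaluate the univariate Gaussian density N(x | mu, sigma^2) of (1.46)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)   # normalization 1/(2*pi*sigma^2)^(1/2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# The density peaks at x = mu; for sigma^2 = 1 the peak value is 1/sqrt(2*pi).
peak = gaussian(0.0, 0.0, 1.0)
```

One could equally parameterize the function by the precision β = 1/σ², which is often more convenient in the Bayesian calculations that follow later in the book.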