If we consider an interval A ≤ σ ≤ B, and a scaled interval A/c ≤ σ ≤ B/c, then the prior should assign equal probability mass to these two intervals. Thus we have

$$\int_A^B p(\sigma)\,\mathrm{d}\sigma = \int_{A/c}^{B/c} p(\sigma)\,\mathrm{d}\sigma = \int_A^B p\!\left(\frac{\sigma}{c}\right)\frac{1}{c}\,\mathrm{d}\sigma \tag{2.238}$$

and because this must hold for all choices of A and B, we have

$$p(\sigma) = p\!\left(\frac{\sigma}{c}\right)\frac{1}{c} \tag{2.239}$$

and hence p(σ) ∝ 1/σ. Note that again this is an improper prior because the integral of the distribution over 0 ≤ σ ≤ ∞ is divergent. It is sometimes also convenient to think of the prior distribution for a scale parameter in terms of the density of the log of the parameter.
Using the transformation rule (1.27) for densities we see that p(ln σ) = const. Thus, for this prior there is the same probability mass in the range 1 ≤ σ ≤ 10 as in the range 10 ≤ σ ≤ 100 and in 100 ≤ σ ≤ 1000.

An example of a scale parameter would be the standard deviation σ of a Gaussian distribution, after we have taken account of the location parameter µ, because

$$\mathcal{N}(x\mid\mu,\sigma^{2}) \propto \sigma^{-1}\exp\left\{-(\widetilde{x}/\sigma)^{2}/2\right\} \tag{2.240}$$

where x̃ = x − µ. As discussed in Section 2.3, it is often more convenient to work in terms of the precision λ = 1/σ² rather than σ itself. Using the transformation rule for densities, we see that a distribution p(σ) ∝ 1/σ corresponds to a distribution over λ of the form p(λ) ∝ 1/λ. We have seen that the conjugate prior for λ was the gamma distribution Gam(λ | a₀, b₀) given by (2.146). The noninformative prior is obtained as the special case a₀ = b₀ = 0.
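As a quick check of the σ → λ transformation quoted above (this intermediate step is not spelled out in the text, but follows from the standard change-of-variables rule for densities): writing λ = 1/σ², so that σ = λ^(−1/2) and |dσ/dλ| = λ^(−3/2)/2, we have

$$p(\lambda) = p(\sigma)\left|\frac{\mathrm{d}\sigma}{\mathrm{d}\lambda}\right| \propto \frac{1}{\sigma}\cdot\frac{1}{2}\lambda^{-3/2} = \lambda^{1/2}\cdot\frac{1}{2}\lambda^{-3/2} \propto \frac{1}{\lambda}$$

which is consistent with the claim that p(σ) ∝ 1/σ corresponds to p(λ) ∝ 1/λ.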
Again, if we examine the results (2.150) and (2.151) for the posterior distribution of λ, we see that for a₀ = b₀ = 0, the posterior depends only on terms arising from the data and not from the prior.

2.5. Nonparametric Methods

Throughout this chapter, we have focussed on the use of probability distributions having specific functional forms governed by a small number of parameters whose values are to be determined from a data set. This is called the parametric approach to density modelling.
An important limitation of this approach is that the chosen density might be a poor model of the distribution that generates the data, which can result in poor predictive performance. For instance, if the process that generates the data is multimodal, then this aspect of the distribution can never be captured by a Gaussian, which is necessarily unimodal.

In this final section, we consider some nonparametric approaches to density estimation that make few assumptions about the form of the distribution. Here we shall focus mainly on simple frequentist methods. The reader should be aware, however, that nonparametric Bayesian methods are attracting increasing interest (Walker et al., 1999; Neal, 2000; Müller and Quintana, 2004; Teh et al., 2006).

Let us start with a discussion of histogram methods for density estimation, which we have already encountered in the context of marginal and conditional distributions in Figure 1.11 and in the context of the central limit theorem in Figure 2.6.
Here we explore the properties of histogram density models in more detail, focussing on the case of a single continuous variable x. Standard histograms simply partition x into distinct bins of width ∆ᵢ and then count the number nᵢ of observations of x falling in bin i. In order to turn this count into a normalized probability density, we simply divide by the total number N of observations and by the width ∆ᵢ of the bins to obtain probability values for each bin given by

$$p_i = \frac{n_i}{N\Delta_i} \tag{2.241}$$

for which it is easily seen that ∫ p(x) dx = 1.
This gives a model for the density p(x) that is constant over the width of each bin, and often the bins are chosen to have the same width ∆ᵢ = ∆.

Figure 2.24: An illustration of the histogram approach to density estimation, in which a data set of 50 data points is generated from the distribution shown by the green curve. Histogram density estimates, based on (2.241), with a common bin width ∆ are shown for ∆ = 0.04, ∆ = 0.08 and ∆ = 0.25.

In Figure 2.24, we show an example of histogram density estimation.
Here the data is drawn from the distribution, corresponding to the green curve, which is formed from a mixture of two Gaussians. Also shown are three examples of histogram density estimates corresponding to three different choices for the bin width ∆. We see that when ∆ is very small (top figure), the resulting density model is very spiky, with a lot of structure that is not present in the underlying distribution that generated the data set. Conversely, if ∆ is too large (bottom figure) then the result is a model that is too smooth and that consequently fails to capture the bimodal property of the green curve.
The best results are obtained for some intermediate value of ∆ (middle figure). In principle, a histogram density model is also dependent on the choice of edge location for the bins, though this is typically much less significant than the value of ∆.

Note that the histogram method has the property (unlike the methods to be discussed shortly) that, once the histogram has been computed, the data set itself can be discarded, which can be advantageous if the data set is large. Also, the histogram approach is easily applied if the data points are arriving sequentially.

In practice, the histogram technique can be useful for obtaining a quick visualization of data in one or two dimensions but is unsuited to most density estimation applications.
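Before turning to the limitations of the approach, the computation behind an estimate such as that of Figure 2.24 is easy to sketch in code. The following is an illustrative sketch only: the two-Gaussian mixture parameters, the random seed, and the helper name histogram_density are assumptions made for this example, not details taken from the book.

    import numpy as np

    rng = np.random.default_rng(0)

    # Draw 50 points from a bimodal mixture of two Gaussians, loosely in the
    # spirit of the green curve in Figure 2.24 (mixture parameters made up).
    N = 50
    pick = rng.random(N) < 0.5
    x = np.where(pick, rng.normal(0.3, 0.07, N), rng.normal(0.7, 0.10, N))

    def histogram_density(x, delta, lo=0.0, hi=1.0):
        """Histogram density estimate p_i = n_i / (N * Delta), as in (2.241)."""
        n_bins = max(1, int(round((hi - lo) / delta)))   # nearest whole number of bins
        edges = np.linspace(lo, hi, n_bins + 1)
        width = edges[1] - edges[0]                      # actual common bin width Delta
        counts, _ = np.histogram(x, bins=edges)
        return edges, counts / (len(x) * width)

    for delta in (0.04, 0.08, 0.25):
        edges, p = histogram_density(x, delta)
        # The estimate integrates to one when all points fall inside [lo, hi].
        print(delta, np.sum(p) * (edges[1] - edges[0]))

Plotting p as a piecewise-constant function over the bins for the three values of ∆ reproduces the qualitative behaviour described above: spiky for small ∆, over-smoothed for large ∆.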
One obvious problem is that the estimated density has discontinuities that are due to the bin edges rather than any property of the underlying distribution that generated the data. Another major limitation of the histogram approach is its scaling with dimensionality. If we divide each variable in a D-dimensional space into M bins, then the total number of bins will be M^D. This exponential scaling with D is an example of the curse of dimensionality (Section 1.4). In a space of high dimensionality, the quantity of data needed to provide meaningful estimates of local probability density would be prohibitive.

The histogram approach to density estimation does, however, teach us two important lessons.
First, to estimate the probability density at a particular location, we should consider the data points that lie within some local neighbourhood of that point. Note that the concept of locality requires that we assume some form of distance measure, and here we have been assuming Euclidean distance.
For histograms, this neighbourhood property was defined by the bins, and there is a natural ‘smoothing’ parameter describing the spatial extent of the local region, in this case the bin width. Second, the value of the smoothing parameter should be neither too large nor too small in order to obtain good results. This is reminiscent of the choice of model complexity in polynomial curve fitting discussed in Chapter 1 where the degree M of the polynomial, or alternatively the value α of the regularization parameter, was optimal for some intermediate value, neither too large nor too small.
Armed with these insights, we turn now to a discussion of two widely used nonparametric techniques for density estimation, kernel estimators and nearest neighbours, which have better scaling with dimensionality than the simple histogram model.

2.5.1 Kernel density estimators

Let us suppose that observations are being drawn from some unknown probability density p(x) in some D-dimensional space, which we shall take to be Euclidean, and we wish to estimate the value of p(x). From our earlier discussion of locality, let us consider some small region R containing x. The probability mass associated with this region is given by

$$P = \int_{\mathcal{R}} p(\mathbf{x})\,\mathrm{d}\mathbf{x}. \tag{2.242}$$

Now suppose that we have collected a data set comprising N observations drawn from p(x).
Because each data point has a probability P of falling within R, the total number K of points that lie inside R will be distributed according to the binomial distribution (Section 2.1)

$$\mathrm{Bin}(K\mid N,P) = \frac{N!}{K!\,(N-K)!}\,P^{K}(1-P)^{N-K}. \tag{2.243}$$

Using (2.11), we see that the mean fraction of points falling inside the region is E[K/N] = P, and similarly using (2.12) we see that the variance around this mean is var[K/N] = P(1 − P)/N.
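Spelling out this intermediate step (it follows directly from the binomial mean and variance, E[K] = NP and var[K] = NP(1 − P), given by (2.11) and (2.12)):

$$\mathbb{E}[K/N] = \frac{\mathbb{E}[K]}{N} = P, \qquad \mathrm{var}[K/N] = \frac{\mathrm{var}[K]}{N^{2}} = \frac{P(1-P)}{N}.$$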
For large N, this distribution will be sharply peaked around the mean and so

$$K \simeq NP. \tag{2.244}$$

If, however, we also assume that the region R is sufficiently small that the probability density p(x) is roughly constant over the region, then we have

$$P \simeq p(\mathbf{x})V \tag{2.245}$$

where V is the volume of R. Combining (2.244) and (2.245), we obtain our density estimate in the form

$$p(\mathbf{x}) = \frac{K}{NV}. \tag{2.246}$$

Note that the validity of (2.246) depends on two contradictory assumptions, namely that the region R be sufficiently small that the density is approximately constant over the region and yet sufficiently large (in relation to the value of that density) that the number K of points falling inside the region is sufficient for the binomial distribution to be sharply peaked.

We can exploit the result (2.246) in two different ways. Either we can fix K and determine the value of V from the data, which gives rise to the K-nearest-neighbour technique discussed shortly, or we can fix V and determine K from the data, giving rise to the kernel approach.
It can be shown that both the K-nearest-neighbour density estimator and the kernel density estimator converge to the true probability density in the limit N → ∞ provided V shrinks suitably with N, and K grows with N (Duda and Hart, 1973).

We begin by discussing the kernel method in detail, and to start with we take the region R to be a small hypercube centred on the point x at which we wish to determine the probability density. In order to count the number K of points falling within this region, it is convenient to define the following function

$$k(\mathbf{u}) = \begin{cases} 1, & |u_i| \leq 1/2, \quad i = 1,\ldots,D, \\ 0, & \text{otherwise} \end{cases} \tag{2.247}$$

which represents a unit cube centred on the origin. The function k(u) is an example of a kernel function, and in this context is also called a Parzen window. From (2.247), the quantity k((x − xₙ)/h) will be one if the data point xₙ lies inside a cube of side h centred on x, and zero otherwise. The total number of data points lying inside this cube will therefore be

$$K = \sum_{n=1}^{N} k\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right). \tag{2.248}$$

Substituting this expression into (2.246) then gives the following result for the estimated density at x

$$p(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{h^{D}}\, k\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right) \tag{2.249}$$

where we have used V = h^D for the volume of a hypercube of side h in D dimensions.
Using the symmetry of the function k(u), we can now re-interpret this equation, not as a single cube centred on x but as the sum over N cubes centred on the N data points xₙ.

As it stands, the kernel density estimator (2.249) will suffer from one of the same problems that the histogram method suffered from, namely the presence of artificial discontinuities, in this case at the boundaries of the cubes.
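A minimal sketch of the Parzen-window estimator (2.249) with the cube kernel (2.247) is given below; the array layout, the toy two-dimensional data, and the function names are assumptions made for illustration rather than anything specified in the book.

    import numpy as np

    def parzen_window(u):
        """Unit-cube kernel k(u) of (2.247): 1 if every |u_i| <= 1/2, else 0."""
        return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

    def kernel_density(x_query, data, h):
        """Evaluate the estimator (2.249) at the query points.

        x_query : (M, D) points at which to estimate the density
        data    : (N, D) observations drawn from the unknown density
        h       : side length of the hypercube
        """
        N, D = data.shape
        # Shape (M, N, D): one scaled difference (x - x_n) / h per pair.
        u = (x_query[:, None, :] - data[None, :, :]) / h
        # Count the points inside each cube and divide by N * h^D.
        return parzen_window(u).sum(axis=1) / (N * h**D)

    # Toy usage with made-up two-dimensional data.
    rng = np.random.default_rng(1)
    data = rng.normal(size=(200, 2))
    print(kernel_density(np.array([[0.0, 0.0], [2.0, 2.0]]), data, h=0.5))

Replacing k(u) with a smoother function, such as a Gaussian, removes the discontinuities at the cube boundaries while keeping the same sum-over-data-points structure.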