Bishop C.M., Pattern Recognition and Machine Learning (2006), page 5
The data for this example is generated from the function sin(2πx) with random noise included in the target values, as described in detail in Appendix A. Now suppose that we are given a training set comprising N observations of x, written x ≡ (x_1, . . . , x_N)^T, together with corresponding observations of the values of t, denoted t ≡ (t_1, . . . , t_N)^T.
Figure 1.2 shows a plot of a training set comprising N = 10 data points. The input data set x in Figure 1.2 was generated by choosing values of x_n, for n = 1, . . . , N, spaced uniformly in range [0, 1], and the target data set t was obtained by first computing the corresponding values of the function sin(2πx) and then adding a small level of random noise having a Gaussian distribution (the Gaussian distribution is discussed in Section 1.2.4) to each such point in order to obtain the corresponding value t_n. By generating data in this way, we are capturing a property of many real data sets, namely that they possess an underlying regularity, which we wish to learn, but that individual observations are corrupted by random noise. This noise might arise from intrinsically stochastic (i.e.
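The data-generating procedure described above can be sketched in Python. This is a minimal illustration, not the book's own code; the noise standard deviation of 0.3 is an assumed value, since this passage does not state the noise level used for Figure 1.2:

```python
import numpy as np

def make_dataset(N, noise_std=0.3, rng=None):
    """Sample N targets from sin(2*pi*x) plus Gaussian noise.

    noise_std = 0.3 is an assumption for illustration only; the text
    does not specify the noise level used to generate Figure 1.2.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.linspace(0.0, 1.0, N)                  # uniformly spaced in [0, 1]
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=N)
    return x, t

x, t = make_dataset(10, rng=np.random.default_rng(0))
```

Fixing the random generator's seed makes the noisy data set reproducible, which is convenient when comparing fits of different orders later on.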
random) processes such as radioactive decay but more typically is due to there being sources of variability that are themselves unobserved. Our goal is to exploit this training set in order to make predictions of the value t of the target variable for some new value x of the input variable. As we shall see later, this involves implicitly trying to discover the underlying function sin(2πx). This is intrinsically a difficult problem as we have to generalize from a finite data set. Furthermore the observed data are corrupted with noise, and so for a given x there is uncertainty as to the appropriate value for t. Probability theory, discussed in Section 1.2, provides a framework for expressing such uncertainty in a precise and quantitative manner, and decision theory, discussed in Section 1.5, allows us to exploit this probabilistic representation in order to make predictions that are optimal according to appropriate criteria. For the moment, however, we shall proceed rather informally and consider a simple approach based on curve fitting.
In particular, we shall fit the data using a polynomial function of the form

y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \ldots + w_M x^M = \sum_{j=0}^{M} w_j x^j        (1.1)

where M is the order of the polynomial, and x^j denotes x raised to the power of j. The polynomial coefficients w_0, . . . , w_M are collectively denoted by the vector w. Note that, although the polynomial function y(x, w) is a nonlinear function of x, it is a linear function of the coefficients w.
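The model (1.1) can be written as a one-line function; a minimal sketch, using NumPy's `polyval` (which expects the highest-order coefficient first, hence the reversal):

```python
import numpy as np

def y(x, w):
    """Evaluate the polynomial (1.1): y(x, w) = sum_j w[j] * x**j.

    w[j] is the coefficient of x**j, so the order M is len(w) - 1.
    """
    return np.polyval(w[::-1], x)   # polyval wants highest power first

# y is nonlinear in x but linear in w: scaling w scales y(x, w)
w = np.array([1.0, 2.0, 3.0])       # 1 + 2x + 3x^2
print(y(0.5, w))                    # 1 + 1 + 0.75 = 2.75
```

The linearity in w noted in the text is easy to check numerically: doubling every coefficient doubles the output for any fixed x.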
Functions, such as the polynomial, which are linear in the unknown parameters have important properties and are called linear models and will be discussed extensively in Chapters 3 and 4. The values of the coefficients will be determined by fitting the polynomial to the training data. This can be done by minimizing an error function that measures the misfit between the function y(x, w), for any given value of w, and the training set data points. One simple choice of error function, which is widely used, is given by the sum of the squares of the errors between the predictions y(x_n, w) for each data point x_n and the corresponding target values t_n, so that we minimize

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2        (1.2)

where the factor of 1/2 is included for later convenience.
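A direct translation of (1.2), as a sketch rather than a reference implementation:

```python
import numpy as np

def sum_of_squares_error(w, x, t):
    """Error function (1.2): half the sum of squared residuals."""
    predictions = np.polyval(w[::-1], x)   # highest power first for polyval
    return 0.5 * np.sum((predictions - t) ** 2)

# A polynomial passing exactly through every point gives zero error
x = np.array([0.0, 0.5, 1.0])
w = np.array([0.0, 1.0])                   # y = x
print(sum_of_squares_error(w, x, x))       # 0.0
print(sum_of_squares_error(w, x, x + 1))   # 0.5 * 3 * 1^2 = 1.5
```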
We shall discuss the motivation for this choice of error function later in this chapter. For the moment we simply note that it is a nonnegative quantity that would be zero if, and only if, the function y(x, w) were to pass exactly through each training data point.

[Figure 1.3: The error function (1.2) corresponds to (one half of) the sum of the squares of the displacements (shown by the vertical green bars) of each data point from the function y(x, w).]
The geometrical interpretation of the sum-of-squares error function is illustrated in Figure 1.3. [Exercise 1.1] We can solve the curve fitting problem by choosing the value of w for which E(w) is as small as possible. Because the error function is a quadratic function of the coefficients w, its derivatives with respect to the coefficients will be linear in the elements of w, and so the minimization of the error function has a unique solution, denoted by w⋆, which can be found in closed form. The resulting polynomial is given by the function y(x, w⋆). There remains the problem of choosing the order M of the polynomial, and as we shall see this will turn out to be an example of an important concept called model comparison or model selection.
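Because E(w) is quadratic in w, finding w⋆ amounts to a linear least-squares problem. A minimal sketch via the design matrix (this uses `np.linalg.lstsq` as the solver, which is not how the book derives w⋆ but gives the same unique minimizer):

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Closed-form minimizer w* of E(w) for an order-M polynomial.

    A[n, j] = x_n ** j, so minimizing E(w) is the linear least-squares
    problem min_w ||A w - t||^2, solved here with np.linalg.lstsq.
    """
    A = np.vander(x, M + 1, increasing=True)
    w_star, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w_star

# Noise-free sanity check: data from a cubic is recovered exactly at M = 3
x = np.linspace(0.0, 1.0, 10)
t = 1.0 - 2.0 * x + 3.0 * x ** 3
print(np.round(fit_polynomial(x, t, 3), 6))   # ~ [1, -2, 0, 3]
```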
In Figure 1.4, we show four examples of the results of fitting polynomials having orders M = 0, 1, 3, and 9 to the data set shown in Figure 1.2. We notice that the constant (M = 0) and first order (M = 1) polynomials give rather poor fits to the data and consequently rather poor representations of the function sin(2πx). The third order (M = 3) polynomial seems to give the best fit to the function sin(2πx) of the examples shown in Figure 1.4. When we go to a much higher order polynomial (M = 9), we obtain an excellent fit to the training data. In fact, the polynomial passes exactly through each data point and E(w⋆) = 0. However, the fitted curve oscillates wildly and gives a very poor representation of the function sin(2πx).
This latter behaviour is known as over-fitting. As we have noted earlier, the goal is to achieve good generalization by making accurate predictions for new data. We can obtain some quantitative insight into the dependence of the generalization performance on M by considering a separate test set comprising 100 data points generated using exactly the same procedure used to generate the training set points but with new choices for the random noise values included in the target values. For each choice of M, we can then evaluate the residual value of E(w⋆) given by (1.2) for the training data, and we can also evaluate E(w⋆) for the test data set.
[Figure 1.4: Plots of polynomials having various orders M (M = 0, 1, 3, 9), shown as red curves, fitted to the data set shown in Figure 1.2.]

It is sometimes more convenient to use the root-mean-square (RMS) error defined by

E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{\star}) / N}        (1.3)

in which the division by N allows us to compare different sizes of data sets on an equal footing, and the square root ensures that E_RMS is measured on the same scale (and in the same units) as the target variable t.
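The train/test comparison described above can be reproduced numerically. This is an illustrative sketch: the noise level (0.3) and the random seed are assumptions, so the printed numbers will not match the book's Figure 1.5, though the qualitative pattern (train error falls to zero at M = 9 while test error grows) is the same:

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS from (1.3): sqrt(2 E(w) / N), i.e. the root-mean-square residual."""
    residuals = np.vander(x, len(w), increasing=True) @ w - t
    return np.sqrt(np.mean(residuals ** 2))

rng = np.random.default_rng(1)                     # assumed seed and noise level
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, 10)
x_test = rng.uniform(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.3, 100)

for M in (0, 1, 3, 9):
    A = np.vander(x_train, M + 1, increasing=True)
    w = np.linalg.lstsq(A, t_train, rcond=None)[0]
    print(f"M={M}: train {rms_error(w, x_train, t_train):.3f}, "
          f"test {rms_error(w, x_test, t_test):.3f}")
```

Since 2E(w)/N is just the mean squared residual, E_RMS is computed here as the square root of that mean.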
Graphs of the training and test set RMS errors are shown, for various values of M, in Figure 1.5. The test set error is a measure of how well we are doing in predicting the values of t for new data observations of x. We note from Figure 1.5 that small values of M give relatively large values of the test set error, and this can be attributed to the fact that the corresponding polynomials are rather inflexible and are incapable of capturing the oscillations in the function sin(2πx). Values of M in the range 3 ≤ M ≤ 8 give small values for the test set error, and these also give reasonable representations of the generating function sin(2πx), as can be seen, for the case of M = 3, from Figure 1.4.
[Figure 1.5: Graphs of the root-mean-square error, defined by (1.3), evaluated on the training set and on an independent test set for various values of M.]

For M = 9, the training set error goes to zero, as we might expect because this polynomial contains 10 degrees of freedom corresponding to the 10 coefficients w_0, . . . , w_9, and so can be tuned exactly to the 10 data points in the training set. However, the test set error has become very large and, as we saw in Figure 1.4, the corresponding function y(x, w⋆) exhibits wild oscillations. This may seem paradoxical because a polynomial of given order contains all lower order polynomials as special cases.
The M = 9 polynomial is therefore capable of generating results at least as good as the M = 3 polynomial. Furthermore, we might suppose that the best predictor of new data would be the function sin(2πx) from which the data was generated (and we shall see later that this is indeed the case). We know that a power series expansion of the function sin(2πx) contains terms of all orders, so we might expect that results should improve monotonically as we increase M. We can gain some insight into the problem by examining the values of the coefficients w⋆ obtained from polynomials of various order, as shown in Table 1.1. We see that, as M increases, the magnitude of the coefficients typically gets larger. In particular for the M = 9 polynomial, the coefficients have become finely tuned to the data by developing large positive and negative values so that the correspond-

Table 1.1: Table of the coefficients w⋆ for polynomials of various order. Observe how the typical magnitude of the coefficients increases dramatically as the order of the polynomial increases.

            M = 0     M = 1     M = 6          M = 9
w0⋆          0.19      0.82      0.31           0.35
w1⋆                   -1.27      7.99         232.37
w2⋆                            -25.43       -5321.83
w3⋆                             17.37       48568.31
w4⋆                                       -231639.30
w5⋆                                        640042.26
w6⋆                                      -1061800.52
w7⋆                                       1042400.18
w8⋆                                       -557682.99
w9⋆                                        125201.43
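The growth of coefficient magnitudes with M in Table 1.1 is easy to observe on synthetic data. The seed and noise level below are assumptions, so the actual values will differ from Table 1.1, but the pattern — modest coefficients at low order, enormous positive and negative coefficients at M = 9 — reappears:

```python
import numpy as np

# Assumed data-generation settings (seed and noise level are illustrative)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 10)

for M in (0, 1, 6, 9):
    A = np.vander(x, M + 1, increasing=True)
    w = np.linalg.lstsq(A, t, rcond=None)[0]
    print(f"M={M}: max |w_j| = {np.max(np.abs(w)):.2f}")
```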