Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 6


Example: Polynomial Curve Fitting

Figure 1.6 (panels: N = 15, left; N = 100, right; axes x and t). Plots of the solutions obtained by minimizing the sum-of-squares error function using the M = 9 polynomial for N = 15 data points (left plot) and N = 100 data points (right plot). We see that increasing the size of the data set reduces the over-fitting problem.

[Margin note: Section 3.4]

...ing polynomial function matches each of the data points exactly, but between data points (particularly near the ends of the range) the function exhibits the large oscillations observed in Figure 1.4. Intuitively, what is happening is that the more flexible polynomials with larger values of M are becoming increasingly tuned to the random noise on the target values.

It is also interesting to examine the behaviour of a given model as the size of the data set is varied, as shown in Figure 1.6.

We see that, for a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases. Another way to say this is that the larger the data set, the more complex (in other words more flexible) the model that we can afford to fit to the data. One rough heuristic that is sometimes advocated is that the number of data points should be no less than some multiple (say 5 or 10) of the number of adaptive parameters in the model.
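The effect of data set size described above can be checked numerically. The following is a minimal sketch, not the book's own experiment: it assumes targets of the form sin(2πx) plus Gaussian noise, and the noise level (0.3), the number of repeated trials (50), the test set size (1000) and all function names are illustrative choices. An M = 9 polynomial has 10 adaptive parameters, so N = 15 falls well short of the rough 5x to 10x heuristic, while N = 100 meets it.

```python
import numpy as np

ORDER = 9        # polynomial order M, as in Figure 1.6
NOISE_STD = 0.3  # assumed noise level, not taken from the text


def noisy_sine(x, rng):
    """Targets sin(2*pi*x) corrupted by Gaussian noise."""
    return np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_STD, size=x.shape)


def heldout_rms(n_train, rng):
    """Fit an unregularized polynomial to n_train points, return RMS error on fresh data."""
    x_train = np.linspace(0.0, 1.0, n_train)
    t_train = noisy_sine(x_train, rng)
    phi = np.vander(x_train, ORDER + 1, increasing=True)
    # Least-squares solution minimizes the sum-of-squares error.
    w, *_ = np.linalg.lstsq(phi, t_train, rcond=None)

    x_test = rng.uniform(0.0, 1.0, size=1000)
    t_test = noisy_sine(x_test, rng)
    prediction = np.vander(x_test, ORDER + 1, increasing=True) @ w
    return np.sqrt(np.mean((prediction - t_test) ** 2))


def mean_heldout_rms(n_train, trials=50, seed=0):
    """Average the held-out RMS error over several independent noisy data sets."""
    rng = np.random.default_rng(seed)
    return float(np.mean([heldout_rms(n_train, rng) for _ in range(trials)]))


mean_rms = {n: mean_heldout_rms(n) for n in (15, 100)}
for n, value in mean_rms.items():
    print(f"N = {n:3d}: mean held-out RMS error = {value:.3f}")
```

Averaging over several data sets is a design choice made here so that the comparison does not hinge on a single lucky or unlucky noise draw.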

However, as we shall see in Chapter 3, the number of parameters is not necessarily the most appropriate measure of model complexity.

Also, there is something rather unsatisfying about having to limit the number of parameters in a model according to the size of the available training set. It would seem more reasonable to choose the complexity of the model according to the complexity of the problem being solved. We shall see that the least squares approach to finding the model parameters represents a specific case of maximum likelihood (discussed in Section 1.2.5), and that the over-fitting problem can be understood as a general property of maximum likelihood.

By adopting a Bayesian approach, the over-fitting problem can be avoided. We shall see that there is no difficulty from a Bayesian perspective in employing models for which the number of parameters greatly exceeds the number of data points. Indeed, in a Bayesian model the effective number of parameters adapts automatically to the size of the data set.

Figure 1.7 (panels: ln λ = −18 and ln λ = 0; axes x and t). Plots of M = 9 polynomials fitted to the data set shown in Figure 1.2 using the regularized error function (1.4) for two values of the regularization parameter λ corresponding to ln λ = −18 and ln λ = 0. The case of no regularizer, i.e., λ = 0, corresponding to ln λ = −∞, is shown at the bottom right of Figure 1.4.

For the moment, however, it is instructive to continue with the current approach and to consider how in practice we can apply it to data sets of limited size where we may wish to use relatively complex and flexible models. One technique that is often used to control the over-fitting phenomenon in such cases is that of regularization, which involves adding a penalty term to the error function (1.2) in order to discourage the coefficients from reaching large values. The simplest such penalty term takes the form of a sum of squares of all of the coefficients, leading to a modified error function of the form

$$\widetilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2 \tag{1.4}$$

where $\|\mathbf{w}\|^2 \equiv \mathbf{w}^{\mathrm{T}}\mathbf{w} = w_0^2 + w_1^2 + \ldots + w_M^2$, and the coefficient λ governs the relative importance of the regularization term compared with the sum-of-squares error term. Note that often the coefficient w0 is omitted from the regularizer because its inclusion causes the results to depend on the choice of origin for the target variable (Hastie et al., 2001), or it may be included but with its own regularization coefficient (we shall discuss this topic in more detail in Section 5.5.1).

Again, the error function in (1.4) can be minimized exactly in closed form. Techniques such as this are known in the statistics literature as shrinkage methods because they reduce the value of the coefficients. The particular case of a quadratic regularizer is called ridge regression (Hoerl and Kennard, 1970). In the context of neural networks, this approach is known as weight decay.

Figure 1.7 shows the results of fitting the polynomial of order M = 9 to the same data set as before but now using the regularized error function given by (1.4). We see that, for a value of ln λ = −18, the over-fitting has been suppressed and we now obtain a much closer representation of the underlying function sin(2πx).
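To make the closed-form minimization of (1.4) concrete: setting the gradient of the regularized error to zero gives the linear system (ΦᵀΦ + λI)w = Φᵀt, where Φ is the matrix with entries x_n^j for j = 0, ..., M. The symbol Φ, the function names and the synthetic data (10 points of sin(2πx) with an assumed noise level of 0.3) are illustrative choices rather than anything fixed by the text. A minimal sketch:

```python
import numpy as np


def design_matrix(x, order):
    """Matrix with columns x**0, x**1, ..., x**order."""
    return np.vander(x, order + 1, increasing=True)


def regularized_error(w, x, t, lam):
    """Error (1.4): half the sum of squared residuals plus (lam / 2) * ||w||^2."""
    residual = design_matrix(x, len(w) - 1) @ w - t
    return 0.5 * residual @ residual + 0.5 * lam * (w @ w)


def error_gradient(w, x, t, lam):
    """Gradient of (1.4) with respect to w."""
    phi = design_matrix(x, len(w) - 1)
    return phi.T @ (phi @ w - t) + lam * w


def fit_closed_form(x, t, order, lam):
    """Solve (Phi^T Phi + lam * I) w = Phi^T t, the stationarity condition of (1.4)."""
    phi = design_matrix(x, order)
    return np.linalg.solve(phi.T @ phi + lam * np.eye(order + 1), phi.T @ t)


# Synthetic data; the noise level 0.3 is an assumption.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)

lam = np.exp(-18.0)  # ln(lambda) = -18, one of the values used in Figure 1.7
w_star = fit_closed_form(x, t, order=9, lam=lam)
print("regularized error at minimizer:", regularized_error(w_star, x, t, lam))
print("gradient norm at minimizer:", np.linalg.norm(error_gradient(w_star, x, t, lam)))
```

This sketch penalizes w0 along with the other coefficients, exactly as (1.4) is written. For very small λ the normal equations become ill-conditioned, so a least-squares solver may be preferable in practice.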

If, however, we use too large a value for λ then we again obtain a poor fit, as shown in Figure 1.7 for ln λ = 0. The corresponding coefficients from the fitted polynomials are given in Table 1.2, showing that regularization has the desired effect of reducing the magnitude of the coefficients.

[Margin notes: Exercise 1.2; Section 1.3]

Table 1.2 Table of the coefficients w for M = 9 polynomials with various values for the regularization parameter λ. Note that ln λ = −∞ corresponds to a model with no regularization, i.e., to the graph at the bottom right in Figure 1.4. We see that, as the value of λ increases, the typical magnitude of the coefficients gets smaller.

        ln λ = −∞    ln λ = −18    ln λ = 0
w0           0.35          0.35        0.13
w1         232.37          4.74       -0.05
w2       -5321.83         -0.77       -0.06
w3       48568.31        -31.97       -0.05
w4     -231639.30         -3.89       -0.03
w5      640042.26         55.28       -0.02
w6    -1061800.52         41.32       -0.01
w7     1042400.18        -45.95       -0.00
w8     -557682.99        -91.53        0.00
w9      125201.43         72.68        0.01

The impact of the regularization term on the generalization error can be seen by plotting the value of the RMS error (1.3) for both training and test sets against ln λ, as shown in Figure 1.8.
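The two observations above (coefficients shrink as λ grows, and the training and test RMS errors respond differently to ln λ) can be reproduced on synthetic data. This is a hedged sketch under stated assumptions: the data are sin(2πx) plus Gaussian noise with an assumed standard deviation of 0.3, the training set has 10 points, the test set has 100, and the function names are invented for illustration. The regularized fit is computed through an equivalent augmented least-squares problem, which is a numerical convenience chosen here and not something the text prescribes.

```python
import numpy as np


def design_matrix(x, order=9):
    """Matrix with columns x**0, ..., x**order."""
    return np.vander(x, order + 1, increasing=True)


def fit_regularized(x, t, lam, order=9):
    """Minimize (1.4) via least squares on the stacked system [Phi; sqrt(lam) I] w = [t; 0]."""
    phi = design_matrix(x, order)
    stacked = np.vstack([phi, np.sqrt(lam) * np.eye(order + 1)])
    target = np.concatenate([t, np.zeros(order + 1)])
    w, *_ = np.linalg.lstsq(stacked, target, rcond=None)
    return w


def rms_error(w, x, t):
    """Root-mean-square residual of the polynomial with coefficients w."""
    return np.sqrt(np.mean((design_matrix(x, len(w) - 1) @ w - t) ** 2))


rng = np.random.default_rng(0)
noise_std = 0.3  # assumed noise level
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise_std, size=10)
x_test = rng.uniform(0.0, 1.0, size=100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, noise_std, size=100)

rows = []
for ln_lam in (-np.inf, -18.0, 0.0):  # exp(-inf) = 0 means no regularization
    w = fit_regularized(x_train, t_train, np.exp(ln_lam))
    rows.append((ln_lam, np.linalg.norm(w),
                 rms_error(w, x_train, t_train), rms_error(w, x_test, t_test)))

print(f"{'ln lambda':>10} {'||w||':>14} {'train RMS':>10} {'test RMS':>10}")
for ln_lam, norm, train, test in rows:
    print(f"{ln_lam:>10.1f} {norm:>14.2f} {train:>10.3f} {test:>10.3f}")
```

The specific numbers will differ from Table 1.2 because the noise draw differs, but the norm of w cannot increase with λ, and the training error cannot decrease with it.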

We see that in effect λ now controls the effective complexity of the model and hence determines the degree of over-fitting.

The issue of model complexity is an important one and will be discussed at length in Section 1.3. Here we simply note that, if we were trying to solve a practical application using this approach of minimizing an error function, we would have to find a way to determine a suitable value for the model complexity. The results above suggest a simple way of achieving this, namely by taking the available data and partitioning it into a training set, used to determine the coefficients w, and a separate validation set, also called a hold-out set, used to optimize the model complexity (either M or λ).
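The hold-out procedure just described can be sketched as follows. Everything specific here is an assumption made for illustration: 30 synthetic points of sin(2πx) with noise level 0.3, a 20/10 split into training and validation sets, a grid of ln λ values from −35 to 0 in steps of 5, and the helper names. The coefficients are fitted on the training set only, and λ is chosen by the validation RMS error.

```python
import numpy as np


def design_matrix(x, order=9):
    """Matrix with columns x**0, ..., x**order."""
    return np.vander(x, order + 1, increasing=True)


def fit_regularized(x, t, lam, order=9):
    """Minimize the regularized error (1.4) via an augmented least-squares problem."""
    phi = design_matrix(x, order)
    stacked = np.vstack([phi, np.sqrt(lam) * np.eye(order + 1)])
    target = np.concatenate([t, np.zeros(order + 1)])
    w, *_ = np.linalg.lstsq(stacked, target, rcond=None)
    return w


def rms_error(w, x, t):
    """Root-mean-square residual of the polynomial with coefficients w."""
    return np.sqrt(np.mean((design_matrix(x, len(w) - 1) @ w - t) ** 2))


def select_ln_lambda(x_train, t_train, x_val, t_val, ln_lambdas):
    """Return the ln(lambda) with the lowest validation RMS error, plus all scores."""
    scores = [rms_error(fit_regularized(x_train, t_train, np.exp(v)), x_val, t_val)
              for v in ln_lambdas]
    return ln_lambdas[int(np.argmin(scores))], scores


rng = np.random.default_rng(0)
x_all = rng.uniform(0.0, 1.0, size=30)
t_all = np.sin(2 * np.pi * x_all) + rng.normal(0.0, 0.3, size=30)

# The points are i.i.d., so a simple slice serves as a random partition.
x_train, t_train = x_all[:20], t_all[:20]
x_val, t_val = x_all[20:], t_all[20:]

ln_lambdas = np.arange(-35.0, 1.0, 5.0)  # -35, -30, ..., 0
best_ln_lambda, val_scores = select_ln_lambda(x_train, t_train, x_val, t_val, ln_lambdas)
print("selected ln(lambda):", best_ln_lambda)
```

The same loop could range over the polynomial order M instead of λ. Note that the 10 validation points are not used to fit w at all.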

In many cases, however, this will prove to be too wasteful of valuable training data, and we have to seek more sophisticated approaches.

Figure 1.8 (curves: Training, Test; axes ln λ and E_RMS). Graph of the root-mean-square error (1.3) versus ln λ for the M = 9 polynomial.

So far our discussion of polynomial curve fitting has appealed largely to intuition. We now seek a more principled approach to solving problems in pattern recognition by turning to a discussion of probability theory. As well as providing the foundation for nearly all of the subsequent developments in this book, it will also give us some important insights into the concepts we have introduced in the context of polynomial curve fitting and will allow us to extend these to more complex situations.

1.2. Probability Theory

A key concept in the field of pattern recognition is that of uncertainty. It arises both through noise on measurements, as well as through the finite size of data sets. Probability theory provides a consistent framework for the quantification and manipulation of uncertainty and forms one of the central foundations for pattern recognition. When combined with decision theory, discussed in Section 1.5, it allows us to make optimal predictions given all the information available to us, even though that information may be incomplete or ambiguous.

We will introduce the basic concepts of probability theory by considering a simple example.

Imagine we have two boxes, one red and one blue, and in the red box we have 2 apples and 6 oranges, and in the blue box we have 3 apples and 1 orange. This is illustrated in Figure 1.9. Now suppose we randomly pick one of the boxes and from that box we randomly select an item of fruit, and having observed which sort of fruit it is we replace it in the box from which it came. We could imagine repeating this process many times.
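The repeated experiment can be simulated directly. This excerpt does not say how likely each box is to be picked, so the sketch below assumes both boxes are equally likely. That 50/50 choice is an assumption made purely for illustration, as are the names used in the code. Under it, the long-run fraction of apples should approach the exact value computed by summing over the boxes.

```python
import random
from fractions import Fraction

# Box contents as given in the text.
BOXES = {
    "red": {"apple": 2, "orange": 6},
    "blue": {"apple": 3, "orange": 1},
}
# ASSUMPTION: each box is picked with probability 1/2 (not stated in this excerpt).
BOX_PROB = {"red": Fraction(1, 2), "blue": Fraction(1, 2)}


def exact_fruit_probability(fruit):
    """Sum over boxes of P(box) * P(fruit | box)."""
    total = Fraction(0)
    for box, contents in BOXES.items():
        total += BOX_PROB[box] * Fraction(contents[fruit], sum(contents.values()))
    return total


def simulate(trials, seed=0):
    """Repeat the pick-a-box, pick-a-fruit experiment and return observed fractions."""
    rng = random.Random(seed)
    counts = {"apple": 0, "orange": 0}
    box_names = list(BOX_PROB)
    box_weights = [float(p) for p in BOX_PROB.values()]
    for _ in range(trials):
        box = rng.choices(box_names, weights=box_weights)[0]
        contents = BOXES[box]
        fruit = rng.choices(list(contents), weights=list(contents.values()))[0]
        counts[fruit] += 1  # the fruit is replaced, so box contents never change
    return {fruit: count / trials for fruit, count in counts.items()}


print("exact P(apple):", exact_fruit_probability("apple"))
print("simulated fractions:", simulate(20000))
```

Changing the entries of `BOX_PROB` is all that is needed to explore other box-selection probabilities.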

