Bishop C.M. Pattern Recognition and Machine Learning (2006) — excerpt

If data is plentiful, then one approach is simply to use some of the available data to train a range of models, or a given model with a range of values for its complexity parameters, and then to compare them on independent data, sometimes called a validation set, and select the one having the best predictive performance.

If the model design is iterated many times using a limited size data set, then some over-fitting to the validation data can occur and so it may be necessary to keep aside a third test set on which the performance of the selected model is finally evaluated.

In many applications, however, the supply of data for training and testing will be limited, and in order to build good models, we wish to use as much of the available data as possible for training. However, if the validation set is small, it will give a relatively noisy estimate of predictive performance.
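The hold-out procedure just described is simple to sketch in code. The following is a minimal illustration rather than anything prescribed by the text: it assumes a one-dimensional curve-fitting problem of the kind used earlier in this chapter, an illustrative 60/20/20 split, and numpy's polyfit standing in for "a given model with a range of values for its complexity parameters".

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

# Partition into training / validation / test sets (60/20/20 here).
idx = rng.permutation(len(x))
train, valid, test = idx[:60], idx[60:80], idx[80:]

def rms_error(w, x, t):
    # Root-mean-square error of polynomial coefficients w on (x, t).
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

# Train a range of complexities (polynomial orders) on the training
# set, then select the order that does best on the validation set.
orders = range(10)
fits = {M: np.polyfit(x[train], t[train], M) for M in orders}
best = min(orders, key=lambda M: rms_error(fits[M], x[valid], t[valid]))

# The test set is touched once, to report the selected model's error.
print(best, rms_error(fits[best], x[test], t[test]))
```

The dilemma discussed next arises when there are too few points for the validation part of such a split to give a reliable score.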

One solution to this dilemma is to use cross-validation, which is illustrated in Figure 1.18. This allows a proportion (S − 1)/S of the available data to be used for training while making use of all of the data to assess performance.

Figure 1.18: The technique of S-fold cross-validation, illustrated here for the case of S = 4, involves taking the available data and partitioning it into S groups (in the simplest case these are of equal size). Then S − 1 of the groups are used to train a set of models that are then evaluated on the remaining group. This procedure is then repeated for all S possible choices for the held-out group, indicated here by the red blocks, and the performance scores from the S runs are then averaged. [The figure shows four rows, run 1 to run 4, each with a different block held out.]
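The procedure of Figure 1.18 translates directly into code. Here is a minimal sketch, assuming the same polynomial curve-fitting setup as above and RMS error as the performance score; both are illustrative choices, not part of the technique itself.

```python
import numpy as np

def cross_val_score(x, t, order, S, seed=1):
    # Average held-out RMS error of a polynomial of the given order,
    # estimated by S-fold cross-validation as in Figure 1.18.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), S)
    scores = []
    for i, held_out in enumerate(folds):
        # Train on the other S - 1 folds, evaluate on the held-out one.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = np.polyfit(x[train], t[train], order)
        resid = np.polyval(w, x[held_out]) - t[held_out]
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.mean(scores)
```

Setting S = len(x) gives the leave-one-out case discussed next; note that a single call already performs S separate training runs, which is exactly the cost highlighted below.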

When data is particularly scarce, it may be appropriate to consider the case S = N, where N is the total number of data points, which gives the leave-one-out technique.

One major drawback of cross-validation is that the number of training runs that must be performed is increased by a factor of S, and this can prove problematic for models in which the training is itself computationally expensive. A further problem with techniques such as cross-validation that use separate data to assess performance is that we might have multiple complexity parameters for a single model (for instance, there might be several regularization parameters). Exploring combinations of settings for such parameters could, in the worst case, require a number of training runs that is exponential in the number of parameters. Clearly, we need a better approach.

Ideally, this should rely only on the training data and should allow multiple hyperparameters and model types to be compared in a single training run. We therefore need to find a measure of performance which depends only on the training data and which does not suffer from bias due to over-fitting.

Historically, various 'information criteria' have been proposed that attempt to correct for the bias of maximum likelihood by the addition of a penalty term to compensate for the over-fitting of more complex models. For example, the Akaike information criterion, or AIC (Akaike, 1974), chooses the model for which the quantity

ln p(D|w_ML) − M    (1.73)

is largest. Here ln p(D|w_ML) is the best-fit log likelihood, and M is the number of adjustable parameters in the model. A variant of this quantity, called the Bayesian information criterion, or BIC, will be discussed in Section 4.4.1. Such criteria do not take account of the uncertainty in the model parameters, however, and in practice they tend to favour overly simple models. We therefore turn in Section 3.4 to a fully Bayesian approach where we shall see how complexity penalties arise in a natural and principled way.
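As a concrete illustration of (1.73), consider again polynomial fitting with Gaussian noise, where the best-fit log likelihood has a closed form in terms of the mean squared residual. The sketch below makes illustrative assumptions rather than following a prescription from the text: in particular, conventions for counting M vary, and here it is taken to be the number of polynomial coefficients plus one for the fitted noise variance.

```python
import numpy as np

def aic_score(x, t, order):
    # ln p(D|w_ML) - M for a polynomial of the given order, assuming
    # Gaussian noise whose variance is also set by maximum likelihood,
    # in which case ln L = -(N/2) * (ln(2*pi*sigma2) + 1).
    N = len(x)
    w = np.polyfit(x, t, order)
    sigma2 = np.mean((np.polyval(w, x) - t) ** 2)  # ML noise variance
    log_lik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    M = (order + 1) + 1  # coefficient count, plus the variance parameter
    return log_lik - M   # the quantity (1.73); choose the model maximizing it

# e.g. best_order = max(range(10), key=lambda m: aic_score(x, t, m))
```

Unlike cross-validation, this needs only a single training run per candidate model, which is precisely the appeal of such criteria.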

1.4. The Curse of Dimensionality

In the polynomial curve fitting example we had just one input variable x. For practical applications of pattern recognition, however, we will have to deal with spaces of high dimensionality comprising many input variables. As we now discuss, this poses some serious challenges and is an important factor influencing the design of pattern recognition techniques.

Figure 1.19: Scatter plot of the oil flow data for input variables x6 and x7, in which red denotes the 'homogenous' class, green denotes the 'annular' class, and blue denotes the 'laminar' class. Our goal is to classify the new test point denoted by '×'.

In order to illustrate the problem we consider a synthetically generated data set representing measurements taken from a pipeline containing a mixture of oil, water, and gas (Bishop and James, 1993). These three materials can be present in one of three different geometrical configurations known as 'homogenous', 'annular', and 'laminar', and the fractions of the three materials can also vary. Each data point comprises a 12-dimensional input vector consisting of measurements taken with gamma ray densitometers that measure the attenuation of gamma rays passing along narrow beams through the pipe. This data set is described in detail in Appendix A. Figure 1.19 shows 100 points from this data set on a plot showing two of the measurements x6 and x7 (the remaining ten input values are ignored for the purposes of this illustration).

Each data point is labelled according to which of the three geometrical classes it belongs to, and our goal is to use this data as a training set in order to be able to classify a new observation (x6, x7), such as the one denoted by the cross in Figure 1.19. We observe that the cross is surrounded by numerous red points, and so we might suppose that it belongs to the red class.

However, there are also plenty of green points nearby, so we might think that it could instead belong to the green class. It seems unlikely that it belongs to the blue class. The intuition here is that the identity of the cross should be determined more strongly by nearby points from the training set and less strongly by more distant points. In fact, this intuition turns out to be reasonable and will be discussed more fully in later chapters.

How can we turn this intuition into a learning algorithm? One very simple approach would be to divide the input space into regular cells, as indicated in Figure 1.20. When we are given a test point and we wish to predict its class, we first decide which cell it belongs to, and we then find all of the training data points that fall in the same cell.

Figure 1.20: Illustration of a simple approach to the solution of a classification problem in which the input space is divided into cells, and any new test point is assigned to the class that has a majority number of representatives in the same cell as the test point. As we shall see shortly, this simplistic approach has some severe shortcomings.

The identity of the test point is predicted as being the same as the class having the largest number of training points in the same cell as the test point (with ties being broken at random).

There are numerous problems with this naive approach, but one of the most severe becomes apparent when we consider its extension to problems having larger numbers of input variables, corresponding to input spaces of higher dimensionality. The origin of the problem is illustrated in Figure 1.21, which shows that, if we divide a region of a space into regular cells, then the number of such cells grows exponentially with the dimensionality of the space. The problem with an exponentially large number of cells is that we would need an exponentially large quantity of training data in order to ensure that the cells are not empty.
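The naive cell method is simple enough to implement in a few lines, which also makes its failure mode easy to see. Below is a minimal sketch, assuming inputs rescaled to the unit cube [0, 1]^D and a dictionary keyed by cell index; unlike the text, ties here fall to whichever majority class Counter reports first rather than being broken at random.

```python
import numpy as np
from collections import Counter

def fit_cells(X, labels, bins):
    # Map each occupied cell of a regular grid over [0, 1]^D to the
    # class with the most training points inside it.
    ids = np.clip((X * bins).astype(int), 0, bins - 1)
    by_cell = {}
    for cell, label in zip(map(tuple, ids), labels):
        by_cell.setdefault(cell, []).append(label)
    return {c: Counter(ls).most_common(1)[0][0] for c, ls in by_cell.items()}

def predict(cells, x, bins):
    cell = tuple(np.clip((np.asarray(x) * bins).astype(int), 0, bins - 1))
    return cells.get(cell)  # None for an empty cell: the failure mode above

# With D inputs the grid has bins**D cells: ten cells in one dimension
# for bins = 10, but 10**12 cells for the 12-dimensional oil-flow
# inputs, far more than any realistic training set can populate.
```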

Clearly, we have no hope of applying such a technique in a space of more than a few variables, and so we need to find a more sophisticated approach.

Figure 1.21: Illustration of the curse of dimensionality, showing how the number of regions of a regular grid grows exponentially with the dimensionality D of the space. For clarity, only a subset of the cubical regions are shown for D = 3. [Panels show grids for D = 1, D = 2, and D = 3.]

We can gain further insight into the problems of high-dimensional spaces by returning to the example of polynomial curve fitting and considering how we would extend this approach to deal with input spaces having several variables.

If we have D input variables, then a general polynomial with coefficients up to order 3 would take the form

y(x, w) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D} \sum_{j=1}^{D} w_{ij} x_i x_j + \sum_{i=1}^{D} \sum_{j=1}^{D} \sum_{k=1}^{D} w_{ijk} x_i x_j x_k    (1.74)

As D increases, so the number of independent coefficients (not all of the coefficients are independent due to interchange symmetries amongst the x variables) grows proportionally to D^3 (Exercise 1.16). In practice, to capture complex dependencies in the data, we may need to use a higher-order polynomial. For a polynomial of order M, the growth in the number of coefficients is like D^M. Although this is now a power law growth, rather than an exponential growth, it still points to the method becoming rapidly unwieldy and of limited practical utility.

Our geometrical intuitions, formed through a life spent in a space of three dimensions, can fail badly when we consider spaces of higher dimensionality.
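The growth rate quoted here is easy to check numerically: the number of independent coefficients of a polynomial of order M in D variables equals the number of distinct monomials of degree at most M, the binomial coefficient C(D + M, M) (the closed form that Exercise 1.16 asks for), which for fixed M grows like D^M.

```python
from math import comb

# Independent coefficients of an order-3 polynomial in D variables:
# C(D + 3, 3) = (D+3)(D+2)(D+1)/6, i.e. ~ D**3 / 6 for large D.
for D in (1, 2, 3, 10, 100):
    print(D, comb(D + 3, 3))
```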

As asimple example, consider a sphere of radius r = 1 in a space of D dimensions, andask what is the fraction of the volume of the sphere that lies between radius r = 1−and r = 1. We can evaluate this fraction by noting that the volume of a sphere ofradius r in D dimensions must scale as rD , and so we writeVD (r) = KD rDExercise 1.18(1.75)where the constant KD depends only on D. Thus the required fraction is given byVD (1) − VD (1 − )= 1 − (1 − )DVD (1)Exercise 1.20wijk xi xj xk . (1.74)(1.76)which is plotted as a function of for various values of D in Figure 1.22.
