An introduction to information retrieval. Manning_ Raghavan (2009) (811397), страница 77

Файл №811397 An introduction to information retrieval. Manning_ Raghavan (2009) (An introduction to information retrieval. Manning_ Raghavan (2009).pdf) 77 страницаAn introduction to information retrieval. Manning_ Raghavan (2009) (811397) страница 772020-08-252020-08-25СтудИзба

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 77)

Insisting on a large margin reduces the capacity of the model: the range of angles at which the fat decision surface can be placed is smaller than for a decision hyperplane (cf. Figure 14.8,page 301).Maximizing the margin seems good because points near the decision surface represent very uncertain classification decisions: there is almost a 50%chance of the classifier deciding either way. A classifier with a large marginmakes no low certainty classification decisions.

This gives you a classification safety margin: a slight error in measurement or a slight document variation will not cause a misclassification. Another intuition motivating SVMsis shown in Figure 15.2. By construction, an SVM classifier insists on a largemargin around the decision boundary. Compared to a decision hyperplane,if you have to place a fat separator between classes, you have fewer choicesof where it can be put. As a result of this, the memory capacity of the modelhas been decreased, and hence we expect that its ability to correctly generalize to test data is increased (cf. the discussion of the bias-variance tradeoff inChapter 14, page 312).Let us formalize an SVM with algebra. A decision hyperplane (page 302)can be defined by an intercept term b and a decision hyperplane normal vector w~ which is perpendicular to the hyperplane.

This vector is commonlyOnline edition (c) 2009 Cambridge UP32215 Support vector machines and machine learning on documentsWEIGHT VECTOR(15.1)FUNCTIONAL MARGIN(15.2)referred to in the machine learning literature as the weight vector. To chooseamong all the hyperplanes that are perpendicular to the normal vector, wespecify the intercept term b. Because the hyperplane is perpendicular to thenormal vector, all points ~x on the hyperplane satisfy w~ T~x = −b. Now suppose that we have a set of training data points D = {(~xi , yi )}, where eachmember is a pair of a point ~x i and a class label y i corresponding to it.1 ForSVMs, the two data classes are always named +1 and −1 (rather than 1 and0), and the intercept term is always explicitly represented as b (rather thanbeing folded into the weight vector w~ by adding an extra always-on feature).The math works out much more cleanly if you do things this way, as we willsee almost immediately in the definition of functional margin.

The linearclassifier is then:f (~x ) = sign(~wT~x + b)A value of −1 indicates one class, and a value of +1 the other class.We are confident in the classification of a point if it is far away from thedecision boundary. For a given data set and decision hyperplane, we definethe functional margin of the ith example ~x i with respect to a hyperplane h~w, b ias the quantity y i (~wT~xi + b). The functional margin of a data set with respect to a decision surface is then twice the functional margin of any of thepoints in the data set with minimal functional margin (the factor of 2 comesfrom measuring across the whole width of the margin, as in Figure 15.3).However, there is a problem with using this definition as is: the value is underconstrained, because we can always make the functional margin as bigas we wish by simply scaling up w~ and b. For example, if we replace w~ by5w~ and b by 5b then the functional margin yi (5w~ T~xi + 5b) is five times aslarge.

This suggests that we need to place some constraint on the size of thew~ vector. To get a sense of how to do that, let us look at the actual geometry.What is the Euclidean distance from a point ~x to the decision boundary? InFigure 15.3, we denote by r this distance. We know that the shortest distancebetween a point and a hyperplane is perpendicular to the plane, and hence,parallel to w~ . A unit vector in this direction is w~ /|~w|. The dotted line in thediagram is then a translation of the vector r w~ /|~w|. Let us label the point onthe hyperplane closest to ~x as ~x ′ . Then:~x ′ = ~x − yrw~|~w|where multiplying by y just changes the sign for the two cases of ~x being oneither side of the decision surface. Moreover, ~x ′ lies on the decision boundary1.

As discussed in Section 14.1 (page 291), we present the general case of points in a vectorspace, but if the points are length normalized document vectors, then all the action is takingplace on the surface of a unit sphere, and the decision surface intersects the sphere’s surface.Online edition (c) 2009 Cambridge UP32315.1 Support vector machines: The linearly separable case7u~xttu65b4~x+′tub2w~tub3bturtutubb bbρb10012345678◮ Figure 15.3 The geometric margin of a point (r) and a decision boundary (ρ).and so satisfies w~ T~x ′ + b = 0.

Hence:w~ T ~x − yr(15.3)Solving for r gives:2(15.4)GEOMETRIC MARGINr=yw~ +b = 0|~w|w~ T~x + b|~w|Again, the points closest to the separating hyperplane are support vectors.The geometric margin of the classifier is the maximum width of the band thatcan be drawn separating the support vectors of the two classes. That is, it istwice the minimum value over data points for r given in Equation (15.4), or,equivalently, the maximal width of one of the fat separators shown in Figure 15.2.

The geometric margin is clearly invariant to scaling of parameters:if we replace w~ by 5w~ and b by 5b, then the geometric margin is the same, because it is inherently normalized by the length of w~ . This means that we canimpose any scaling constraint we wish on w~ without affecting the geometricmargin. Among other choices, we could use unit vectors, as in Chapter 6, by2. Recall that |~w| =√w~ Tw~.Online edition (c) 2009 Cambridge UP32415 Support vector machines and machine learning on documentsrequiring that |~w| = 1. This would have the effect of making the geometricmargin the same as the functional margin.Since we can scale the functional margin as we please, for convenience insolving large SVMs, let us choose to require that the functional margin of alldata points is at least 1 and that it is equal to 1 for at least one data vector.That is, for all items in the data:yi (~wT~xi + b) ≥ 1(15.5)and there exist support vectors for which the inequality is an equality.

Sinceeach example’s distance from the hyperplane is ri = yi (~wT~x i + b)/|~w |, thegeometric margin is ρ = 2/|~w|. Our desire is still to maximize this geometricmargin. That is, we want to find w~ and b such that:• ρ = 2/|~w| is maximized• For all (~x i , yi ) ∈ D, y i (~wT~xi + b) ≥ 1Maximizing 2/|~w| is the same as minimizing |~w|/2. This gives the final standard formulation of an SVM as a minimization problem:(15.6)Find w~ and b such that:•1 T~ w~2wis minimized, and• for all {(~x i , yi )}, yi (~wT~xi + b) ≥ 1Q UADRATICP ROGRAMMING(15.7)We are now optimizing a quadratic function subject to linear constraints.Quadratic optimization problems are a standard, well-known class of mathematical optimization problems, and many algorithms exist for solving them.We could in principle build our SVM using standard quadratic programming(QP) libraries, but there has been much recent research in this area aiming toexploit the structure of the kind of QP that emerges from an SVM.

As a result,there are more intricate but much faster and more scalable libraries availableespecially for building SVMs, which almost everyone uses to build models.We will not present the details of such algorithms here.However, it will be helpful to what follows to understand the shape of thesolution of such an optimization problem. The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with eachconstraint yi (~wT ~x i + b) ≥ 1 in the primal problem:Find α1 , . . . α N such that ∑ αi −12∑i ∑ j αi α j yi y j~xi T~x j is maximized, and• ∑i αi yi = 0• αi ≥ 0 for all 1 ≤ i ≤ NThe solution is then of the form:Online edition (c) 2009 Cambridge UP15.1 Support vector machines: The linearly separable case325tu32b1b00123◮ Figure 15.4 A tiny 3 data point training set for an SVM.(15.8)w~ = ∑ αi yi~xib = yk − w~ T~xk for any ~xk such that αk 6= 0In the solution, most of the αi are zero.

Характеристики

Тип файла

PDF-файл

Размер

6,58 Mb

Материал

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Тип материала

Книга

Предмет

Анализ текстовых данных и информационный поиск

Высшее учебное заведение

МГУ им. Ломоносова

Список файлов книги

an-introduction-to-information-retrieval.-manning_-raghavan-2009.pdf.rar

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.