An introduction to information retrieval. Manning_ Raghavan (2009) (811397), страница 78

Файл №811397 An introduction to information retrieval. Manning_ Raghavan (2009) (An introduction to information retrieval. Manning_ Raghavan (2009).pdf) 78 страницаAn introduction to information retrieval. Manning_ Raghavan (2009) (811397) страница 782020-08-252020-08-25СтудИзба

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 78)

Each non-zero αi indicates that thecorresponding ~xi is a support vector. The classification function is then:f (~x ) = sign(∑i αi yi~xi T~x + b)(15.9)Both the term to be maximized in the dual problem and the classifying function involve a dot product between pairs of points (~x and ~x i or ~xi and ~x j ), andthat is the only way the data are used – we will return to the significance ofthis later.To recap, we start with a training data set. The data set uniquely definesthe best separating hyperplane, and we feed the data through a quadraticoptimization procedure to find this plane. Given a new point ~x to classify,the classification function f (~x ) in either Equation (15.1) or Equation (15.9) iscomputing the projection of the point onto the hyperplane normal.

The signof this function determines the class to assign to the point. If the point iswithin the margin of the classifier (or another confidence threshold t that wemight have determined to minimize classification mistakes) then the classifier can return “don’t know” rather than one of the two classes. The valueof f (~x ) may also be transformed into a probability of classification; fittinga sigmoid to transform the values is standard (Platt 2000).

Also, since themargin is constant, if the model includes dimensions from various sources,careful rescaling of some dimensions may be required. However, this is nota problem if our documents (points) are on the unit hypersphere.✎Consider building an SVM over the (very little) data set shown inFigure 15.4. Working geometrically, for an example like this, the maximum marginweight vector will be parallel to the shortest line connecting points of the two classes,that is, the line between (1, 1) and (2, 3), giving a weight vector of (1, 2). The optimal decision surface is orthogonal to that line and intersects it at the halfway point.Example 15.1:Online edition (c) 2009 Cambridge UP32615 Support vector machines and machine learning on documentsTherefore, it passes through (1.5, 2).

So, the SVM decision boundary is:y = x1 + 2x2 − 5.5Working algebraically, with the standard constraint that sign(yi (~wT~xi + b )) ≥ 1,we seek to minimize |~w|. This happens when this constraint is satisfied with equalityby the two support vectors. Further we know that the solution is w~ = ( a, 2a) for somea.

So we have that:a + 2a + b2a + 6a + b==−11Therefore, a = 2/5 and b = −11/5. So the optimal hyperplane is given by w~ =(2/5, 4/5) and b = −11/5. √√√The margin ρ is 2/|~w| = 2/ 4/25 + 16/25 = 2/(2 5/5) = 5. This answer canbe confirmed geometrically by examining Figure 15.4.?Exercise 15.1[⋆]What is the minimum number of support vectors that there can be for a data set(which contains instances of each class)?Exercise 15.2[⋆⋆]The basis of being able to use kernels in SVMs (see Section 15.2.3) is that the classification function can be written in the form of Equation (15.9) (where, for large problems,most αi are 0). Show explicitly how the classification function could be written in thisform for the data set from Example 15.1. That is, write f as a function where the datapoints appear and the only variable is ~x.Exercise 15.3[⋆⋆]Install an SVM package such as SVMlight (http://svmlight.joachims.org/), and build anSVM for the data set discussed in Example 15.1.

Confirm that the program gives thesame solution as the text. For SVMlight, or another package that accepts the sametraining data format, the training file would be:+1 1:2 2:3−1 1:2 2:0−1 1:1 2:1The training command for SVMlight is then:svm_learn -c 1 -a alphas.dat train.dat model.datThe -c 1 option is needed to turn off use of the slack variables that we discuss inSection 15.2.1. Check that the norm of the weight vector agrees with what we foundin Example 15.1. Examine the file alphas.dat which contains the αi values, and checkthat they agree with your answers in Exercise 15.2.Online edition (c) 2009 Cambridge UP32715.2 Extensions to the SVM modeltubbbbb btutuξibtututt uu~xi bbtutubbtuξj~x j◮ Figure 15.5 Large margin classification with slack variables.15.215.2.1SLACK VARIABLES(15.10)Extensions to the SVM modelSoft margin classificationFor the very high dimensional problems common in text classification, sometimes the data are linearly separable.

But in the general case they are not, andeven if they are, we might prefer a solution that better separates the bulk ofthe data while ignoring a few weird noise documents.If the training set D is not linearly separable, the standard approach is toallow the fat decision margin to make a few mistakes (some points – outliersor noisy examples – are inside or on the wrong side of the margin). We thenpay a cost for each misclassified example, which depends on how far it isfrom meeting the margin requirement given in Equation (15.5).

To implement this, we introduce slack variables ξ i . A non-zero value for ξ i allows ~x i tonot meet the margin requirement at a cost proportional to the value of ξ i . SeeFigure 15.5.The formulation of the SVM optimization problem with slack variables is:Find w~ , b, and ξ i ≥ 0 such that:•1 T~ w~ + C ∑i ξ i2wis minimized• and for all {(~xi , yi )}, y i (~wT~x i + b) ≥ 1 − ξ iOnline edition (c) 2009 Cambridge UP32815 Support vector machines and machine learning on documentsREGULARIZATION(15.11)The optimization problem is then trading off how fat it can make the marginversus how many points have to be moved around to allow this margin.The margin can be less than 1 for a point ~x i by setting ξ i > 0, but then onepays a penalty of Cξ i in the minimization for having done that.

The sum ofthe ξ i gives an upper bound on the number of training errors. Soft-marginSVMs minimize training error traded off against margin. The parameter Cis a regularization term, which provides a way to control overfitting: as Cbecomes large, it is unattractive to not respect the data at the cost of reducingthe geometric margin; when it is small, it is easy to account for some datapoints with the use of slack variables and to have a fat margin placed so itmodels the bulk of the data.The dual problem for soft margin classification becomes:Find α1 , .

. . α N such that ∑ αi −12∑i ∑ j αi α j yi y j~xi T~x j is maximized, and• ∑i αi yi = 0• 0 ≤ αi ≤ C for all 1 ≤ i ≤ NNeither the slack variables ξ i nor Lagrange multipliers for them appear in thedual problem. All we are left with is the constant C bounding the possiblesize of the Lagrange multipliers for the support vector data points. As before,the ~xi with non-zero αi will be the support vectors. The solution of the dualproblem is of the form:(15.12)w~ = ∑ αyi~xib = y k (1 − ξ k ) − w~ T~xk for k = arg maxk αkAgain w~ is not needed explicitly for classification, which can be done in termsof dot products with data points, as in Equation (15.9).Typically, the support vectors will be a small proportion of the trainingdata.

However, if the problem is non-separable or with small margin, thenevery data point which is misclassified or within the margin will have a nonzero αi . If this set of points becomes large, then, for the nonlinear case whichwe turn to in Section 15.2.3, this can be a major slowdown for using SVMs attest time.The complexity of training and testing with linear SVMs is shown in Table 15.1.3 The time for training an SVM is dominated by the time for solvingthe underlying QP, and so the theoretical and empirical complexity varies depending on the method used to solve it.

The standard result for solving QPsis that it takes time cubic in the size of the data set (Kozlov et al. 1979). All therecent work on SVM training has worked to reduce that complexity, often by3. We write Θ(|D | Lave ) for Θ( T ) (page 262) and assume that the length of test documents isbounded as we did on page 262.Online edition (c) 2009 Cambridge UP32915.2 Extensions to the SVM modelClassifierNBNBModetrainingtestingMethodTime complexityΘ(|D | Lave + |C ||V |)Θ(|C | Ma )RocchioRocchiotrainingtestingkNNkNNtrainingtestingpreprocessingpreprocessingΘ(|D | Lave )Θ(|D | Mave Ma )kNNkNNtrainingtestingno preprocessingno preprocessingΘ(1)Θ(|D | Lave Ma )SVMtrainingconventionalSVMSVMtrainingtestingcutting planesO(|C ||D |3 Mave );≈ O(|C ||D |1.7 Mave ), empiricallyO(|C ||D | Mave )O(|C | Ma )Θ(|D | Lave + |C ||V |)Θ(|C | Ma )◮ Table 15.1 Training and testing complexity of various classifiers including SVMs.Training is the time the learning method takes to learn a classifier over D, while testing is the time it takes a classifier to classify one document.

For SVMs, multiclassclassification is assumed to be done by a set of |C | one-versus-rest classifiers. Lave isthe average number of tokens per document, while Mave is the average vocabulary(number of non-zero features) of a document. La and Ma are the numbers of tokensand types, respectively, in the test document.being satisfied with approximate solutions.

Standardly, empirical complexity is about O(|D |1.7 ) (Joachims 2006a). Nevertheless, the super-linear training time of traditional SVM algorithms makes them difficult or impossibleto use on very large training data sets. Alternative traditional SVM solution algorithms which are linear in the number of training examples scalebadly with a large number of features, which is another standard attributeof text problems. However, a new training algorithm based on cutting planetechniques gives a promising answer to this issue by having running timelinear in the number of training examples and the number of non-zero features in examples (Joachims 2006a).

Nevertheless, the actual speed of doingquadratic optimization remains much slower than simply counting terms asis done in a Naive Bayes model. Extending SVM algorithms to nonlinearSVMs, as in the next section, standardly increases training complexity by afactor of |D | (since dot products between examples need to be calculated),making them impractical. In practice it can often be cheaper to materializeOnline edition (c) 2009 Cambridge UP33015 Support vector machines and machine learning on documentsthe higher-order features and to train a linear SVM.415.2.2STRUCTURAL SVM S15.2.3Multiclass SVMsSVMs are inherently two-class classifiers.

The traditional way to do multiclass classification with SVMs is to use one of the methods discussed inSection 14.5 (page 306). In particular, the most common technique in practice has been to build |C | one-versus-rest classifiers (commonly referred to as“one-versus-all” or OVA classification), and to choose the class which classifies the test datum with greatest margin.

Характеристики

Тип файла

PDF-файл

Размер

6,58 Mb

Материал

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Тип материала

Книга

Предмет

Анализ текстовых данных и информационный поиск

Высшее учебное заведение

МГУ им. Ломоносова

Список файлов книги

an-introduction-to-information-retrieval.-manning_-raghavan-2009.pdf.rar

An introduction to information retrieval. Manning_ Raghavan (2009).pdf

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.