An introduction to information retrieval. Manning_ Raghavan (2009) (811397), page 39


The U.S. National Institute of Standards and Technology (NIST) has run a large IR test bed evaluation series since 1992. Within this framework, there have been many tracks over a range of different test collections, but the best known test collections are the ones used for the TREC Ad Hoc track during the first 8 TREC evaluations between 1992 and 1999.

In total, these test collections comprise 6 CDs containing 1.89 million documents (mainly, but not exclusively, newswire articles) and relevance judgments for 450 information needs, which are called topics and specified in detailed text passages. Individual test collections are defined over different subsets of this data. The early TRECs each consisted of 50 information needs, evaluated over different but overlapping sets of documents. TRECs 6–8 provide 150 information needs over about 528,000 newswire and Foreign Broadcast Information Service articles. This is probably the best subcollection to use in future work, because it is the largest and the topics are more consistent.

Online edition (c) 2009 Cambridge UP
8 Evaluation in information retrieval

Because the test document collections are so large, there are no exhaustive relevance judgments. Rather, NIST assessors’ relevance judgments are available only for the documents that were among the top k returned for some system which was entered in the TREC evaluation for which the information need was developed.

In more recent years, NIST has done evaluations on larger document collections, including the 25 million page GOV2 web page collection. From the beginning, the NIST test document collections were orders of magnitude larger than anything available to researchers previously and GOV2 is now the largest Web collection easily available for research purposes. Nevertheless, the size of GOV2 is still more than 2 orders of magnitude smaller than the current size of the document collections indexed by the large web search companies.

NII Test Collections for IR Systems (NTCIR). The NTCIR project has built various test collections of similar sizes to the TREC collections, focusing on East Asian language and cross-language information retrieval, where queries are made in one language over a document collection containing documents in one or more other languages. See: http://research.nii.ac.jp/ntcir/data/dataen.html

Cross Language Evaluation Forum (CLEF). This evaluation series has concentrated on European languages and cross-language information retrieval. See: http://www.clef-campaign.org/

Reuters-21578 and Reuters-RCV1. For text classification, the most used test collection has been the Reuters-21578 collection of 21578 newswire articles; see Chapter 13, page 279. More recently, Reuters released the much larger Reuters Corpus Volume 1 (RCV1), consisting of 806,791 documents; see Chapter 4, page 69.

Its scale and rich annotation make it a better basis for future research.

20 Newsgroups. This is another widely used text classification collection, collected by Ken Lang. It consists of 1000 articles from each of 20 Usenet newsgroups (the newsgroup name being regarded as the category). After the removal of duplicate articles, as it is usually used, it contains 18941 articles.

8.3 Evaluation of unranked retrieval sets

Given these ingredients, how is system effectiveness measured? The two most frequent and basic measures for information retrieval effectiveness are precision and recall.

These are first defined for the simple case where an IR system returns a set of documents for a query. We will see later how to extend these notions to ranked retrieval situations.

Precision (P) is the fraction of retrieved documents that are relevant:

$$\text{Precision} = \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})} = P(\text{relevant} \mid \text{retrieved}) \tag{8.1}$$

Recall (R) is the fraction of relevant documents that are retrieved:

$$\text{Recall} = \frac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})} = P(\text{retrieved} \mid \text{relevant}) \tag{8.2}$$

These notions can be made clear by examining the following contingency table:

(8.3)
                 Relevant                Nonrelevant
Retrieved        true positives (tp)     false positives (fp)
Not retrieved    false negatives (fn)    true negatives (tn)

Then:

$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn} \tag{8.4}$$

An obvious alternative that may occur to the reader is to judge an information retrieval system by its accuracy, that is, the fraction of its classifications that are correct.

In terms of the contingency table above, accuracy = (tp + tn)/(tp + fp + fn + tn). This seems plausible, since there are two actual classes, relevant and nonrelevant, and an information retrieval system can be thought of as a two-class classifier which attempts to label them as such (it retrieves the subset of documents which it believes to be relevant). This is precisely the effectiveness measure often used for evaluating machine learning classification problems.

There is a good reason why accuracy is not an appropriate measure for information retrieval problems.

In almost all circumstances, the data is extremely skewed: normally over 99.9% of the documents are in the nonrelevant category. A system tuned to maximize accuracy can appear to perform well by simply deeming all documents nonrelevant to all queries. Even if the system is quite good, trying to label some documents as relevant will almost always lead to a high rate of false positives.
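As a minimal sketch of why accuracy misleads on skewed data, the following Python computes precision, recall, and accuracy from the four contingency counts. The function names and the example counts (a hypothetical 10,000-document collection with 10 relevant documents) are illustrative assumptions, not figures from the text. Returning 0.0 when a denominator is zero is also a convention chosen here.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of retrieved documents that are relevant: tp / (tp + fp)."""
    retrieved = tp + fp
    return tp / retrieved if retrieved else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of relevant documents that are retrieved: tp / (tp + fn)."""
    relevant = tp + fn
    return tp / relevant if relevant else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all relevant/nonrelevant decisions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical collection: 10,000 documents, 10 of them relevant.
# This system deems every document nonrelevant and retrieves nothing.
nothing = dict(tp=0, fp=0, fn=10, tn=9990)
# This system retrieves 20 documents, 8 of which are relevant.
useful = dict(tp=8, fp=12, fn=2, tn=9978)

for name, c in (("retrieve nothing", nothing), ("retrieve 20 docs", useful)):
    print(
        f"{name}: accuracy={accuracy(**c):.4f} "
        f"precision={precision(c['tp'], c['fp']):.2f} "
        f"recall={recall(c['tp'], c['fn']):.2f}"
    )
```

Under these assumed counts, the system that retrieves nothing scores slightly higher on accuracy than the one that finds 8 of the 10 relevant documents, even though its precision and recall are both zero.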

However, labeling all documents as nonrelevant is completely unsatisfying to an information retrieval system user. Users are always going to want to see some documents, and can be assumed to have a certain tolerance for seeing some false positives providing that they get some useful information. The measures of precision and recall concentrate the evaluation on the return of true positives, asking what percentage of the relevant documents have been found and how many false positives have also been returned.

The advantage of having the two numbers for precision and recall is that one is more important than the other in many circumstances. Typical web surfers would like every result on the first page to be relevant (high precision) but have not the slightest interest in knowing, let alone looking at, every document that is relevant. In contrast, various professional searchers such as paralegals and intelligence analysts are very concerned with trying to get as high recall as possible, and will tolerate fairly low precision results in order to get it. Individuals searching their hard disks are also often interested in high recall searches. Nevertheless, the two quantities clearly trade off against one another: you can always get a recall of 1 (but very low precision) by retrieving all documents for all queries! Recall is a non-decreasing function of the number of documents retrieved. On the other hand, in a good system, precision usually decreases as the number of documents retrieved is increased. In general we want to get some amount of recall while tolerating only a certain percentage of false positives.

A single measure that trades off precision versus recall is the F measure, which is the weighted harmonic mean of precision and recall:

$$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R} \quad \text{where} \quad \beta^2 = \frac{1 - \alpha}{\alpha} \tag{8.5}$$

where α ∈ [0, 1] and thus β² ∈ [0, ∞].

The default balanced F measure equally weights precision and recall, which means making α = 1/2 or β = 1. It is commonly written as $F_1$, which is short for $F_{\beta=1}$, even though the formulation in terms of α more transparently exhibits the F measure as a weighted harmonic mean. When using β = 1, the formula on the right simplifies to:

$$F_{\beta=1} = \frac{2PR}{P + R} \tag{8.6}$$

However, using an even weighting is not the only choice.

Values of β < 1 emphasize precision, while values of β > 1 emphasize recall. For example, a value of β = 3 or β = 5 might be used if recall is to be emphasized. Recall, precision, and the F measure are inherently measures between 0 and 1, but they are also very commonly written as percentages, on a scale between 0 and 100.

[Figure 8.1: curves for the minimum, maximum, arithmetic mean, geometric mean, and harmonic mean, plotted against precision from 0 to 100 with recall fixed at 70%.]
Figure 8.1 Graph comparing the harmonic mean to other means. The graph shows a slice through the calculation of various means of precision and recall for the fixed recall value of 70%. The harmonic mean is always less than either the arithmetic or geometric mean, and often quite close to the minimum of the two numbers. When the precision is also 70%, all the measures coincide.

Why do we use a harmonic mean rather than the simpler average (arithmetic mean)? Recall that we can always get 100% recall by just returning all documents, and therefore we can always get a 50% arithmetic mean by the same process. This strongly suggests that the arithmetic mean is an unsuitable measure to use. In contrast, if we assume that 1 document in 10,000 is relevant to the query, the harmonic mean score of this strategy is 0.02%. The harmonic mean is always less than or equal to the arithmetic mean and the geometric mean.
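The weighted F measure and the comparison of means can be sketched in Python as follows. The function name `f_measure` and its zero-denominator convention are assumptions made for this illustration. The scenario is the one described in the text, where every document is returned and 1 document in 10,000 is relevant; the precision and recall values in the final loop are hypothetical.

```python
import math


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision p and recall r.

    Uses the beta form: (beta^2 + 1) * P * R / (beta^2 * P + R).
    Returns 0.0 when the denominator is zero (a convention chosen here).
    """
    b2 = beta * beta
    denom = b2 * p + r
    return (b2 + 1) * p * r / denom if denom else 0.0


# Returning every document: recall is 1, and precision equals the
# share of relevant documents in the collection (1 in 10,000).
p, r = 1 / 10_000, 1.0

arithmetic = (p + r) / 2
geometric = math.sqrt(p * r)
harmonic = f_measure(p, r)  # beta = 1 gives the balanced F measure

print(f"arithmetic={arithmetic:.2%} geometric={geometric:.2%} harmonic={harmonic:.2%}")

# beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
for beta in (0.5, 1.0, 3.0):
    print(f"beta={beta}: F={f_measure(0.4, 0.8, beta):.3f}")
```

For the return-everything strategy the arithmetic mean comes out at roughly 50%, while the harmonic mean is about 0.02%, in line with the figures quoted above. In the loop, recall exceeds precision, so the score rises as β increases.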

When the values of two numbers differ greatly, the harmonic mean is closer to their minimum than to their arithmetic mean; see Figure 8.1.

Exercise 8.1 [⋆]
An IR system returns 8 relevant documents, and 10 nonrelevant documents. There are a total of 20 relevant documents in the collection. What is the precision of the system on this search, and what is its recall?

Exercise 8.2 [⋆]
The balanced F measure (a.k.a. $F_1$) is defined as the harmonic mean of precision and recall. What is the advantage of using the harmonic mean rather than “averaging” (using the arithmetic mean)?

[Figure 8.2: precision (0.0 to 1.0) plotted against recall (0.0 to 1.0).]
Figure 8.2 Precision/recall graph.

Exercise 8.3 [⋆⋆]
Derive the equivalence between the two formulas for F measure shown in Equation (8.5), given that α = 1/(β² + 1).

8.4 Evaluation of ranked retrieval results

Precision, recall, and the F measure are set-based measures.
