An Introduction to Information Retrieval. Manning, Raghavan (2009) — excerpt (page 42 of the file)
In such circumstances, marginal relevance is clearly a better measure of utility to the user. Maximizing marginal relevance requires returning documents that exhibit diversity and novelty. One way to approach measuring this is by using distinct facts or entities as evaluation units. This perhaps more directly measures true utility to the user but doing this makes it harder to create a test collection.

Exercise 8.10 [⋆⋆]

Below is a table showing how two human judges rated the relevance of a set of 12 documents to a particular information need (0 = nonrelevant, 1 = relevant).

Let us assume that you’ve written an IR system that for this query returns the set of documents {4, 5, 6, 7, 8}.

docID     1  2  3  4  5  6  7  8  9  10  11  12
Judge 1   0  0  1  1  1  1  1  1  0   0   0   0
Judge 2   0  0  1  1  0  0  0  0  1   1   1   1

a. Calculate the kappa measure between the two judges.
b. Calculate precision, recall, and F1 of your system if a document is considered relevant only if the two judges agree.
c. Calculate precision, recall, and F1 of your system if a document is considered relevant if either judge thinks it is relevant.
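The quantities asked for in (a)–(c) can be computed mechanically from the chapter’s definitions of kappa, precision, recall, and F1. The following is a minimal sketch of that computation; the judgment arrays are transcribed from the table above, and the answers are left to the printed output rather than stated here.

```python
# A minimal sketch for Exercise 8.10: kappa between the two judges, and
# precision/recall/F1 under the two definitions of relevance in (b) and (c).
judge1 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # docIDs 1..12
judge2 = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
retrieved = {4, 5, 6, 7, 8}                      # documents returned by the system

n = len(judge1)

# Kappa: P(A) is the observed agreement rate; P(E) is the agreement expected
# by chance, computed from the pooled marginals of the two judges.
p_agree = sum(a == b for a, b in zip(judge1, judge2)) / n
p_rel = (sum(judge1) + sum(judge2)) / (2 * n)
p_expected = p_rel ** 2 + (1 - p_rel) ** 2
kappa = (p_agree - p_expected) / (1 - p_expected)
print(f"kappa = {kappa:.3f}")

def prf1(relevant, retrieved):
    """Precision, recall, and F1 for a set of retrieved docIDs."""
    tp = len(relevant & retrieved)
    precision = tp / len(retrieved)
    recall = tp / len(relevant)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# (b) relevant only where both judges agree on relevance
rel_both = {i + 1 for i in range(n) if judge1[i] == 1 and judge2[i] == 1}
# (c) relevant where either judge marks the document relevant
rel_either = {i + 1 for i in range(n) if judge1[i] == 1 or judge2[i] == 1}

for label, rel in [("both agree", rel_both), ("either judge", rel_either)]:
    p, r, f = prf1(rel, retrieved)
    print(f"{label}: P = {p:.3f}, R = {r:.3f}, F1 = {f:.3f}")
```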

8.6 A broader perspective: System quality and user utility

Formal evaluation measures are at some distance from our ultimate interest in measures of human utility: how satisfied is each user with the results the system gives for each information need that they pose? The standard way to measure human satisfaction is by various kinds of user studies. These might include quantitative measures, both objective, such as time to complete a task, as well as subjective, such as a score for satisfaction with the search engine, and qualitative measures, such as user comments on the search interface.

In this section we will touch on other system aspects that allow quantitative evaluation and the issue of user utility.

8.6.1 System issues

There are many practical benchmarks on which to rate an information retrieval system beyond its retrieval quality. These include:

• How fast does it index, that is, how many documents per hour does it index for a certain distribution over document lengths? (cf. Chapter 4)
• How fast does it search, that is, what is its latency as a function of index size? (A small measurement sketch follows below.)
• How expressive is its query language? How fast is it on complex queries?
• How large is its document collection, in terms of the number of documents or the collection having information distributed across a broad range of topics?

All these criteria apart from query language expressiveness are straightforwardly measurable: we can quantify the speed or size. Various kinds of feature checklists can make query language expressiveness semi-precise.
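As an illustration of the second bullet point, here is a minimal sketch of how search latency might be measured. The names `run_query` and `queries` are hypothetical stand-ins for the system’s search call and a representative query workload; neither comes from the book, and the percentile choices are merely conventional.

```python
# A minimal latency-measurement sketch: run a workload of queries against the
# system and report median and 95th-percentile latency in milliseconds.
import time
import statistics

def measure_latency(run_query, queries):
    """run_query: callable executing one query; queries: iterable of query strings."""
    timings = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)                                  # execute against the index
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p50 = statistics.median(timings)
    p95 = timings[int(0.95 * (len(timings) - 1))]
    return p50, p95

# Repeating this measurement at several index sizes gives latency as a
# function of index size, as the bullet point above suggests.
```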

8.6.2 User utility

What we would really like is a way of quantifying aggregate user happiness, based on the relevance, speed, and user interface of a system. One part of this is understanding the distribution of people we wish to make happy, and this depends entirely on the setting. For a web search engine, happy search users are those who find what they want. One indirect measure of such users is that they tend to return to the same engine.

Measuring the rate of return of users is thus an effective metric, which would of course be more effective if you could also measure how much these users used other search engines. But advertisers are also users of modern web search engines. They are happy if customers click through to their sites and then make purchases.

On an eCommerce web site, a user is likely to be wanting to purchase something. Thus, we can measure the time to purchase, or the fraction of searchers who become buyers. On a shopfront web site, perhaps both the user’s and the store owner’s needs are satisfied if a purchase is made. Nevertheless, in general, we need to decide whether it is the end user’s or the eCommerce site owner’s happiness that we are trying to optimize. Usually, it is the store owner who is paying us.
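The two purchase-oriented metrics mentioned above are easy to derive once search and purchase events are logged per session. The following is a minimal sketch; the session-log format is hypothetical and not something prescribed by the book.

```python
# A minimal sketch of the eCommerce metrics above: the fraction of searchers
# who become buyers, and the (median) time from first search to purchase.
# Hypothetical log format: (session_id, search_time, purchase_time_or_None),
# with times in seconds.
import statistics

def purchase_metrics(sessions):
    searchers = 0
    times_to_purchase = []
    for _sid, search_time, purchase_time in sessions:
        searchers += 1
        if purchase_time is not None:
            times_to_purchase.append(purchase_time - search_time)
    conversion = len(times_to_purchase) / searchers if searchers else 0.0
    median_time = statistics.median(times_to_purchase) if times_to_purchase else None
    return conversion, median_time

# Example with made-up sessions: two of three searchers buy.
log = [("s1", 0, 340), ("s2", 10, None), ("s3", 25, 95)]
print(purchase_metrics(log))   # -> (0.666..., 205.0)
```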

For an “enterprise” (company, government, or academic) intranet search engine, the relevant metric is more likely to be user productivity: how much time do users spend looking for information that they need. There are also many other practical criteria concerning such matters as information security, which we mentioned in Section 4.6 (page 80).

User happiness is elusive to measure, and this is part of why the standard methodology uses the proxy of relevance of search results. The standard direct way to get at user satisfaction is to run user studies, where people engage in tasks, and usually various metrics are measured, the participants are observed, and ethnographic interview techniques are used to get qualitative information on satisfaction. User studies are very useful in system design, but they are time consuming and expensive to do. They are also difficult to do well, and expertise is required to design the studies and to interpret the results.

We will not discuss the details of human usability testing here.

8.6.3 Refining a deployed system

If an IR system has been built and is being used by a large number of users, the system’s builders can evaluate possible changes by deploying variant versions of the system and recording measures that are indicative of user satisfaction with one variant vs. others as they are being used.

This method is frequently used by web search engines.

The most common version of this is A/B testing, a term borrowed from the advertising industry. For such a test, precisely one thing is changed between the current system and a proposed system, and a small proportion of traffic (say, 1–10% of users) is randomly directed to the variant system, while most users use the current system. For example, if we wish to investigate a change to the ranking algorithm, we redirect a random sample of users to a variant system and evaluate measures such as the frequency with which people click on the top result, or any result on the first page. (This particular analysis method is referred to as clickthrough log analysis or clickstream mining. It is further discussed as a method of implicit feedback in Section 9.1.7 (page 187).)

The basis of A/B testing is running a bunch of single-variable tests (either in sequence or in parallel): for each test only one parameter is varied from the control (the current live system).

It is therefore easy to see whether varying each parameter has a positive or negative effect. Such testing of a live system can easily and cheaply gauge the effect of a change on users, and, with a large enough user base, it is practical to measure even very small positive and negative effects. In principle, more analytic power can be achieved by varying multiple things at once in an uncorrelated (random) way, and doing standard multivariate statistical analysis, such as multiple linear regression. In practice, though, A/B testing is widely used, because A/B tests are easy to deploy, easy to understand, and easy to explain to management.
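The bookkeeping behind such a test is modest, as the following sketch suggests. The hash-based user assignment, the 10% traffic split, and the two-proportion z-test used to compare clickthrough rates are illustrative choices, not something prescribed by the chapter.

```python
# A minimal A/B-testing sketch: deterministically route a small fraction of
# users to the variant system, then compare "clicked the top result" rates
# between control and variant with a two-proportion z-test.
import hashlib
import math

def assign_bucket(user_id, variant_fraction=0.10):
    """Send roughly variant_fraction of users to the variant, the rest to control."""
    h = int(hashlib.sha1(user_id.encode()).hexdigest(), 16)
    return "variant" if (h % 10000) / 10000.0 < variant_fraction else "control"

def compare_clickthrough(clicks_a, users_a, clicks_b, users_b):
    """z-statistic for the difference between two clickthrough rates."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    p_pool = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z

# Example with made-up counts: with a large enough user base, even a small
# difference in clickthrough rate gives a large |z|, i.e. a detectable effect.
print(assign_bucket("user-42"))
print(compare_clickthrough(clicks_a=51_200, users_a=1_000_000,
                           clicks_b=5_430, users_b=100_000))
```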

8.7 Results snippets

Having chosen or ranked the documents matching a query, we wish to present a results list that will be informative to the user. In many cases the user will not want to examine all the returned documents and so we want to make the results list informative enough that the user can do a final ranking of the documents for themselves based on relevance to their information need.³ The standard way of doing this is to provide a snippet, a short summary of the document, which is designed so as to allow the user to decide its relevance.

³ There are exceptions, in domains where recall is emphasized. For instance, in many legal disclosure cases, a legal associate will review every document that matches a keyword search.

Typically, the snippet consists of the document title and a short summary, which is automatically extracted. The question is how to design the summary so as to maximize its usefulness to the user.

The two basic kinds of summaries are static, which are always the same regardless of the query, and dynamic (or query-dependent), which are customized according to the user’s information need as deduced from a query. Dynamic summaries attempt to explain why a particular document was retrieved for the query at hand.

A static summary is generally comprised of either or both a subset of the document and metadata associated with the document. The simplest form of summary takes the first two sentences or 50 words of a document, or extracts particular zones of a document, such as the title and author.

Instead of zones of a document, the summary can instead use metadata associated with the document. This may be an alternative way to provide an author or date, or may include elements which are designed to give a summary, such as the description metadata which can appear in the meta element of a web HTML page. This summary is typically extracted and cached at indexing time, in such a way that it can be retrieved and presented quickly when displaying search results, whereas having to access the actual document content might be a relatively expensive operation.

There has been extensive work within natural language processing (NLP) on better ways to do text summarization. Most such work still aims only to choose sentences from the original document to present and concentrates on how to select good sentences.
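As a concrete illustration of the simplest static summary described above, here is a minimal sketch: the document title plus the first two sentences of the body, capped at roughly 50 words, preferring the description meta tag when the page supplies one. The regular-expression sentence split and the use of Python’s standard html.parser are illustrative simplifications, not the book’s method.

```python
# A minimal static-summary sketch: title plus first two sentences (about 50
# words), with the <meta name="description"> content used when present. This
# would run at indexing time and the result would be cached with the document.
import re
from html.parser import HTMLParser

class _PageText(HTMLParser):
    """Collects the title, the description meta tag, and the body text."""
    def __init__(self):
        super().__init__()
        self.title, self.description, self._in_title = "", "", False
        self.body_parts = []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "description":
            self.description = attrs.get("content", "")
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.body_parts.append(data)

def static_summary(html, max_words=50):
    page = _PageText()
    page.feed(html)
    if page.description:                           # metadata-based summary
        body = page.description
    else:                                          # first two sentences of the body
        text = " ".join(" ".join(page.body_parts).split())
        sentences = re.split(r"(?<=[.!?])\s+", text)
        body = " ".join(sentences[:2])
    body = " ".join(body.split()[:max_words])      # cap at roughly 50 words
    return page.title.strip(), body
```

Caching the output of such a function alongside the document, as the text notes, avoids fetching the full document content at query time.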
