An Introduction to Information Retrieval. Manning, Raghavan (2009), page 102


Here the query A320 returns algorithmic search results about the Airbus aircraft, together with advertisements for various non-aircraft goods numbered A320, that advertisers seek to market to those querying on this query. The lack of advertisements for the aircraft reflects the fact that few marketers attempt to sell A320 aircraft on the web.

[...] retrieval and microeconomics, and is beyond the scope of this book. For advertisers, understanding how search engines do this ranking and how to allocate marketing campaign budgets to different keywords and to different sponsored search engines has become a profession known as search engine marketing (SEM).

The inherently economic motives underlying sponsored search give rise to attempts by some participants to subvert the system to their advantage. This can take many forms, one of which is known as click spam.

There is currently no universally accepted definition of click spam. It refers (as the name suggests) to clicks on sponsored search results that are not from bona fide search users. For instance, a devious advertiser may attempt to exhaust the advertising budget of a competitor by clicking repeatedly (through the use of a robotic click generator) on that competitor’s sponsored search advertisements.

Search engines face the challenge of discerning which of the clicks they observe are part of a pattern of click spam, to avoid charging their advertiser clients for such clicks.

Exercise 19.5

The Goto method ranked advertisements matching a query by bid: the highest-bidding advertiser got the top position, the second-highest the next, and so on.
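The bid-ordered ranking described in this exercise can be sketched in a few lines. This is only an illustration of "highest bid first", not Goto's actual implementation; the `Ad` type, the advertiser names, and the bid values are hypothetical.

```python
from typing import NamedTuple


class Ad(NamedTuple):
    advertiser: str  # hypothetical advertiser label
    bid: float       # amount offered per click (made-up units)


def rank_by_bid(ads: list[Ad]) -> list[Ad]:
    """Order ads so that the highest bid takes the top position."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


# Three hypothetical advertisers whose ads all match the same query.
ads = [Ad("a", 0.40), Ad("b", 1.25), Ad("c", 0.75)]
ranked = rank_by_bid(ads)
print([ad.advertiser for ad in ranked])  # ['b', 'c', 'a']
```

Note that the sort key is the bid alone: nothing in this ordering looks at the query or at the content of the advertisement.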

What can go wrong with this when the highest-bidding advertiser places an advertisement that is irrelevant to the query? Why might an advertiser with an irrelevant advertisement bid high in this manner?

Exercise 19.6

Suppose that, in addition to bids, we had for each advertiser their click-through rate: the ratio of the historical number of times users click on their advertisement to the number of times the advertisement was shown. Suggest a modification of the Goto scheme that exploits this data to avoid the problem in Exercise 19.5 above.

Online edition (c) 2009 Cambridge UP

19 Web search basics

19.4 The search user experience

It is crucial that we understand the users of web search as well.

This is again a significant change from traditional information retrieval, where users were typically professionals with at least some training in the art of phrasing queries over a well-authored collection whose style and structure they understood well. In contrast, web search users tend to not know (or care) about the heterogeneity of web content, the syntax of query languages and the art of phrasing queries; indeed, a mainstream tool (as web search has become) should not place such onerous demands on billions of people.

A range of studies has concluded that the average number of keywords in a web search is somewhere between 2 and 3. Syntax operators (Boolean connectives, wildcards, etc.) are seldom used, again a result of the composition of the audience – “normal” people, not information scientists.

It is clear that the more user traffic a web search engine can attract, the more revenue it stands to earn from sponsored search.

How do search engines differentiate themselves and grow their traffic? Here Google identified two principles that helped it grow at the expense of its competitors: (1) a focus on relevance, specifically precision rather than recall in the first few results; (2) a user experience that is lightweight, meaning that both the search query page and the search results page are uncluttered and almost entirely textual, with very few graphical elements. The effect of the first was simply to save users time in locating the information they sought.

The effect of the second is to provide a user experience that is extremely responsive, or at any rate not bottlenecked by the time to load the search query or results page.

19.4.1 User query needs

There appear to be three broad categories into which common web search queries can be grouped: (i) informational, (ii) navigational and (iii) transactional. We now explain these categories; it should be clear that some queries will fall in more than one of these categories, while others will fall outside them.

Informational queries seek general information on a broad topic, such as leukemia or Provence. There is typically not a single web page that contains all the information sought; indeed, users with informational queries typically try to assimilate information from multiple web pages.

Navigational queries seek the website or home page of a single entity that the user has in mind, say Lufthansa airlines.

In such cases, the user’s expectation is that the very first search result should be the home page of Lufthansa. The user is not interested in a plethora of documents containing the term Lufthansa; for such a user, the best measure of user satisfaction is precision at 1.

A transactional query is one that is a prelude to the user performing a transaction on the Web – such as purchasing a product, downloading a file or making a reservation. In such cases, the search engine should return results listing services that provide form interfaces for such transactions.

Discerning which of these categories a query falls into can be challenging.
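Precision at 1, named above as the natural measure for navigational queries, is the k = 1 case of precision at k: the fraction of the top k results that are relevant. A minimal sketch follows, assuming relevance judgments are available as a set of document identifiers; the function name and the example URLs are illustrative only.

```python
def precision_at_k(ranked_results: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_results[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    # Divide by k (not len(top_k)), so a short result list is penalized.
    return hits / k


# Hypothetical navigational query: only the home page counts as relevant.
relevant = {"lufthansa.com"}
good_ranking = ["lufthansa.com", "en.wikipedia.org/wiki/Lufthansa"]
bad_ranking = ["en.wikipedia.org/wiki/Lufthansa", "lufthansa.com"]

print(precision_at_k(good_ranking, relevant, 1))  # 1.0
print(precision_at_k(bad_ranking, relevant, 1))   # 0.0
```

Both rankings contain the home page, but only the first places it at rank 1, which is the only position precision at 1 looks at.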

The category not only governs the algorithmic search results, but also the suitability of the query for sponsored search results (since the query may reveal an intent to purchase). For navigational queries, some have argued that the search engine should return only a single result or even the target web page directly. Nevertheless, web search engines have historically engaged in a battle of bragging rights over which one indexes more web pages.

Does the user really care? Perhaps not, but the media does highlight estimates (often statistically indefensible) of the sizes of various search engines. Users are influenced by these reports and thus, search engines do have to pay attention to how their index sizes compare to competitors’. For informational (and to a lesser extent, transactional) queries, the user does care about the comprehensiveness of the search engine.

Figure 19.7 shows a composite picture of a web search engine including the crawler, as well as both the web page and advertisement indexes. The portion of the figure under the curved dashed line is internal to the search engine.

19.5 Index size and estimation

To a first approximation, comprehensiveness grows with index size, although it does matter which specific pages a search engine indexes – some pages are more informative than others. It is also difficult to reason about the fraction of the Web indexed by a search engine, because there is an infinite number of dynamic web pages; for instance, http://www.yahoo.com/any_string returns a valid HTML page rather than an error, politely informing the user that there is no such page at Yahoo! Such a "soft 404 error" is only one example of many ways in which web servers can generate an infinite number of valid web pages.
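One common way to notice the soft 404 behaviour described above is to request a path that almost certainly does not exist and check whether the server still answers with a success status. The sketch below is a heuristic under stated assumptions, not a method taken from the text: `fetch_status` is a hypothetical caller-supplied function that maps a URL to an HTTP status code, so the example runs without network access.

```python
import uuid
from typing import Callable


def returns_soft_404(site_root: str, fetch_status: Callable[[str], int]) -> bool:
    """Probe a random, almost certainly nonexistent path on the site.

    fetch_status is supplied by the caller (hypothetical helper); it could
    wrap any HTTP client and should return the response status code.
    """
    random_path = uuid.uuid4().hex  # 32 random hexadecimal characters
    probe_url = f"{site_root.rstrip('/')}/{random_path}"
    # A 200 response for a page that cannot exist suggests a soft 404.
    return fetch_status(probe_url) == 200


# Stand-ins for real servers: one always answers 200, one answers 404.
print(returns_soft_404("http://example.com/", lambda url: 200))  # True
print(returns_soft_404("http://example.com/", lambda url: 404))  # False
```

A real crawler would also need to handle redirects and compare page content, which this sketch deliberately leaves out.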

Indeed, some of these are malicious spider traps devised to cause a search engine’s [text truncated in the source]
