

Context-dependent opinion lexicon translation with the use of a parallel corpus

Ulanov A. V. (alexander.ulanov@hp.com), Sapozhnikov G. A. (gsapozhnikov@gmail.com)

Hewlett-Packard Labs Russia, St. Petersburg State University, St. Petersburg, Russia

Keywords: opinion mining, sentiment analysis, opinion words, machine translation, classification

1. Introduction

Sentiment analysis is one of the most popular information extraction tasks from both a business and a research perspective. It has numerous business applications, such as evaluating how a product or company is perceived in social media. From the standpoint of research, sentiment analysis relies on the methods developed for natural language processing and information extraction.

One of its key aspects is the opinion word lexicon. Opinion words are words that carry opinion. Positive words refer to some desired state, while negative words refer to some undesired one. For example, “good” and “beautiful” are positive opinion words, while “bad” and “evil” are negative. Opinion phrases and idioms exist as well. The orientation of many opinion words depends on context, as with the word “large”. Some opinion phrases are comparative rather than opinionated, for example “better than”.

Auxiliary words such as negations can change the sentiment orientation of a word.

Opinion words are used in a number of sentiment analysis tasks, including document and sentence sentiment classification, product feature extraction, and subjectivity detection [12]. Opinion words are used as features in sentiment classification. The sentiment orientation of a product feature is usually computed from the sentiment orientation of nearby opinion words. Product features can be extracted with the help of phrase or dependency patterns that include opinion words and placeholders for the product features themselves. Subjectivity detection also relies heavily on opinion word lists, because many opinionated phrases are subjective [14].

Thus, opinion lexicon generation is an important sentiment analysis task. Detecting the sentiment orientation of opinion words is an accompanying task.

The opinion lexicon generation task can be solved in several ways. The authors of [12] point out three approaches: manual, dictionary-based and corpus-based.

The manual approach is precise but time-consuming. The dictionary-based approach relies on dictionaries such as WordNet. One starts from a small collection of opinion words and looks for their synonyms and antonyms in a dictionary [10]. The drawback of this approach is that dictionary coverage is limited and it is hard to create a domain-specific opinion word list. Corpus-based approaches rely on mining a review corpus and use methods employed in information extraction. The approach proposed in [9] is based on a seed list of opinion words. These words are used together with linguistic constraints such as “AND” or “OR” to mine additional opinion words. Clustering is performed to label the mined words as positive or negative. In [21], part-of-speech patterns are used to populate the opinion word dictionary, and Internet search statistics are used to detect the semantic orientation of a word.

Work [7] extends the mentioned approaches and introduces a method for extracting context-based opinion words together with their orientation. Classification techniques are used in [2] to filter opinion words from text. The approaches described above were applied to English. There are some works that deal with Russian. For example, paper [4] proposes to use classification with features such as word frequency, weirdness, and TF-IDF.

Most of the research done in the field of sentiment analysis relies on the presence of annotated resources for a given language. However, there are methods which automatically generate resources for a target language, given that tools and resources are available in the source language.

Different approaches to multilingual subjectivity analysis are studied in [14] and [1] and are summarized in [3]. In one of them, a subjectivity lexicon in the source language is translated with the use of a dictionary and employed for subjectivity classification. This approach delivers mediocre precision because it uses only the first translation option and because of word lemmatization.

Another approach suggests translating the corpus. This can be done in three different ways: translating an annotated corpus in the source language and projecting its labels; automatic annotation of the corpus, translating it and projecting the labels; translating the corpus in the target language, automatic annotation of it and projecting the labels.

Language Weaver¹ machine translation was used on English-Romanian and English-Spanish data [3]. Classification experiments with the produced corpora showed similar results. They are close to the case when test data is translated and annotated automatically.

This shows that machine translation systems are good enough for translating opinionated datasets. It is also confirmed by the authors of [19], who used Google Translate², Microsoft Bing Translator³ and Moses⁴.

Multilingual opinion lexicon generation is considered in the recent paper [19], which presents a semi-automatic approach with the use of triangulation. The authors use high-quality lexicons in two different languages and then translate them automatically into a third language with Google Translate.

The words that are found in both translations are supposed to have good precision. This was confirmed for several languages, including Russian, by a manual check of the resulting lists. The same authors collect and examine entity-centered sentiment-annotated parallel corpora [20].

In this paper we develop the idea of multilingual sentiment analysis. We propose a method for projecting an opinion lexicon from a source language to a target language with the use of a parallel corpus. We apply it to the English-Russian language pair, using a collection of parallel and pseudo-parallel review corpora.

The method is evaluated against a baseline, which is a translation of the opinion word lexicon with Google Translate. Sentiment classification experiments are conducted to evaluate the quality of the lexicons. The advantages of our method are the following. It captures the context of opinion words, thus producing correct translations. It doesn’t require a machine translation tool, as in [19], or a bilingual dictionary, as in [14]. However, a machine translation tool may be employed in the absence of a parallel corpus or for better recall. The opinion lexicon is needed in only one language, unlike in work [19], where two lexicons are required.

¹ http://www.sdl.com/products/automated-translation/
² http://translate.google.com/
³ http://www.bing.com/translator
⁴ http://www.statmt.org/moses/

2. Approach

The idea of our approach is to use a parallel corpus to construct an opinion lexicon in a target language, given that there is an opinion lexicon in a source language. A parallel corpus is a text together with its translation into the target language. We suppose that it contains opinionated sentences. An opinion lexicon is a set of words carrying opinion. It is not necessarily divided into positive/negative or other groups.

The opinion lexicon for the target language is extracted from the parallel corpus by translating the words from the opinion lexicon in the source language. The algorithm of the method is as follows:

1. Collect a corpus of parallel reviews and align sentences.
2. Compute lexical translation probabilities of words.
3. Collect opinion word translations and normalize them.

Let us consider these steps in greater detail. The task of parallel corpus acquisition and preparation is a well-studied area of research [8]. One collects or crawls data that is available in different languages.

Parallel documents are matched by some identifier, e.g. name, time, or a specific number. Documents are split into sentences by a sentence splitter, with paragraph boundaries preserved. The resulting text is processed by a sentence aligner. A parallel corpus with opinionated texts can be obtained from sites that post reviews in different languages (manually translated). Usually, such reviews are editorial.
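The document pairing and sentence splitting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_sentences` and `pair_documents`, the regex-based splitter, and the toy reviews are all assumptions made for illustration. The sentence alignment step itself is not shown and would be done by a dedicated sentence aligner.

```python
import re


def split_sentences(text):
    """Split text into paragraphs, then naively into sentences.

    Paragraph boundaries (blank lines) are preserved as a nested list.
    The regex splitter is a simplification (assumption), not a real
    sentence splitter.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [
        [s for s in re.split(r"(?<=[.!?])\s+", p.strip()) if s]
        for p in paragraphs
    ]


def pair_documents(source_docs, target_docs):
    """Pair documents that share an identifier in both collections."""
    shared = sorted(source_docs.keys() & target_docs.keys())
    return [
        (doc_id,
         split_sentences(source_docs[doc_id]),
         split_sentences(target_docs[doc_id]))
        for doc_id in shared
    ]


# Toy data (hypothetical): identifier -> review text
en = {
    "r1": "Great phone. The screen is large.\n\nBattery is bad!",
    "r2": "Only in English.",
}
ru = {"r1": "Отличный телефон. Экран большой.\n\nБатарея плохая!"}

pairs = pair_documents(en, ru)
```

Only `r1` is paired here, because `r2` has no counterpart in the second collection. Keeping paragraphs as a nested list gives a later aligner anchor points, so sentences are only matched within corresponding paragraphs.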

They contain opinionated text; however, opinion words there tend to be more polite than in forums or user reviews. The size of the corpus is less important than the coverage of words from the source opinion lexicon. In the absence of a natural parallel corpus, a pseudo-parallel corpus can be used [20], which is a text along with its translation produced by an automatic translation system.

Lexical translation probabilities of words are computed on the aligned corpus: $p(s \to t)$ and $p(t \to s)$, where t is a word in the target language and s is a word in the source language. Lexical translation is a translation of a word in isolation. To compute it, one has to count how many times a certain word was translated into different options within the aligned sentences:

$$p(s \to t) = \frac{\mathrm{count}(s \to t)}{\sum_{t'} \mathrm{count}(s \to t')}$$
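The counting procedure and the collection of opinion word translations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes word-level alignment links (source word, target word) have already been produced by some word aligner, and the names `lexical_translation_probs`, `project_lexicon`, and the threshold `min_prob` are hypothetical.

```python
from collections import Counter, defaultdict


def lexical_translation_probs(links):
    """Estimate p(s -> t) by relative frequency over alignment links.

    `links` is an iterable of (source_word, target_word) pairs; how they
    are obtained (e.g. from a word aligner) is outside this sketch.
    """
    counts = defaultdict(Counter)
    for s, t in links:
        counts[s][t] += 1
    probs = {}
    for s, t_counts in counts.items():
        total = sum(t_counts.values())
        probs[s] = {t: c / total for t, c in t_counts.items()}
    return probs


def project_lexicon(source_lexicon, probs, min_prob=0.3):
    """Collect target-language translations of source opinion words.

    `min_prob` is an assumed cut-off for discarding rare translations.
    """
    target = set()
    for s in source_lexicon:
        for t, p in probs.get(s, {}).items():
            if p >= min_prob:
                target.add(t)
    return target


# Toy alignment links (hypothetical data)
links = [
    ("good", "хороший"), ("good", "хороший"), ("good", "добрый"),
    ("bad", "плохой"), ("camera", "камера"),
]
probs = lexical_translation_probs(links)
lexicon_ru = project_lexicon({"good", "bad"}, probs, min_prob=0.5)
```

With these toy links, "good" is translated as "хороший" in two of three cases, so only that option passes the 0.5 threshold. The word "camera" is ignored because it is not in the source opinion lexicon.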
