kotelnikovevetal (Abstracts)

PDF file kotelnikovevetal (Abstracts), filed in the "miscellaneous" category of the "English language" subject, tenth semester. StudIzba, 2020-08-25.

File description

The file "kotelnikovevetal" sits inside the archive in the folders Аннотации, 1. It is a PDF file from the "Abstracts" archive, filed in the "miscellaneous" category of the "English language" subject, tenth semester, in the Lomonosov MSU file archive. Although the archive is directly tied to Lomonosov MSU, it can also be found in other sections.


Kotelnikov E. V. et al.

Manually Created Sentiment Lexicons: Research and Development

Kotelnikov E. V. (kotelnikov.ev@gmail.com), Bushmeleva N. A. (bushmeleva_na@list.ru), Razova E. V. (razova.ev@gmail.com), Peskisheva T. A. (peskisheva.t@mail.ru), Pletneva M. V. (pletneva.mv.kirov@gmail.com)

Vyatka State University, Kirov, Russia

Keywords: sentiment analysis, sentiment lexicons, expert approach, corpus-based approach, support vector machine

1. Introduction

In recent years sentiment analysis has been one of the hottest research areas in natural language processing (Liu, 2012). It challenges researchers both in theoretical aspects, such as the objective laws of sentiment expression in natural language, and in practical ones, e.g., the analysis of reviews of consumer products and services, the monitoring of social networks, and political studies (Feldman, 2013).

There are two main approaches to sentiment analysis (Taboada et al., 2011): lexicon-based and machine learning. The former determines the sentiment of a text from the polarity of its individual words. The latter treats sentiment analysis as a text categorization problem.

Both approaches require high-quality sentiment lexicons: even in text categorization methods the word weights are often proportional to word polarity and strength.

There are many studies on sentiment lexicon creation. They generally use three main approaches (Liu, 2012): the manual approach, the dictionary-based approach, and the corpus-based approach.

In the manual approach, sentiment lexicons are constructed by human annotators. In the dictionary-based approach, they are created with the help of universal dictionaries and thesauri, e.g., WordNet (Fellbaum, 1998). In the corpus-based approach, they are built from the analysis of text corpora.

Various hybrid combinations of these approaches are also used.

Although sentiment lexicon creation is an important problem, little attention has been paid to evaluating the quality of the generated lexicons and analyzing them in depth, especially for Russian.

In this paper we, first, propose a procedure for creating a sentiment lexicon for a given domain; second, analyze the sentiment lexicons constructed by several annotators for various domains; and third, study the performance of these lexicons in comparison with existing ones.

The rest of the paper is organized as follows. Related work is considered in Section 2 and the text corpora used in Section 3. The corpus-based approach to sentiment word extraction is first applied to generate the sentiment lexicons, which are then manually annotated by several annotators (Section 4).

The generated lexicons are jointly analyzed (Section 5). The performance of sentiment analysis based on the sentiment lexicons and a Support Vector Machine (SVM) is evaluated (Section 6).

2. Related work

2.1. The creation of sentiment lexicons

Two stages of lexicon creation can be distinguished: 1) the generation of a list of sentiment-bearing words, which contains the candidates for the sentiment lexicon, and 2) the assignment of sentiment labels to these words, e.g., positive/negative/neutral. Both stages are performed either manually or automatically.

Most studies on sentiment lexicon creation are carried out on English material. For example, Taboada et al. (2011) performed both stages manually. Mohammad and Turney (2013) used crowdsourcing to create a word-emotion and word-polarity association lexicon.

There are also studies for other languages. For example, Amiri et al. (2015) formed a word list manually; the list was then annotated by several human annotators through a web interface.

There are few such studies for Russian.

Chetviorkin and Loukachevitch (2012) extracted and weighted sentiment words automatically using machine learning; manual annotation was performed only for evaluation. Ulanov and Sapozhnikov (2013) built lexicons by automatically translating English dictionaries. Blinov and Kotelnikov (2014) created a sentiment lexicon based on distributed representations of words and used it for aspect-based sentiment analysis. Ivanov et al. (2015) applied the corpus-based approach in the user review domain as well as for aspect-based sentiment analysis.

At present the following sentiment lexicons are publicly available:
• Russian Sentiment Lexicon for Product Meta-Domain (ProductSentiRus): 5,000 words (Chetviorkin, Loukachevitch, 2012) [1];
• NRC Emotion Lexicon translated into Russian via Google Translate (NRC): 4,590 words (Mohammad, Turney, 2013) [2];
• Russian sentiment lexicon: 2,914 words (Chen, Skiena, 2014) [3];
• Sentiment lexicon for the restaurant domain: 7,312 words (including bigrams and trigrams) (Blinov, Kotelnikov, 2014) [4].

[1] http://www.cir.ru/SentiLexicon/ProductSentiRus.txt
[2] http://www.saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
[3] https://sites.google.com/site/datascienceslab/projects/multilingualsentiment
[4] http://goo.gl/NhEvWu

These lexicons (except the last one, which consists largely of collocations) are used in our study for comparison with the manual lexicons (see Section 6).

2.2. Analysis of lexicons

One of the main purposes of our study is a joint analysis of the sentiment labeling of a word list. The word list was produced by several human annotators for various domains. To our knowledge, no such in-depth analysis of Russian sentiment lexicons has been performed yet.

Andreevskaia and Bergler (2006) had two teams label two sentiment lexicons simultaneously, which resulted in a high degree of disagreement.

Taboada et al. (2011) compared manual lexicons with dictionaries built using Amazon Mechanical Turk. A comparison with SentiWordNet was also drawn, but only at the level of a performance test.

Several sentiment lexicons have also been compared by the quality of sentiment analysis in English (Musto et al., 2014; Ozdemir, Bergler, 2015) and in Portuguese (Freitas, Vieira, 2013).

In the context of our study we should also mention the work of Kiselev et al. (2015), which thoroughly analyzes 12 existing lexical-semantic resources (printed explanatory dictionaries, dictionaries of synonyms, electronic thesauri).

3. Text corpora

In our work we study reviews of restaurants, cars, movies, books and digital cameras. The restaurant reviews were collected from the site Restoclub [5], the car reviews from the site Cars@mail.ru [6]. For the remaining domains the text corpora of the ROMIP 2011 and 2012 seminars are used (Chetviorkin et al., 2012; Chetviorkin, Loukachevitch, 2013).

The initial score scales (ten-point for movies, books and restaurants; five-point for cameras and cars) were converted to a binary scale as follows: for the ten-point scale, {1...4} → neg, {6...10} → pos; for the five-point scale, {1...2} → neg, {4...5} → pos.

For each domain ten thousand randomly chosen reviews are used as the training set. For the ROMIP domains these reviews are chosen from the ROMIP2011 training corpora, for the remaining domains from the entire corpora.
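The score conversion described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name to_binary_label is hypothetical, and returning None for the middle scores (5 on the ten-point scale, 3 on the five-point scale), which the schemes leave unassigned, is an assumption about how such reviews are handled.

```python
from typing import Optional


def to_binary_label(score: int, scale_max: int) -> Optional[str]:
    """Map a review score to 'neg' or 'pos' using the schemes from the text.

    Ten-point scale:  1..4 -> neg, 6..10 -> pos.
    Five-point scale: 1..2 -> neg, 4..5  -> pos.
    The middle score is not covered by either scheme; returning None
    for it is an assumption made for this sketch.
    """
    if scale_max == 10:
        neg_max, pos_min = 4, 6
    elif scale_max == 5:
        neg_max, pos_min = 2, 4
    else:
        raise ValueError("only five-point and ten-point scales are described")
    if not 1 <= score <= scale_max:
        raise ValueError(f"score {score} is outside 1..{scale_max}")
    if score <= neg_max:
        return "neg"
    if score >= pos_min:
        return "pos"
    return None


# Hypothetical ten-point scores; the middle score 5 is dropped.
scores = [1, 4, 5, 6, 10]
labels = [to_binary_label(s, 10) for s in scores]
labeled = [(s, lab) for s, lab in zip(scores, labels) if lab is not None]
```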

Test sets for the ROMIP domains are the union of the ROMIP2011 and 2012 test corpora, taken for each domain separately. For restaurants and cars, all reviews other than the training reviews are used as test sets.

The characteristics of the training and test corpora are given in Table 1.

[5] http://www.restoclub.ru
[6] https://cars.mail.ru/reviews

Table 1. Text corpora (Nav: average number of words per review)

Domain        Train Pos  Train Neg  Train Total  Train Nav  Test Pos  Test Neg  Test Total  Total reviews
Restaurants   7,982      2,018      10,000       87         15,353    1,544     16,897      26,897
Cars          7,900      2,100      10,000       104        38,148    1,286     39,434      49,434
Movies        7,330      2,670      10,000       80         594       126       720         10,720
Books         7,888      2,112      10,000       31         356       39        395         10,395
Cameras       8,921      1,079      10,000       94         612       54        666         10,666

Test Nav (five per-domain values, run together in the extracted text): 16271212235226

It should be noted that the corpora are highly imbalanced: the share of positive reviews ranges from 73.3% for the movie training corpus to 96.7% for the car test corpus.

4. Sentiment lexicon creation

The proposed procedure of sentiment lexicon creation consists of three main stages: 1) word weighting and selection; 2) collaborative manual word annotation; 3) consolidation of sentiment lexicons.

At the first stage, morphological analysis of the training corpus is performed (we used mystem [7]), then the full dictionary of the training corpus is formed and stop words are removed.

All the words are weighted using a supervised term weighting scheme, e.g., RF (Relevance Frequency), which has demonstrated good performance in the text categorization task (Lan et al., 2009). In this scheme the weight of a given word with respect to the sentiment category S is calculated by the formula

$$RF = \log_2\left(2 + \frac{a}{\max(1, b)}\right),$$

where a is the number of documents that belong to category S and contain this word, and b is the number of documents that do not belong to category S and also contain this word.

For each word two weights are calculated: the first weight, RF_pos, with respect to S = positive, and the second, RF_neg, with respect to S = negative.
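As a rough illustration of RF weighting, the Python sketch below computes both weights for every word from labeled documents, using RF = log2(2 + a / max(1, b)) as given by Lan et al. (2009). It assumes the documents are already lemmatized and stop-word filtered and are passed as (set of words, label) pairs; the names relevance_frequency and rf_weights and the sample words are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def relevance_frequency(a: int, b: int) -> float:
    """RF = log2(2 + a / max(1, b)).

    a: documents of category S that contain the word,
    b: documents outside category S that contain the word.
    """
    return math.log2(2 + a / max(1, b))


def rf_weights(
    docs: Iterable[Tuple[Set[str], str]]
) -> Dict[str, Tuple[float, float]]:
    """Return {word: (RF_pos, RF_neg)} from (words, label) pairs."""
    pos_df: Counter = Counter()  # document frequency in positive reviews
    neg_df: Counter = Counter()  # document frequency in negative reviews
    for words, label in docs:
        if label == "pos":
            pos_df.update(set(words))  # count each word once per document
        elif label == "neg":
            neg_df.update(set(words))
        else:
            raise ValueError(f"unexpected label: {label!r}")
    vocabulary = set(pos_df) | set(neg_df)
    return {
        w: (
            relevance_frequency(pos_df[w], neg_df[w]),  # S = positive
            relevance_frequency(neg_df[w], pos_df[w]),  # S = negative
        )
        for w in vocabulary
    }


# Hypothetical toy corpus: two positive reviews and one negative review.
sample_docs = [
    ({"tasty", "cozy"}, "pos"),
    ({"tasty"}, "pos"),
    ({"rude"}, "neg"),
]
weights = rf_weights(sample_docs)
```

A word that occurs only in positive documents gets RF_pos above 1 and RF_neg equal to 1, so the gap between the two weights indicates the polarity the word leans toward.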
