An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media

Koltsova O. Yu., Alexeeva S. V., Kolcov S. N.

We address some of these problems further below.

3. Data collection and markup

3.1. Generating a relevant text collection

We extract our collection from our database, which includes all posts by the top 2,000 LiveJournal bloggers for a period of one year (March 2013 to March 2014). Earlier we found that only about a third of those texts may be classified as political or social (Koltsova et al., 2014); hence, we face the problem of retrieving relevant texts. While Hsueh et al. (2009) employ manual annotation, this is unfeasible for our collection of around 1.5 million texts, so we adopt a different approach (Koltsova, Shcherbak, 2015).

We perform topic modeling, namely Latent Dirichlet Allocation with Gibbs sampling (Steyvers, Griffiths, 2004). It yields results akin to fuzzy clustering: each text is ascribed to each topic out of a predefined number, with a varying probability, based on word co-occurrence. All words are likewise ascribed to all topics with varying probabilities. When sorted by this probability, they form lists that allow fast topic interpretation and labeling by humans.

Our TopicMiner software was used for all topic modeling procedures. Our prior experience shows that the optimal number of topics depends most of all on the "size" of the topics to be detected (smaller topics demand a larger number). A series of experiments (Bodrunova et al., 2013; Nikolenko et al., 2015) has led us to choose 300 topics for the task of retrieving social and political topics. The 100 most relevant texts and 200 top words of each topic were read by three annotators, who identified 104 social or political topics.

A topic was considered relevant if at least two of the three annotators had chosen it. Intercoder agreement, as expressed by Krippendorff's alpha, is 0.578. Texts with a probability higher than 0.1 in these 104 topics (mean probability = 0.3) were considered relevant and were included in the final working collection, which comprised 70,710 posts.

3.2. Selection of potentially sentiment-bearing words

Based on the aforementioned literature, we employed a complex approach to the generation of a proto-lexicon for manual annotation (all details on this approach can be found in Alexeeva et al., 2015), comprising the following elements:
• the list of high-frequency adjectives created by the Digital Society Lab, based on a large collection of Russian-language texts from social media, and the list of adverbs automatically derived from the former list;
• the Chetviorkin-Loukashevitch lexicon (Chetviorkin, Loukachevitch, 2012) (most of it later discarded);
• the Explanatory Dictionary of the Russian Language (Morkovkin, 2003);
• a translation of the free English-language lexicon accompanying the SentiStrength software (Thelwall et al., 2010);
• the 200 most probable words for each of the relevant topics identified by the annotators, aimed at detecting domain-specific words.
We formed a lexicon of potentially sentiment-bearing words by accepting only those that occurred in at least two of the listed sources.
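The merging rule above can be sketched as follows; the source names and words are illustrative placeholders, not the actual resources:

```python
# A word enters the proto-lexicon only if it appears in at least two of the
# candidate sources. Using sets ensures each source counts a word once.
from collections import Counter

sources = {
    "frequency_adjectives": {"good", "bad", "big"},
    "chetviorkin_lexicon": {"good", "awful"},
    "dictionary": {"bad", "awful", "table"},
    "sentistrength_translation": {"good", "bad"},
    "topic_top_words": {"table", "election"},
}

counts = Counter(w for words in sources.values() for w in words)
proto_lexicon = sorted(w for w, c in counts.items() if c >= 2)
print(proto_lexicon)
```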

The resulting lexicon comprised 9,539 units. However, only 7,546 of them occurred in the texts identified as social or political, and only these were later manually annotated.

3.3. Data markup and evaluation of crowdsourcing results

To avoid some pitfalls of crowdsourcing, we adopted, so to speak, a sociological vision of it: our volunteers were not supposed to imitate experts; rather, their contribution was seen as similar to that of respondents in an opinion poll, which cannot produce "wrong" answers. To this end, we tried to make our sample of assessors as diverse as possible in terms of region, gender, and education.

In total, 87 people from 16 cities took part in the assessment. They worked with our website and assessed words' sentiment as expressed in the texts in which they occurred, as well as the prevailing sentiment of the texts themselves, on a five-point scale from −2 (strongly negative) to +2 (strongly positive).

The texts were meant to help detect domain-specific word polarity. Each word was shown with three different texts, one at a time. Each post was cut down to one paragraph, since long texts are more likely to include different sentiments. Once each word had received three annotations, we went on with further annotation to get more than one assessment for each text. By the time of data analysis, we had received 32,437 word annotations and the same number of text annotations (of them, 14 word annotations and 18 text annotations were discarded due to technical errors). In total, the assessors annotated 19,831 texts. The annotated word and text collections are available online. Intercoder agreement in the word assessment task (five-class), as expressed by Krippendorff's α, turned out to be 0.541.

For comparison, Hong et al. (2013) report α as low as 0.11–0.19 for a three-class word sentiment annotation task. Taboada et al. (2011: 289) obtain a mean pairwise agreement (MPA) of 67.7% in a three-class task of word assessment in customer reviews (NB: α and MPA are not directly comparable). In the text annotation task we obtained α = 0.278 for all texts and 0.312 for the texts that got non-zero scores (five-class). Hsueh et al. (2009) report MPA among Amazon Mechanical Turk annotators to be 35.3% for a four-class task of political blog post annotation. Ku et al. (2006) claim to reach a much higher agreement of 64.7% for four-class blog annotation and 41.2% for news annotation by specially selected assessors.

Nevertheless, none of these levels is impressively high. In relation to this, Hsueh and colleagues (2009) point to the problem of political blogs' ambiguity. We tend to agree that this ambiguity, and the general lack of societal consensus on the polarity of political issues, rather than (or at least not only) a lack of quality, cause the low agreement. Therefore, disagreeing individuals cannot be filtered out, because they may reflect an important part of the public opinion spectrum. A milder measure, the divergence of an annotator's mean score from the global mean, allows for a lot of disagreement on individual items. It shows that in our case only 0.5% of all annotations were made by individuals strongly deviating from the global mean.
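For reference, the agreement statistic used throughout this section can be sketched in a few lines. This is a minimal version of Krippendorff's α for nominal data; the study's five-point scale would normally call for an interval or ordinal distance function, so the sketch is illustrative only.

```python
# Krippendorff's alpha for nominal data: alpha = 1 - D_o / D_e, computed
# via the coincidence matrix. Input: one list of ratings per annotated item.
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # items with a single rating carry no agreement information
        for i, j in permutations(range(m), 2):
            coincidences[(ratings[i], ratings[j])] += 1 / (m - 1)
    totals = Counter()
    for (c, _), v in coincidences.items():
        totals[c] += v
    n = sum(totals.values())
    observed = sum(v for (c, k), v in coincidences.items() if c != k) / n
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k) / (n * (n - 1))
    return 1.0 if expected == 0 else 1 - observed / expected

alpha_perfect = krippendorff_alpha_nominal([[1, 1], [2, 2], [0, 0]])
alpha_mixed = krippendorff_alpha_nominal([[1, 1], [2, 2], [1, 2]])
```

Perfect agreement yields α = 1; one disagreeing item out of three drops it to 4/9, which illustrates how sensitive the statistic is on small samples.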

4. Results

4.1. Word and text assessment results

The majority of words (4,753) were annotated as neutral and therefore excluded from the lexicon. Table 1 shows that although negatively assessed words prevail, positive words have also been detected. At the same time, highly emotional words are quite few.

Table 1. Distribution of mean scores over words

Mean score (rounded)                  −2      −1       0       1      2
Number of words with such score      225   1,666   4,753     853     49
Share of words with such score, %      3      22      63      11    0.6

We have also calculated the variance of scores for each word.

Although, as mentioned above, disagreement does not necessarily indicate low quality, the usefulness of highly disputable words for sentiment classification of texts is doubtful. Since the distance between two neighboring score values is one, we regarded all words with variance ≥ 1 as candidates for being discarded. However, we found only 153 such words, and most of them looked sentiment-bearing.
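The variance filter can be sketched as follows; the word entries are illustrative, and population variance over the three scores is assumed:

```python
# Words whose score variance over their annotations is >= 1 (scores on the
# five-point scale from -2 to +2) are flagged as disputable.
from statistics import pvariance

annotations = {
    "первосортный": [2, 2, 1],   # consistent: variance < 1
    "терпимость": [-2, 0, 2],    # disputable: variance >= 1
    "стол": [0, 0, 0],           # neutral and consistent
}

disputable = [word for word, scores in annotations.items()
              if pvariance(scores) >= 1]
print(disputable)
```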

In some cases the polarity of these words seemed quite obvious: e.g., gorgeous (сногсшибательный), filth (мерзость), long-suffering (многострадальный), first-class (первосортный), while others looked ambiguous: e.g., endless (бесконечный), quality (качество), and tolerance (терпимость). At this stage of the research they have been included in the lexicon with their mean scores (since their number is in any case negligibly small). The distribution of scores over texts is similar to that over words (see Table 2).

Most texts were marked as neutral. The positive class is obviously too small (it relates to the negative class as 1:4.6). The same unbalanced class structure in political blogs is also pointed out by Hsueh et al. (2009).

Table 2. Distribution of mean scores over texts

Mean score (rounded)                  −2      −1       0       1      2
Number of texts with such score       75   6,546  11,760   1,427     23
Share of texts with such score, %    0.4      33      59       7    0.1

4.2. Lexicon quality evaluation

After filtering out neutral words and leaving out the words that did not occur in the relevant texts, our lexicon comprised 2,753 items. We installed this lexicon into the SentiStrength freeware for quality evaluation (Thelwall et al., 2010).

All texts were lemmatized with MyStem (Segalovich, 2003) prior to SA. In the default mode, SentiStrength ascribes two sentiment scores to each text: one corresponds to the maximal negative word score in the text, and the other to the maximal positive score.

If a booster word occurs before a given word, the absolute value of its score is increased. If negation is used, the sign of the score is reversed. The integrated text score was then calculated as the mean of the two SentiStrength scores.

We defined lexicon quality as the share of correctly detected cases. We first calculated the absolute difference between the rounded mean assessors' score of a given text and the rounded integrated score based on the SentiStrength results. Then we obtained the share of exact matches, as well as the share of ±1-class matches, as suggested by the SentiStrength developers (Thelwall et al., 2010). A ±1-class match means that if a text is ascribed to one of the two classes neighboring the true one, the class is considered correctly predicted.
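The scoring scheme described above can be sketched as follows, assuming a simple token-level pass; the lexicon, booster, and negation lists are illustrative placeholders rather than the actual SentiStrength resources:

```python
# Each text gets the maximal negative and maximal positive lexicon scores;
# a booster raises the following word's magnitude, negation flips its sign,
# and the integrated score is the mean of the two extremes.

LEXICON = {"отлично": 2, "плохо": -1, "мерзость": -2}
BOOSTERS = {"очень"}
NEGATIONS = {"не"}

def text_score(tokens):
    pos, neg = 0, 0
    boost = False
    negate = False
    for tok in tokens:
        if tok in BOOSTERS:
            boost = True
            continue
        if tok in NEGATIONS:
            negate = True
            continue
        score = LEXICON.get(tok, 0)
        if score:
            if boost:
                score += 1 if score > 0 else -1
            if negate:
                score = -score
            pos = max(pos, score)
            neg = min(neg, score)
        boost = negate = False  # modifiers apply only to the next word
    return (pos + neg) / 2  # integrated text score

def plus_minus_one_match(predicted, gold):
    # A +/-1 class match counts neighboring classes as correct.
    return abs(round(predicted) - round(gold)) <= 1
```

For example, a boosted negative word ("очень плохо") lowers the negative extreme to −2, giving an integrated score of −1.0, while a negated one ("не плохо") flips it to +1, giving +0.5.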
