annotation10 (Annotations)

The PDF file annotation10 (Annotations) is located in the "miscellaneous" category of the subject "English language" from the tenth semester. annotation10 (Annotations) - StudIzba, 2020-08-25

File description

Inside the archive, the file "annotation10" is located in the following folders: Annotations, 5. It is a PDF file from the "Annotations" archive, which is in the "miscellaneous" category. All of this belongs to the subject "English language" from the tenth semester and can be found in the file archive of Lomonosov Moscow State University (MSU). Although this archive is directly associated with Lomonosov MSU, it can also be found in other sections.


Text from the PDF

Koltsova O. Yu., Alexeeva S. V., Kolcov S. N.: An Opinion Word Lexicon and a Training Data set for Russian Sentiment Analysis of Social Media.

Automatic assessment of sentiment in large text corpora is an important goal in social sciences. This paper describes a methodology and the results of system development for Russian language sentiment analysis. It includes: a publicly available sentiment lexicon, a publicly available test collection with sentiment markup and a crowdsourcing website for such markup. The lexicon is aimed at detecting sentiment in user-generated content (blogs, social media) related to social and political issues. Its prototype was formed based on other dictionaries and on the topic modeling performed on a large collection of blog posts. Topic modeling revealed relevant (social and political) topics and, as a result, relevant words for the lexicon prototype and relevant texts for the training collection. Each word was assessed by at least three volunteers in the context of three different texts where the word occurred, while the texts received their sentiment scores from the same volunteers as well. Both texts and words were scored from −2 (negative) to +2 (positive). Of 7,546 candidate words, 2,753 got non-neutral sentiment scores. The quality of the lexicon was assessed with SentiStrength software by comparing human text scores with the scores obtained automatically based on the created lexicon. 93% of texts were classified correctly at the error level of ±1 class, which closely matches the result of SentiStrength initial application to the English language tweets. Negative classes were much larger and better predicted.
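The "correct at the error level of ±1 class" criterion can be illustrated with a short Python sketch. It is only an illustration of the metric under stated assumptions: the function name `accuracy_within_tolerance` is hypothetical, the scores are made-up values on the −2 to +2 scale rather than data from the paper, and the code does not call SentiStrength or reproduce the authors' evaluation pipeline.

```python
from typing import Sequence


def accuracy_within_tolerance(
    human: Sequence[int],
    predicted: Sequence[int],
    tolerance: int = 1,
) -> float:
    """Share of texts whose predicted class is within `tolerance`
    classes of the human-assigned class (hypothetical helper)."""
    if len(human) != len(predicted):
        raise ValueError("score lists must have the same length")
    if not human:
        raise ValueError("score lists must not be empty")
    hits = sum(1 for h, p in zip(human, predicted) if abs(h - p) <= tolerance)
    return hits / len(human)


# Illustrative scores on the -2..+2 scale (assumed, not from the paper).
human_scores = [-2, -1, 0, 1, 2, -2]
predicted_scores = [-1, -1, 0, 2, 0, -2]

# Lenient accuracy: a prediction counts if it is off by at most one class.
print(accuracy_within_tolerance(human_scores, predicted_scores))
# Strict accuracy: only exact matches count.
print(accuracy_within_tolerance(human_scores, predicted_scores, tolerance=0))
```

With `tolerance=0` the same function gives strict accuracy, which shows how much the ±1 allowance relaxes the criterion for a five-class scale.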
