In our case, the share of ±1 class matches is 93.0%, which is comparable to Thelwall’s result of 96.9% (Thelwall et al., 2010). Negative classes are predicted better than positive ones (95% and 59% for the ‘−1’ and ‘−2’ classes vs. 82% and 19% for the ‘+1’ and ‘+2’ classes). As can also be seen, moderate classes are predicted much better than the extreme classes, which are very small, while for the dominant ‘0’ class the share of ±1 class matches reaches 99.6%. SA systems for Russian, however, use different evaluation techniques.
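The ±1 class-match share reported above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: both the assessors' labels and the lexicon's predictions are taken to be integers on the five-point scale from −2 to +2, and a prediction is assumed to count as a ±1 match when it differs from the gold label by at most one step. The function names and the toy labels are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def within_one_share(gold, predicted):
    """Share of items whose predicted class is at most one step from the gold class.

    Assumes integer labels on a -2..+2 scale (an assumption, not the paper's code).
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    hits = sum(1 for g, p in zip(gold, predicted) if abs(g - p) <= 1)
    return hits / len(gold)


def within_one_by_class(gold, predicted):
    """Share of ±1 matches computed separately for each gold class."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, p in zip(gold, predicted):
        totals[g] += 1
        if abs(g - p) <= 1:
            hits[g] += 1
    # Classes with no hits fall back to 0 via defaultdict.
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Hypothetical toy labels, for illustration only.
gold = [-2, -1, 0, 0, 1, 2]
pred = [0, -1, 1, 0, -1, 1]
overall = within_one_share(gold, pred)        # 4 of 6 items are within one step
per_class = within_one_by_class(gold, pred)   # {-2: 0.0, -1: 1.0, 0: 1.0, 1: 0.0, 2: 1.0}
```

Reporting the share per gold class, as the text does for ‘−2’ through ‘+2’, keeps the small extreme classes visible; in the overall figure they are largely outweighed by the dominant ‘0’ class.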
Among these evaluations, the closest to our case was the ROMIP SA competition, held on texts from political news and from blogs containing customer opinions (Chetviorkin, Loukachevitch, 2013). As sentiment lexicons are domain-sensitive, it would be unfair to test our lexicon directly on texts of a different type and to compare it with approaches developed specifically for that type. It would be equally unfair to apply the ROMIP methods to our collection.
We therefore performed an indirect comparison of the results, using the same quality evaluation methodology as ROMIP. In the three-class blog classification task, the best ROMIP participants exceeded their baseline by 12–27% in recall and by 5–29% in precision. In the news classification task, the respective values were 23–28% and 43–49%. Having converted our data into three classes (positive, negative and neutral), we calculated our baseline, precision and recall (see Table 3).

Table 3. Three-class classification quality

| | Our lexicon | Baseline | Difference |
|---|---|---|---|
| Recall (macro) | 0.43 | 0.33 | 0.10 |
| Precision (macro) | 0.44 | 0.18 | 0.26 |

The quality of our lexicon is comparable to that of the ROMIP approaches used in the blog classification task and lower than the quality reached for news. It should be noted that the class distribution of the ROMIP news collection was much more balanced (Panicheva, 2013) than that of both its blog collection and our sample.
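The three-class comparison can be sketched in Python. The text does not spell out every step, so this sketch rests on several assumptions. The five-point labels are collapsed to three classes by their sign. Precision and recall are macro-averaged, meaning unweighted means over the three classes. A class that is never predicted contributes a precision of 0. The baseline assigns every text to the majority class. The helper names and toy scores are hypothetical.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def to_three_classes(score):
    """Collapse a -2..+2 score to a three-class label by its sign (assumed mapping)."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def macro_precision_recall(gold, predicted, labels=LABELS):
    """Return (macro precision, macro recall): unweighted means of per-class values.

    A class that is never predicted (or never occurs) contributes 0, an assumed convention.
    """
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        n_pred = sum(1 for p in predicted if p == label)
        n_gold = sum(1 for g in gold if g == label)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_gold if n_gold else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


def majority_baseline(gold):
    """Assign every item to the most frequent gold class (assumed baseline)."""
    majority, _ = Counter(gold).most_common(1)[0]
    return [majority] * len(gold)


# Hypothetical toy scores on the -2..+2 scale, for illustration only.
gold_scores = [0, 0, 0, 0, 0, -1, -2, -1, 1, 2]
pred_scores = [0, 0, 1, 0, -1, -1, -1, 0, 1, 0]
gold3 = [to_three_classes(s) for s in gold_scores]
pred3 = [to_three_classes(s) for s in pred_scores]

lexicon_p, lexicon_r = macro_precision_recall(gold3, pred3)
base_p, base_r = macro_precision_recall(gold3, majority_baseline(gold3))
print(f"precision {lexicon_p:.2f} vs baseline {base_p:.2f}, difference {lexicon_p - base_p:.2f}")
print(f"recall    {lexicon_r:.2f} vs baseline {base_r:.2f}, difference {lexicon_r - base_r:.2f}")
```

With a majority-class baseline, macro recall is always 1/3 ≈ 0.33, because recall is 1 for the majority class and 0 for the other two. Macro precision then equals the majority-class share divided by three. The baseline row of Table 3 (0.33 and 0.18) is consistent with this reading if the majority class covers roughly 54% of the sample (0.18 × 3). That figure is an inference from the table, not one stated in the text. The same arithmetic shows why a skewed class distribution raises the precision baseline and makes it harder to exceed.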
The more skewed class distribution of the blog collection has made exceeding the baseline more difficult in blog SA. In contrast to most ROMIP methods, our lexicon is publicly available and may be improved by the research community.

An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media

5. Conclusion and future research

We have presented a lexicon for sentiment analysis of political and social Russian-language blogs.
Its quality is comparable to the results obtained for English-language Twitter and for Russian-language blogs with customer opinions. We have also described the results of annotating words and texts using a crowdsourcing approach. The lexicon and the annotated collection are publicly available on our website linis-crowd.org, which allows further crowdsourcing of sentiment markup. This web resource is aimed at the widest possible research community. While the lexicon can already be used by social scientists, the collection may serve as a benchmark for testing new sentiment analysis instruments. In particular, we are now using it to train machine learning SA algorithms, which should help increase the quality of SA.
We also plan to improve the lexicon by replicating our research on a collection of blog comments, which are potentially much more emotional.

6. Acknowledgements

This work was supported by the Russian Foundation for Humanities, project ‘Development of a publicly available database and a crowdsourcing website for testing sentiment analysis instruments’, Grant No 14-04-1203.

References

1. Alexeeva S., Koltsova E., Koltcov S. (2015) Linis-crowd.org: A lexical resource for Russian sentiment analysis of social media [Linis-crowd.org: lexicheskij resurs dl’a analiza tonal’nosti sotsial’no-politicheskix tekstov], Computational Linguistics and Computational Ontologies: Proceedings of the XVIII Joint Conference “Internet and Modern Society (IMS-2015)” [Kompyuternaya lingvistika i vyichislitelnyie ontologii: sbornik nauchnyih statey. Trudyi XVIII ob’edinennoy konferentsii «Internet i sovremennoe obschestvo» (IMS-2015)], St. Petersburg, pp. 25–34.
2. Bodrunova S., Koltsov S., Koltsova O., Nikolenko S., Shimorina A. (2013) Interval Semi-Supervised LDA: Classifying Needles in a Haystack, Proceedings of the 12th Mexican International Conference on Artificial Intelligence (MICAI 2013), Part I: Advances in Artificial Intelligence and Its Applications, Berlin: Springer Verlag, pp. 265–24.
3. Chetviorkin I. I., Braslavski P. I., Loukachevitch N. V. (2012) Sentiment Analysis Track at ROMIP 2011, Proceedings of International Conference Dialog, pp. 739–746.
4. Chetviorkin I., Loukachevitch N. (2012) Extraction of Russian Sentiment Lexicon for Product Meta-Domain, Proceedings of COLING 2012: Technical Papers, pp. 593–610.
5. Chetviorkin I., Loukachevitch N. (2013) Sentiment Analysis Track at ROMIP 2012, Proceedings of International Conference Dialog, Vol. 2, pp. 40–50.

Koltsova O. Yu., Alexeeva S. V., Kolcov S. N.

6. Esuli A., Sebastiani F. (2006) SentiWordNet: A publicly available lexical resource for opinion mining, Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, pp. 417–422.
7. Godbole N., Srinivasaiah M., Skiena S. (2007) Large Scale Sentiment Analysis for News and Blogs, ICWSM’2007, Boulder, Colorado, USA.
8. Hong Y., Kwak H., Baek Y., Moon S. (2013) Tower of Babel: a crowdsourcing game building sentiment lexicons for resource-scarce languages, Proceedings of the 22nd International World Wide Web Conference (WWW), pp. 549–556.
9. Hsueh P., Melville P., Sindhwani V. (2009) Data quality from crowdsourcing: a study of annotation selection criteria, Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, Boulder, Colorado, pp. 27–35.
10. Hu M., Liu B. (2004) Mining and summarizing customer reviews, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Seattle, WA, pp. 168–177.
11. Koltsova O., Koltcov S., Alexeeva S. (2014) Do ordinary bloggers really differ from blog celebrities? Proceedings of WebSci ’14 ACM Web Science Conference, Bloomington, IN, USA, NY: ACM, pp. 166–170.
12. Koltsova O., Shcherbak A. (2015) ‘LiveJournal Libra!’: The political blogosphere and voting preferences in Russia in 2011–2012, New Media and Society, Vol. 17, No. 10, pp. 1715–1732.
13. Ku L.-W., Liang Y.-T., Chen H.-H. (2006) Opinion Extraction, Summarization and Tracking in News and Blog Corpora, Proceedings of the AAAI-CAAW’06.
14. Loukachevitch N. V., Blinov P. D., Kotelnikov E. V., Rubtsova Y. V., Ivanov V. V., Tutubalina E. (2015) SentiRuEval: testing object-oriented sentiment analysis systems in Russian, Proceedings of International Conference Dialog, Vol. 2.
15. Medhat W., Hassan A., Korashy H. (2014) Sentiment analysis algorithms and applications: a survey, Ain Shams Engineering Journal, Vol. 5, Issue 4, pp. 1093–1113.
16. Mihalcea R., Banea C., Wiebe J. (2007) Learning multilingual subjective language via cross-lingual projections, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, pp. 976–983.
17. Mohammad S., Dorr B., Hirst G., Turney P. (2011) Measuring degrees of semantic opposition, Technical report, National Research Council Canada.
18. Mohammad S. M., Turney P. D. (2013) Crowdsourcing a word-emotion association lexicon, Computational Intelligence, Vol. 29, No. 3, pp. 436–465.
19. Morkovkin V. V. (2003) Explanatory dictionary of Russian language: structural words: prepositions, conjunctions, particles, interjections, parentheses, pronouns, numbers, connections [Ob’jasnitelnyj slovar’ russkogo jazyka: Structurnyje slova: predlogi, sojuzy, chastitsy, mezhdometija, vvodnyje slova, mestoimenija, chislitelnyje, svjazannyje slova], Astrel, Moscow.
20. Nikolenko S., Koltcov S., Koltsova O. (2015) Topic Modeling for Qualitative Studies, Journal of Information Science (R&R).
21. Pang B., Lee L. (2004) A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, 42nd Meeting of the Association for Computational Linguistics [C] (ACL-04), pp.