The ratios of these counts (the count of a translation pair to the count of the source word) represent the distribution of lexical translation probabilities. This operation is performed in both translation directions, i.e. source→target and target→source. Opinion word translations are collected for a given opinion word list in the source language. The correct translation of a source opinion word is determined as follows:

$$\mathrm{translations}(w_s) = \{\, w_t \ :\ p(w_t \mid w_s) = \max_{w} p(w \mid w_s) \ \wedge\ p(w_s \mid w_t) = \max_{w} p(w \mid w_t) \,\}$$

In other words, to translate a source word we choose the target word with the maximum translation probability and check that it translates back to the same source word with the maximum probability as well. The translated words are normalized.

3. Experiments

3.1. Opinion lexicon projection

We conducted several experiments to validate the proposed approach. Two parallel datasets are used in our experiments.
The first one consists of Russian and English reviews collected from the Mobile Review site (http://mobile-review.com/). We downloaded all pages from the English editorial of the site, and then downloaded the Russian versions of these pages by using the English links without the token “-en”. We will refer to this dataset as “MR”. The second one consists of the first 5,000 lines from the reviews of books, cameras and films taken from the ROMIP 2011 sentiment analysis dataset [5] and 1,000 lines of iPhone 4 reviews from Yandex Market (http://market.yandex.ru/), along with their translations produced by Google Translate.
We will refer to it as “ROMIP-GT”. The datasets are split into sentences with Freeling (http://nlp.lsi.upc.edu/freeling/) and aligned with the Microsoft Bilingual Sentence Aligner [18]. After these steps, the aligned “MR” contains 579,559 Russian and 726,798 English words, and the aligned “ROMIP-GT” contains 714,533 Russian and 820,241 English words. We use GIZA++ [15] to create the word lexical translation tables.
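As a rough illustration, the sketch below applies the mutual-argmax condition from the formula above to a pair of lexical translation tables. This is a minimal Python sketch, assuming the tables have been exported as whitespace-separated (source, target, probability) triples; the exact GIZA++ file layout differs between versions, so the parser is an assumption to adapt.

```python
from collections import defaultdict

def load_ttable(path):
    """Load p(target | source) as source -> {target: probability}.
    Assumes one 'source target probability' triple per line."""
    table = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip malformed lines
            src, tgt, prob = parts
            table[src][tgt] = float(prob)
    return table

def best_translation(table, word):
    """Return the most probable translation of `word`, or None."""
    candidates = table.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def project_lexicon(opinion_words, en_ru, ru_en):
    """Keep a Russian candidate only when it is the most probable
    translation of the English word AND translates back to that same
    word with the highest probability (the mutual-argmax condition)."""
    projected = set()
    for en in opinion_words:
        ru = best_translation(en_ru, en)
        if ru is not None and best_translation(ru_en, ru) == en:
            projected.add(ru)
    return projected
```

The projected words would then be normalized, as described above.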
English opinion word lists are downloaded from Bing Liu’s homepage (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html). There are 4,818 negative and 2,041 positive words. We will refer to this list as the “BL” dictionary. Mystem (http://company.yandex.ru/technologies/mystem/) is used to normalize the Russian words: they are transformed to the singular, masculine, nominative, present-tense form. We produce 4 opinion lexicons in Russian in total. During lexicon construction we remove all words that contain spaces or hyphens or that are shorter than 3 characters. The “BL-GT” lexicon contains translated and normalized opinion words from “BL”. The “BL-GT filtered” lexicon was constructed in the following way: words from “BL” were translated to Russian and then back to English using Google Translate, and we kept only those Russian translations whose back-translation matched the English original (this round-trip check is sketched below). The “MR” lexicon is created by applying our method to the “MR” parallel corpus. The “ROMIP-GT” lexicon is created using our method with the “ROMIP-GT” dataset.
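A minimal sketch of the round-trip filter follows. The translate() function here is a hypothetical placeholder for a machine translation backend (the filtering above used Google Translate), not a real API call.

```python
def translate(text, src, dst):
    """Placeholder for a machine translation service; plug in a real backend."""
    raise NotImplementedError

def round_trip_filter(english_words):
    """Keep a Russian translation only if its back-translation
    matches the original English word."""
    kept = {}
    for en in english_words:
        ru = translate(en, src="en", dst="ru")
        back = translate(ru, src="ru", dst="en")
        if back.lower() == en.lower():
            kept[en] = ru
    return kept
```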
The “ROMIP-GT merged” lexicon is produced in the following way: we applied our method to 3 subsets of “ROMIP-GT”, i.e. books, films and cameras, and then merged the resulting lists. The number of opinion words in each lexicon is listed in Table 1. Table 2 shows the intersections of the lexicons.
Table 1. Opinion words number

Lexicon            Positive  Negative  Total
BL (English)          2,041     4,818  6,859
BL-GT                 1,443     3,067  4,510
BL-GT filtered          907     2,037  2,944
MR                      163       182    345
ROMIP-GT                706     1,311  2,017
ROMIP-GT merged       1,057     1,812  2,869
Union                 1,993     4,040  6,033

The “BL-GT” lexicon is the biggest, with almost 4.5 thousand words. However, it is 34% smaller than the original list. This is because some words were translated to the same surface form (27%), because phrases (which contain spaces) were removed, and because of normalization. There is a small portion of untranslated words as well. “BL-GT filtered” is almost half the size of the original dictionary.
It is interesting to see, however, that so many words survive translation from English to Russian and back to English in their original form.

The “MR” lexicon, produced from the Mobile Review parallel corpus, is rather small. This is because its texts draw on a different English vocabulary than the opinion word list “BL”: the “MR” texts were written by a limited number of people, while the opinion lexicon “BL” contains contributions from many people.

Interestingly, “ROMIP-GT merged” is 30% bigger than “ROMIP-GT” and is almost as big as “BL-GT filtered”. Table 2 suggests that “ROMIP-GT merged” has 1,222 words, or 45%, in common with “BL-GT filtered”. This is because the words in “BL-GT filtered” were translated in isolation, while the words in “ROMIP-GT merged” were translated in context.

We can get as many as 6,033 opinion words if we merge all the lists, which is 89% of the original English list.

Table 2. Opinion words intersection

Words            Intersection       pos    neg    total
MR               ROMIP-GT merged    118     88      206
MR               BL-GT              132    178      310
ROMIP-GT merged  BL-GT              626  1,006    1,632
ROMIP-GT merged  BL-GT filtered     436    786    1,222

We made a manual assessment of the lexicons. Table 3 shows their precision. “BL-GT filtered” is the most accurate.
This can be explained by the fact that it contains just those English words that translate unambiguously even without context. We also compared the “MR” and “ROMIP-GT” lists: the first was derived from professional reviews, the second from user reviews. It is interesting to note that “MR” contains domain-specific opinion words, while “ROMIP-GT” contains emotional words.

Table 3. Precision by manual assessment

Lexicon          Precision
BL-GT                 0.79
BL-GT filtered        0.87
MR                    0.76
ROMIP-GT              0.83
ROMIP-GT merged       0.82

3.2. Document Sentiment Classification

The number of words in a lexicon does not by itself indicate its quality.
We conducted several experiments to benchmark the produced opinion word lists. Rather than checking the words manually, we used them in a real-world task, namely sentiment classification. The experiments are performed on the annotated part of the ROMIP 2011 dataset [5]. It contains reviews of books, films and cameras: 750 positive and 124 negative review instances. Counting the numbers of positive and negative words is the most straightforward approach to text sentiment classification [13]: the class with the greater number of opinion words wins. The work [17] suggests that it is better to consider the presence of an opinion word in the text rather than its number of occurrences. We implement both approaches, referring to the first as “Frequency voc” and to the second as “Binary voc”; both are sketched below.
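A minimal sketch of the two dictionary-based classifiers, assuming the documents are already tokenized and the tie-breaking rule is our own choice:

```python
def classify(tokens, positive, negative, binary=False):
    """Vote with opinion words: 'Frequency voc' counts every occurrence,
    'Binary voc' counts each opinion word at most once per document.
    Ties default to 'pos' (an assumption, not from the paper)."""
    pos_hits = [t for t in tokens if t in positive]
    neg_hits = [t for t in tokens if t in negative]
    if binary:
        pos_score, neg_score = len(set(pos_hits)), len(set(neg_hits))
    else:
        pos_score, neg_score = len(pos_hits), len(neg_hits)
    return "pos" if pos_score >= neg_score else "neg"
```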
Supervised approaches to text sentiment classification were studied by Pang et al. [17]. We use a linear perceptron classifier with two types of feature computation: term frequencies and delta TF-IDF. The latter was proposed by Martineau et al. [11] and has proven to be efficient for sentiment classification in Russian [16]. The results for these methods were obtained by 10-fold cross-validation. They act as a baseline of supervised classification, which requires an annotated dataset.
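The delta TF-IDF weighting can be sketched as follows. This is a minimal reading of Martineau et al. [11]; the smoothing constant s is our assumption to guard against zero document frequencies, and the sign convention is irrelevant for a linear classifier.

```python
import math

def delta_tfidf_weight(tf, df_pos, df_neg, n_pos, n_neg, s=0.5):
    """Delta TF-IDF in the spirit of Martineau et al. [11]:
    w(t, d) = tf(t, d) * log2((|P| * N_t) / (|N| * P_t)),
    where P_t (df_pos) and N_t (df_neg) are the term's document
    frequencies in the positive and negative training sets, and
    |P| (n_pos), |N| (n_neg) are the set sizes. The additive
    smoothing s is our assumption, not from the paper."""
    return tf * math.log2((n_pos * (df_neg + s)) / (n_neg * (df_pos + s)))
```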
We compare them with dictionary-based classification, which needs no class labels for training because it already has lists of negative and positive words. The results of supervised classification are therefore considered an upper bound for the dictionary-based ones. The experiment results are presented in Table 4.

Table 4. Experiment results

Lexicon          Method              MicroP  MicroR (Acc)  MacroR  MacroF1
Romip-GT         Perceptron            0.84          0.84    0.59     0.60
Romip-GT         Perceptron + TfIdf    0.84          0.84    0.62     0.63
Romip-GT         Binary Voc            0.76          0.68    0.59     0.58
Romip-GT         Frequency Voc         0.79          0.72    0.59     0.59
Romip-GT merged  Binary Voc            0.84          0.80    0.59     0.61
Romip-GT merged  Frequency Voc         0.86          0.82    0.59     0.61
BL-GT            Binary Voc            0.65          0.60    0.62     0.54
BL-GT            Frequency Voc         0.73          0.69    0.59     0.56
BL-GT filtered   Binary Voc            0.78          0.78    0.59     0.58
BL-GT filtered   Frequency Voc         0.77          0.72    0.58     0.58
MR               Binary Voc            0.67          0.52    0.50     0.49
MR               Frequency Voc         0.66          0.53    0.51     0.50
The binary approach gives the same weight to all words. Low performance of the binary approach compared with the frequency approach suggests that the lexicon is of low quality: it may contain common words that appear in many texts but rarely signal subjectivity. So we can say that “BL-GT” is rather noisy. “ROMIP-GT merged” gives the best performance among the opinion lexicons. It has about the same number of words as “BL-GT filtered”, but its performance is higher, so we can say that its quality for sentiment classification is better. This is because the words in “ROMIP-GT merged” were translated with the use of context, unlike the words in “BL-GT filtered”.
“BL-GT filtered” shows better results in the manual assessment but worse results in classification. We can explain this by the fact that “ROMIP-GT merged” contains words that out of context may not look like opinion words, or words that are used in user reviews more often than the words from “BL-GT filtered”.

We suspected that the increase in classification performance could be due to the fact that we used a part of the large ROMIP 2011 dataset to derive “ROMIP-GT merged”, while the labeled dataset used for classification was also a part of ROMIP 2011. However, it turned out that the intersection between these parts does not exceed 1%, which could not lead to a significant increase in classification performance.

We use our lexicons as lists for feature selection, as in [6], and train a linear perceptron classifier (a sketch follows).
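A minimal sketch of this setup, assuming scikit-learn as the implementation (the paper does not name a library): restricting the vectorizer's vocabulary to the lexicon keeps only opinion words as features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron

def train_with_lexicon(texts, labels, lexicon):
    """Train a linear perceptron on term-frequency features restricted
    to the opinion lexicon; texts and labels are placeholders."""
    vectorizer = CountVectorizer(vocabulary=sorted(lexicon))
    features = vectorizer.fit_transform(texts)
    model = Perceptron()
    model.fit(features, labels)
    return vectorizer, model
```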