The data used in the experiment is described in Chapter II. The training dataset includes 394 definitions with previously matched hypernyms. The annotated dataset consists of 114 hyponym-hypernym pairs marked up by two annotators. The pairs were selected so that their hypernyms differ in frequency. We employed two metrics for inter-annotator agreement.
Weak agreement, measured by Fleiss' kappa, is κ = 0.57, and strong agreement is κ = 0.36. The section justifies the choice of the κ metric and describes the criteria for strong and weak agreement.

The goal of the WSD task setup is to permit comparison of supervised and semi-supervised machine learning methods, under the hypothesis that the presence of similar WSD problems helps to find reliable answers using unlabeled data. The section concludes with a discussion of the results.
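For reference, Fleiss' kappa for items that each received the same number of ratings can be computed as in the minimal sketch below. The function name `fleiss_kappa` and the count-matrix input layout are assumptions made for illustration; the summary does not state how the weak and strong agreement scores were actually computed.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a matrix of rating counts.

    counts[i][j] is the number of annotators who assigned item i
    to category j. Every row must sum to the same number of raters.
    (Hypothetical helper; the input layout is an assumption.)
    """
    if not counts:
        raise ValueError("at least one item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item: fraction of agreeing rater pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

With two annotators, each row sums to 2: a row such as `[2, 0]` means both annotators chose the first category, while `[1, 1]` means they disagreed. Under this layout, a weak and a strong agreement score would differ only in how the annotators' answers are mapped to categories before counting.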
The best disambiguation algorithm achieved a precision of 0.7. This value is not sufficient to fully automate the construction of hyponym-hypernym chains. However, few transformations are needed to eliminate possible errors. Thus, the method may be used as a helpful tool that considerably reduces the expert's manual work in building an electronic thesaurus. The chapter also gives guidelines for further improvement of the algorithm.

***

The «Conclusion» chapter lists the main outcomes of the thesis, among which are:
● the thesis proposes a new approach to semantic relation extraction from a corpus of dictionary definitions;
● the thesis explores various disambiguation methods that are relevant for its main tasks;
● the thesis describes a method that generates chains of word meanings linked to each other by thesaurus relations;
● the thesis presents a corpus of such chains developed on the basis of BRED by S.A. Kuznetsov and suggests a thorough multifaceted analysis of them.

The thesis also puts forward the hypothesis that the proposed methods form a sufficient basis for designing thesauri for low-resource languages and minimize the effort required from experts.
Checking this hypothesis is the subject of further research.

The following papers discuss the main results of the thesis:
● Alexeyevsky, Daniil Andreevich. “BioNLP ontology extraction from a restricted language corpus with context-free grammars” // Informatics and its Applications, vol. 10 issue 2, pp. 119–128, 2016, Moscow, Russian Academy of Sciences, Branch of Informatics, Computer Equipment and Automatization.
● Alexeyevsky, Daniil, and Anastasiya V. Temchenko. “WSD in Monolingual Dictionaries for Russian WordNet.” // In Proceedings of the Eighth Global WordNet Conference, 1015. Bucharest, Romania, 2016.
● Alexeyevsky, Daniil. “Semi-Supervised Relation Extraction from Monolingual Dictionary for Russian WordNet.” // In Proceedings of CICLing17 Conference. LNCS, 2018. (in print)
● Alexeyevsky, Daniil. “Word sense disambiguation features for taxonomy extraction” // Computación y Sistemas, vol. 22 issue 3. 2018.