Automating Lexical Typological Research: Methods and Tools (Summary)

Chapters 3–5 introduce the methods that can be used to automate each stage of the analysis: designing the questionnaire (Chapter 3), filling the questionnaire with data from multiple languages (Chapter 4), and generating the semantic map (Chapter 5). The Conclusion sums up the major findings of the thesis and suggests directions for further research.

SUMMARY OF THE THESIS

Chapter 1 “Introduction” contains a brief overview of the existing methods of lexical typological analysis.

The fundamental premise of lexical typology is that the lexical units of a language form a system and that lexical systems can be compared across languages. Existing approaches in this area of linguistics differ primarily in how they select the parameters for comparing lexical units. These parameters, in turn, guide the choice of data sources and how the data are processed and analyzed.

Five major lines of research can be distinguished in contemporary lexical typology:

(1) The experimental approach goes back to Berlin, Kay 1969 and their research on the typology of colour terms. In this methodology, questionnaires consist of extralinguistic stimuli, i.e. the comparison is carried out on the basis of the perceptual characteristics of objects or situations.

(2) The theory of semantic primitives developed by A. Wierzbicka and C. Goddard (Wierzbicka 1985) stipulates that the meaning of any word in a natural human language is composed of a very limited set of universal semantic primitives (such as “I”, “you”, “something”, “big”, “think”, etc.). Different combinations of universal primitives distinguish the meanings of individual words from each other.

(3) Dictionary-based approaches examine the submeanings listed in dictionary entries and attempt to identify patterns of their “colexification” (in the terminology of François 2008), i.e. their convergence within lexemes.

(4) Approaches that rely on parallel corpora (see, e.g., Viberg 2006; Wälchli, Cysouw 2012) use the contexts found across multiple languages as a proxy for a lexical typological questionnaire; words are differentiated by the contexts in which they can or cannot occur.

(5) The frame-based approach to lexical typology posits the existence of a universal set of minimal lexical meanings (frames). Each semantic field is assumed to feature its own set of frames, while different words cover various combinations of frames.

The first section of Chapter 1 looks at the first four of the above-mentioned approaches, discusses their strengths and weaknesses, and points out the general trend towards computerization of data collection and analysis.

The second section of Chapter 1 provides an in-depth discussion of the fifth approach, which forms the groundwork of this research. The approach emerged from the traditions of the Moscow semantic school (see Апресян 1974); it suggests that the comparative analysis of intralinguistic and crosslinguistic quasi-synonyms should be carried out in terms of their combinatorial properties. To compare words from different languages, the researcher should split their semantics into non-overlapping conceptual fragments, i.e. the types of situations in which these words are used. The types of situations correspond to different groups of contexts.

For instance, the semantics of the Russian adjective tonkij (‘thin’) can be represented as the following set of conceptual fragments:

- ‘with a small diameter of the cross-sectional profile’: the property of elongated objects (‘pencil’, ‘rope’, ‘stick’, etc.);
- ‘with a small distance between the surfaces of an object’: the dimensions of flat objects (“layers”), such as a book, fabric or paper;
- ‘low in intensity and high in pitch’: the characteristics of sound; used in the context of the nouns zvuk ‘sound’, golos ‘voice’, etc.

This set of elementary situations can also be used to identify the equivalents of the Russian adjective tonkij in translations.
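The decomposition of tonkij into conceptual fragments can be pictured as a lookup from frames to the nouns that diagnose them, so that every 'tonkij + noun' context falls into exactly one type of situation. The Python sketch below is only an illustration under stated assumptions: the frame labels are shortened paraphrases, the noun lists hold just the examples given in the text, and the function name `frame_of` is hypothetical.

```python
from typing import Optional

# Frames (conceptual fragments) of Russian tonkij 'thin', each paired with the
# nouns that diagnose it. Labels are shortened paraphrases of the text; the
# nouns are only the examples it gives (English glosses for most of them).
TONKIJ_FRAMES: dict[str, set[str]] = {
    "small cross-section": {"pencil", "rope", "stick"},              # elongated objects
    "small distance between surfaces": {"book", "fabric", "paper"},  # flat objects
    "low intensity, high pitch": {"sound", "voice"},                 # zvuk, golos
}


def frame_of(noun: str, frames: dict[str, set[str]] = TONKIJ_FRAMES) -> Optional[str]:
    """Return the frame that a 'tonkij + noun' context belongs to, or None."""
    for frame, nouns in frames.items():
        if noun in nouns:
            return frame
    return None


# Group a few contexts by frame: each group is one type of situation.
contexts = ["stick", "paper", "voice", "rope"]
by_frame: dict[str, list[str]] = {}
for noun in contexts:
    frame = frame_of(noun)
    if frame is not None:
        by_frame.setdefault(frame, []).append(noun)

print(by_frame)
```

A real questionnaire would describe contexts far more richly than a single noun; the point here is only that frames partition contexts into non-overlapping groups.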

Compare, for example, the Chinese translations of tonkij:

- ‘thin’ + the name of an elongated object => xì (xì gùnzi – ‘a thin stick’);
- ‘thin’ + the name of a flat object => báo (báo zhǐ – ‘thin paper’), etc. (see Кюсева et al. 2013 for more detail).

Such situations are called frames. Frames are regarded as the minimal lexical meanings, i.e. individual lexemes cover various combinations of frames. Combinations of frames are not equally probable: some frames tend to converge under one lexeme, while others have a stronger tendency to diverge into different lexemes. The patterns of frame convergence within a lexeme are visualised in lexical semantic maps.

This method of lexical semantic research has been applied to a wide range of lexical data, cf. Майсак, Рахилина 2007; Круглякова 2010; Кашкин 2013; Холкина 2014; Кюсева 2012. It has proved efficient in establishing the frame structure of individual semantic fields and in comparing words across languages.

Research on lexical units in this paradigm consists of the following steps:

1. Compile the questionnaire (i.e. define a tentative set of frames) by analyzing the combinability of the lexemes of the given lexical field in 3–5 languages.
2. Finalise the set of frames: collect data from the other languages of the sample.
3. Describe and visualise the system in each language: draw the semantic map.
4. Analyze the types of systems observed in different languages.

To define the set of relevant frames, it is necessary to conduct a detailed analysis of data from dictionaries and corpora and then extend it with data from interviews with speakers of the language. As the key objective of the research is to learn the combinability rules for each lexeme of the field, the questionnaires for the interviews consist of the contexts in which the lexemes occur.
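Step 3 of this procedure (drawing the semantic map) rests on the observation that some frames tend to converge under one lexeme while others do not. One simple heuristic for drafting such a map, offered here purely as an illustration and not as the procedure used in the thesis, is to weight every pair of frames by the number of lexemes that colexify it, keep a maximum spanning tree of that weighted graph, and then check that each lexeme covers a connected region of the resulting map. The coverage data and all function names below are hypothetical.

```python
from itertools import combinations

# Hypothetical coverage data (lexeme -> frames it can express); the lexeme
# and frame names are invented for illustration.
COVERAGE: dict[str, set[str]] = {
    "lang1:a": {"F1", "F2"},
    "lang1:b": {"F3", "F4"},
    "lang2:c": {"F1", "F2", "F3"},
    "lang2:d": {"F4"},
    "lang3:e": {"F2", "F3"},
    "lang3:f": {"F1"},
}


def colexification_counts(coverage: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """For every pair of frames, count the lexemes that cover both."""
    counts: dict[tuple[str, str], int] = {}
    for frames in coverage.values():
        for pair in combinations(sorted(frames), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def max_spanning_tree(frames: set[str],
                      counts: dict[tuple[str, str], int]) -> list[tuple[str, str, int]]:
    """Kruskal's algorithm, taking the most frequently colexified pairs first."""
    parent = {f: f for f in frames}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), weight in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # skip edges that would close a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


def is_connected(frames: set[str], tree: list[tuple[str, str, int]]) -> bool:
    """True if the given frames form a connected region of the map."""
    if not frames:
        return True
    start = next(iter(frames))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b, _ in tree:
            for x, y in ((a, b), (b, a)):
                if x == node and y in frames and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == set(frames)


all_frames = set().union(*COVERAGE.values())
tree = max_spanning_tree(all_frames, colexification_counts(COVERAGE))
print(tree)  # [('F1', 'F2', 2), ('F2', 'F3', 2), ('F3', 'F4', 1)]
print(all(is_connected(fs, tree) for fs in COVERAGE.values()))  # True
```

A spanning tree is the sparsest graph that links all frames; if some lexeme's frames turned out to be disconnected, extra edges would be needed, which this sketch does not attempt to add.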

Since the questionnaire consists of contexts, the finalised version has to be translated into each language of the sample. Today, practically every step of this procedure is executed manually; the process is rather slow and requires meticulous, concerted effort from specialists in every language of the sample. The high cost and the need to recruit an expert for each additional language hinder large-scale investigations of representative language samples. As a consequence, the insufficient size of the investigated language samples casts doubt on the validity of the reported findings, in particular on the adequacy of the linguistic grounds for distinguishing frames as specific semantic units that claim the status of minimal lexical meanings.

Chapter 2 “Verifying the concept of frame with distributional semantic models” describes a series of experiments intended to provide additional evidence in support of frames.

The frame structure of a field is defined in terms of semantic similarity: the situations that belong to one frame display the closest semantic proximity, while the distances between situations from different frames can vary. The distances between frames are represented in the semantic map of the field.

Semantic distances between frames are measured on the basis of typological data. Normally, the frame-based approach considers only relative distances: if lexeme L1 can cover frames F1 and F2, lexeme L2 can cover frames F2 and F3, and there is no lexeme that covers F1 and F3 without covering F2, it is argued that frames F1 and F3 are farther apart than F1 and F2 or F2 and F3. This configuration of frames is illustrated by the linear semantic map F1 – F2 – F3.

We extended the pilot study by Кюсева 2014 and developed a formula for a more precise quantitative measurement of typological similarity between frames; the formula is based on the frequency of colexification of the minimal meanings.
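The relative-distance argument for F1, F2 and F3 amounts to a contiguity check: a linear map is consistent with the data if every lexeme covers an unbroken stretch of it, so that no lexeme covers F1 and F3 while skipping F2. A minimal Python sketch, using the lexemes L1 and L2 from the example; the function name `fits_linear_map` and the extra lexeme L3 are hypothetical.

```python
def fits_linear_map(order: list[str], coverage: dict[str, set[str]]) -> bool:
    """True if every lexeme covers an unbroken stretch of the linear map.

    Assumes every frame mentioned in `coverage` appears in `order`.
    """
    position = {frame: i for i, frame in enumerate(order)}
    for frames in coverage.values():
        idx = sorted(position[f] for f in frames)
        # A gap between the first and last covered positions means a skipped frame.
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True


# Lexemes L1 and L2 from the example in the text.
coverage = {"L1": {"F1", "F2"}, "L2": {"F2", "F3"}}
print(fits_linear_map(["F1", "F2", "F3"], coverage))  # True
print(fits_linear_map(["F1", "F3", "F2"], coverage))  # False

# A hypothetical lexeme covering F1 and F3 but not F2 breaks F1 – F2 – F3.
print(fits_linear_map(["F1", "F2", "F3"], {**coverage, "L3": {"F1", "F3"}}))  # False
```

This only tests a candidate order; it says nothing about how far apart the frames are, which is what the quantitative formula is meant to add.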

Each frame is expressed as a vector w whose dimensions correspond to the lexemes that belong to the field in all the languages of the sample. If the i-th lexeme can denote the frame, the value of the i-th dimension is 1, otherwise 0. The typological distance between frames is computed as the cosine similarity between the respective vectors (cf. the similar metric discussed in the recent study by Youn et al. 2016).

However, there are other measures of semantic distance between lexical meanings. One of them, distributional semantic models (see Baroni et al. 2013), represents the meaning of a lexical unit (a word or a phrase) as a vector of its co-occurrences. Such semantic representations are used for a wide range of purposes, some of which are quite similar to our task (e.g. word sense disambiguation and matching contexts with the most appropriate of several quasi-synonyms). To the best of our knowledge, distributional semantic methods have not so far been applied in typology. However, it is reasonable to assume that if frames are indeed equivalent to minimal lexical meanings, the distributional distances between them should correspond to the typological distances.

We conducted several experiments with the fields of the qualitative features ‘sharp’ and ‘smooth’; the sets of frames and the data for computing typological distances were drawn from the Typologically Oriented Database of Qualitative Features (see Кюсева et al. 2013а). For each frame we selected several illustrations (or ‘micro-frames’); e.g. the frame ‘instrument with a sharp functional end-point’ from the field of ‘sharp’ was illustrated by ‘sharp needle’, ‘sharp arrow’ and ‘sharp spear’. Each micro-frame was aligned with its two-word Russian counterpart (ostraja igla, ostraja strela, ostroe kopje), and a co-occurrence vector was obtained for each of these word combinations.

The distributional models in our experiments were built with the following parameters:

1. Dimensions: the 10,000 most frequent content-word lemmas (in the main subcorpus of the Russian National Corpus (RNC)).
2. Values of the dimensions: the frequency of content words within a window of ±5 around the lexical unit for which the vector is retrieved.
3. Distances between vectors: computed with the cosine similarity measure.
4. Training corpus: different combinations of the main subcorpus of the RNC (approx. 220m tokens), the newspaper subcorpus of the RNC (approx. 200m tokens), and the ruWaC corpus of texts collected from the Internet (around 1bn tokens).
5.
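The two kinds of similarity compared in these experiments, the typological cosine over binary lexeme vectors and the distributional cosine over co-occurrence counts in a ±5 window, can be sketched side by side in Python. This is a toy illustration under loud assumptions: the typological data and frame names are invented, a short English corpus and a six-word vocabulary stand in for the RNC/ruWaC lemmas and the 10,000 most frequent content words, and since the text does not say how micro-frame vectors are aggregated into frame vectors, the sketch compares only two phrases. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {dimension: value}."""
    dot = sum(value * v.get(dim, 0) for dim, value in u.items())
    norms = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norms if norms else 0.0


def frame_vector(frame: str, coverage: dict[str, set[str]]) -> dict[str, int]:
    """Typological vector: 1 for each lexeme that can denote the frame.

    Lexemes that cannot denote it are the implicit zero dimensions.
    """
    return {lexeme: 1 for lexeme, frames in coverage.items() if frame in frames}


def cooccurrence_vector(target: list[str], tokens: list[str],
                        vocabulary: set[str], window: int = 5) -> Counter:
    """Distributional vector: counts of vocabulary words within `window`
    tokens to the left and right of each occurrence of a (multi-word) target."""
    counts: Counter = Counter()
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            context = tokens[max(0, i - window):i] + tokens[i + n:i + n + window]
            counts.update(t for t in context if t in vocabulary)
    return counts


# Hypothetical typological data: lexeme -> frames (names invented).
TYPOLOGICAL = {
    "lang1:x": {"point", "edge"},
    "lang2:y": {"point"},
    "lang2:z": {"edge", "taste"},
}
typological = cosine(frame_vector("point", TYPOLOGICAL),
                     frame_vector("edge", TYPOLOGICAL))

# Toy corpus and vocabulary standing in for the real corpora and word list.
corpus = ("she threaded the sharp needle with red thread "
          "the sharp arrow hit the target "
          "he sharpened the sharp needle and the thread").split()
vocabulary = {"threaded", "red", "thread", "hit", "target", "sharpened"}
needle = cooccurrence_vector(["sharp", "needle"], corpus, vocabulary)
arrow = cooccurrence_vector(["sharp", "arrow"], corpus, vocabulary)
distributional = cosine(needle, arrow)

print(typological)                # 0.5
print(round(distributional, 3))   # 0.791
```

With real data, one would compute both scores for every pair of frames (or micro-frames) and then test whether they correlate; that comparison step is left out of this sketch.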

Material type: Candidate of Sciences dissertation
Subject: Philology
Institution: НИУ ВШЭ (HSE University)
