I.S. Gudilina, L.B. Saratovskaya, L.F. Spiridonova - English Reader in Computer Science

2) Divide it into introduction, principal part, and conclusion.

3) Find and write down the main idea(s) of the text.

Stage B

1) Read the text again but now attentively (close reading).

2) Give detailed answers to the questions.

3) Write the items of the plan.

Stage C

1) Write out kernel sentences to illustrate the items of the plan.

2) Join kernel sentences together; use connective words if necessary.

3) Re-read your summary and make sure that the sentences are presented in a logical order. Make any changes that you think are necessary.

III. Read the text. Agree or disagree with what is said in the text. Write about your attitude to the problem. Give arguments in favour of your point of view.

All modern systems have ended up employing methods from both statistics and linguistics. Although fundamental differences remain, it is informative for all future machine translation systems to identify what parts of the systems tend toward linguistics, what parts toward statistics, and why this should be so.

If you want to build a non-toy machine translation system - a system with more than approximately 5,000 lexical items - that handles previously unseen input robustly, you always end up including some statistics-based or knowledge-based modules. It is not hard to see why this should be so. The system must know how words in one language correspond to words in another.

Simply building into the initial statistical model the idea of word classes is a big step away from pure language-independent statistics and a step toward symbolic/linguistic knowledge.

Scanning the range of MT applications, one can identify niches of optimum MT functionality, which provide clearly identifiable MT research and development goals. Major applications include

    1. assimilation tasks (such as scan translations of foreign documents and newspapers): lower-quality, broad domains — primarily statistical technology.

    2. dissemination tasks (such as translations of manuals and business letters): higher quality, limited domains — primarily symbolic technology.

    3. narrowband communication (such as e-mail translation): medium quality, medium domain—highly hybridized technology.

Toward the end of his position statement, Yorick Wilks points out a fact worth remembering: it's easy to build MT theories, but not easy to get results. In this regard, statistics-based systems are currently in a better position than symbolic ones because they emphasize evaluation to drive research. But in the long term, despite Wilks's somewhat pessimistic view, large enough knowledge bases will exist to make the symbolic and linguistic generalizations of central importance.

Unit 2

Statistical MT ≠ stone soup

There is a considerable history of statistical/empirical approaches to machine translation, starting with Warren Weaver and the Georgetown system in the 1950s and 1960s. The Georgetown system eventually became known as Systran, and is still one of the more successful systems on the market. Statistical/empirical approaches lost favor when Chomsky and others pointed out some of their limitations in the late 1950s. It is difficult, for example, to capture long distance constraints such as subject-verb agreement with trigrams — sequences of three words. Increasing the window size to four or five words does little to address the fundamental issue. The constraint between the subject and the verb ought to be expressed in terms of subjects and verbs, and not in terms of words.
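The trigram limitation can be illustrated with a small sketch. The two sentences and the helper `windows_containing` are invented for illustration and do not come from the text. The sketch only shows that a fixed window of consecutive words stops seeing both the subject and the verb once enough words come between them.

```python
def windows_containing(words, first, second, size=3):
    """Return the windows of `size` consecutive words that contain both words."""
    windows = [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]
    return [w for w in windows if first in w and second in w]


# Hypothetical example sentences (not from the source text).
near = "the dogs bark loudly".split()
far = "the dogs that chased the old cat bark loudly".split()

# Subject and verb are adjacent, so some trigrams contain both.
near_hits = windows_containing(near, "dogs", "bark")

# Five words separate subject and verb: neither a 3-word
# nor a 5-word window contains both of them.
far_hits_3 = windows_containing(far, "dogs", "bark", size=3)
far_hits_5 = windows_containing(far, "dogs", "bark", size=5)

print(len(near_hits), len(far_hits_3), len(far_hits_5))
```

Widening the window only moves the cutoff; a longer relative clause defeats any fixed size, which is why the constraint is better stated over subjects and verbs than over word positions.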

Despite these limitations, though, there has been a resurgence of interest in 1950s-style empirical and statistical methods in a variety of applications of natural language processing, including MT. The reasons for this resurgence are difficult to pin down. Some point to massive quantities of online text (corpus data), while others point to improvements in computer technology. In my more cynical moments, I wonder if the never-ending cycle from empiricism to rationalism and back again is just an artifact of human nature. Maybe it is inevitable that students revolt against their teachers. As Mark Twain put it, grandparents and grandchildren have a natural alliance; they have a common enemy.

Existing translations contain more solutions to more translation problems than any other existing resource. Peter Brown et al. are credited with reviving interest in statistical MT. Their work is based on Shannon's noisy-channel model. Imagine a noisy channel, such as a noisy telephone, or a speech recognition machine that almost hears. A sequence of good text (I) goes into the channel, and a sequence of corrupted text (O) comes out the other end.

I → Noisy channel → O

How can an automatic procedure recover the good input text, I, from the corrupted output, O? In principle, one can recover the most likely input, I, by hypothesizing all possible input texts, I, and selecting the input text with the highest score, Pr(I | O). Probability estimates are obtained by computing various statistics over a large sample of text such as a few years of the Associated Press newswire.
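The selection rule can be written out as a formula. The text itself states only the score Pr(I | O); the decomposition by Bayes' rule in the second and third steps is the conventional way of working with a noisy-channel model and is added here as an assumption about how the score is computed.

```latex
\hat{I} = \arg\max_{I} \Pr(I \mid O)
        = \arg\max_{I} \frac{\Pr(I)\,\Pr(O \mid I)}{\Pr(O)}
        = \arg\max_{I} \Pr(I)\,\Pr(O \mid I)
```

The denominator Pr(O) can be dropped because it is the same for every candidate input I. What remains is a model of plausible input text, Pr(I), and a model of how the channel corrupts it, Pr(O | I).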

Translation doesn't exactly fit into the noisy channel model. Brown et al. assume that a French sentence, F, is just a noisy version of an English sentence, E. In this way, they view French-to-English translation as the task of recovering the "underlying" English sentence from the "observed" French sentence.

E → Noisy channel → F

Conceptually, their translation program searches the space of all possible English sentences for the sentence E that maximizes Pr(E | F). Their probability estimates are based on large samples of Canadian parliamentary debates, which are published in both English and French.
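A toy version of this search may make the idea more concrete. It is a minimal sketch, not the program of Brown et al.: the candidate sentences, the probability values, and the names `language_model`, `channel_model`, and `decode` are all invented, and the search runs over three candidates rather than over all possible English sentences.

```python
# Toy noisy-channel decoder. All probabilities below are invented for
# illustration; a real system would estimate them from a parallel corpus.

# Language model: how plausible each candidate English sentence is, Pr(E).
language_model = {
    "the house is small": 0.04,
    "small the house is": 0.001,
    "the home is little": 0.01,
}

# Channel (translation) model: Pr(F | E) for one observed French sentence.
channel_model = {
    ("la maison est petite", "the house is small"): 0.30,
    ("la maison est petite", "small the house is"): 0.30,
    ("la maison est petite", "the home is little"): 0.10,
}


def decode(french, candidates):
    """Return the candidate E that maximizes Pr(E) * Pr(F | E).

    By Bayes' rule this is the same E that maximizes Pr(E | F),
    because Pr(F) is identical for every candidate.
    """
    def score(english):
        return language_model[english] * channel_model.get((french, english), 0.0)

    return max(candidates, key=score)


best = decode("la maison est petite", list(language_model))
print(best)
```

In this sketch the channel model cannot tell the first two candidates apart, since both use the same words; the language model is what rejects the scrambled word order.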

This approach is extremely controversial. On the surface, it would appear to be fundamentally flawed for reasons pointed out by Chomsky and others in the late 1950s. How can a (purely) statistical approach handle subject-verb agreement? Morphology? In many cases, Brown et al. have adopted solutions to these problems that look remarkably "linguistic," leading Yorick Wilks to charge that their approach is just stone soup. They talk a lot about the statistics, but we "know" that the linguistics is doing the bulk of the work.

There has been a lot of rhetoric on both sides. Who knows whether statistics are more important than linguistics or vice versa? I must say that I find the debate somewhat tiresome. Neither approach has made much progress; we are still a long ways from Yehoshua Bar-Hillel's ultimate goal: fully automatic high-quality translation (FAHQT). Perhaps the statistical/empirical approach is a step in the right direction, and perhaps not.

But either way, the statistical approach is producing a very interesting by-product: alignment programs that figure out which parts of a translation correspond to which parts of the original. These programs are being used in translation reuse. Many large jobs (such as manuals) are updated on a regular basis and don't change all that much from one version to another. Translation reuse tools make it easy to translate just the "diffs" rather than the entire job. There is a significant niche market for translation reuse. Reuse could easily be a bigger moneymaker than MT. At best, MT might be able to speed up a translator by a factor of two, whereas translation reuse can achieve much larger speedups if there aren't too many "diffs".
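Translation reuse can be sketched with Python's standard `difflib` module. This is an illustration under stated assumptions, not a description of any actual reuse tool: the function `reuse_translations` and the sample sentences are hypothetical, and the old source and old translation are assumed to be already aligned sentence by sentence, which is the job an alignment program would have to do first.

```python
import difflib


def reuse_translations(old_source, old_target, new_source):
    """Carry over translations for sentences unchanged between versions.

    Assumes old_source[i] was translated as old_target[i] (a one-to-one
    sentence alignment). Returns a list of (sentence, translation) pairs,
    where the translation is None for sentences that still need work.
    """
    result = [(sentence, None) for sentence in new_source]
    matcher = difflib.SequenceMatcher(a=old_source, b=new_source, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            result[block.b + k] = (new_source[block.b + k], old_target[block.a + k])
    return result


# Hypothetical manual, version 1, with its existing French translation.
old_source = ["Open the File menu.", "Click Save.", "Close the dialog box."]
old_target = ["Ouvrez le menu Fichier.", "Cliquez sur Enregistrer.",
              "Fermez la boîte de dialogue."]

# Version 2 changes only the second sentence.
new_source = ["Open the File menu.", "Click Save As.", "Close the dialog box."]

pairs = reuse_translations(old_source, old_target, new_source)
to_translate = [sentence for sentence, translation in pairs if translation is None]
print(to_translate)
```

Only the changed sentence is left for the translator, which is where the speedup comes from when the "diffs" are few.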

Alignment programs are also being used to produce just-in-time glossaries. Terminology is a major bottleneck for translators. How would Microsoft, or some other software vendor, want the term "dialog box" to be translated in their manuals? Technical terms such as "dialog box" are difficult for translators because they are generally not as familiar with the subject domain as either the author of the source text or the reader of the target text. In the past, translators had to read a lot of background material in both the source and target languages until they mastered the terminology in both languages, an extremely labor-intensive process. Parallel texts could be used to help translators overcome their lack of domain expertise by providing them with the ability to search previously translated documents for examples of potentially difficult terminology and see how they were translated in the past.
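Searching previously translated documents for a term can be sketched in a few lines. The function `find_term` and the small parallel corpus below are invented for illustration; the sketch assumes the sentence pairs have already been aligned.

```python
def find_term(term, parallel_corpus):
    """Return the aligned sentence pairs whose source side mentions `term`."""
    needle = term.lower()
    return [(src, tgt) for src, tgt in parallel_corpus if needle in src.lower()]


# Hypothetical aligned sentence pairs (English source, French translation).
parallel_corpus = [
    ("Click OK to close the dialog box.",
     "Cliquez sur OK pour fermer la boîte de dialogue."),
    ("Save the file.", "Enregistrez le fichier."),
    ("The dialog box shows three options.",
     "La boîte de dialogue affiche trois options."),
]

examples = find_term("dialog box", parallel_corpus)
for source, target in examples:
    print(source, "|", target)
```

The translator sees each earlier use of the term next to its translation and can judge which rendering the vendor has preferred in the past.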

In this way, the statistical approach is producing a set of useful terminology and reuse tools. Unlike traditional MT, these tools do not attempt to compete with the human at what the human does best (translating the easy vocabulary and the easy grammar), but complement the human in areas where they know they need help (difficult vocabulary and reuse). In contrast with fully automatic MT and largely automatic approaches such as machine-assisted translation followed by post editing, Kay advocated the more modest goal of building tools that human translators would want to use.

It would be ironic if statistical MT ended up producing a toolbench that isn't statistical and isn't MT. But at least it isn't stone soup....

NOTES:

  1. Statistical MT ≠ stone soup: statistical machine translation is not "stone soup", that is, not a dish whose real substance comes from ingredients other than the advertised one.

  2. It is difficult, for example, to capture long distance constraints such as subject-verb agreement with trigrams (sequences of three words): a window of three consecutive words is too short to register a dependency between words that stand far apart, such as the agreement of a subject with its verb.

  3. Despite these limitations, though, there has been a resurgence of interest in 1950s-style empirical and statistical methods in a variety of applications of natural language processing, including MT: in spite of these limitations, interest has revived in empirical and statistical methods of the kind used in the 1950s, across many applications of natural language processing, including machine translation.

  4. Some point to massive quantities of online text (corpus data), while others ...: some explain this by the large amount of text available in electronic form (corpus data), whereas others ...

  5. ... alignment programs that figure out which parts of a translation correspond to which parts of the original: alignment programs match the parts of a translation to the corresponding parts of the original.

I. Answer the following questions:

  1. What is the problem of capturing long-distance constraints such as subject-verb agreement with trigrams (sequences of three words)?

  2. How can an automatic procedure recover the input text with the help of Shannon's noisy-channel model?

  3. Which is more important: statistics or linguistics?

  4. Do you know where statistical methods can be used?

  5. What did translators have to do in the past when they wanted to translate difficult terminology?

  6. Can this method compete with the human and in what areas?

  7. What is the most remarkable feature of statistical methods?

II. Write a summary in English.

Stage A

1) Look through the text (skimming).

2) Divide it into introduction, principal part, and conclusion.

3) Find and write down the main idea(s) of the text.

Stage B

1) Read the text again but now attentively (close reading).

2) Give detailed answers to the questions.

3) Write the items of the plan.

Stage C

1) Write out kernel sentences to illustrate the items of the plan.

2) Join kernel sentences together; use connective words if necessary.

3) Re-read your summary and make sure that the sentences are presented in a logical order. Make any changes that you think are necessary.

III. Read the text. Agree or disagree with what is said in the text. Write about your attitude to the problem. Give arguments in favour of your point of view.

The problem of machine translation has proved so complex that the quality of the result has not correlated significantly with the method chosen.

The most remarkable feature of the statistical methods in machine translation is that they are not at all specific to their subject matter. The major scientific (or methodological) trend in the field is experimenting with how well the statistics-oriented methods will advance the state of the art in machine translation without the need for massive, manual knowledge acquisition.

The major technological trend in the field is looking for the best ways to mix the statistical and rule-based methods.

The knowledge-based and linguistics-based methods will do well to regroup and concentrate on those tasks and situations in which statistical approaches fail to deliver.
