И.С. Гудилина, Л.Б. Саратовская, Л.Ф. Спиридонова - English Reader in Computer Science
II. Answer the questions:
- What can you say about postdoctoral programs?
- What is the central concern of the MCSM?
- What are the goals of the workshop?
Unit 8
I. Read and translate the text orally at sight.
This project is challenging from a technical perspective because information technology is moving so fast that there's seldom time for even de facto standards to emerge. Instead, we must deal with de facto interoperation — making incompatible products already in the marketplace communicate. Our philosophy is simple: Protocols, formats, and the like should not hinder business.
The success of this process clearly depends on market leaders in each area participating actively on their respective task forces. Admittedly, in past battles for market dominance (such as in operating systems and desktop PCs), it was difficult to bring leading players to the table. For robust Internet commerce, however, interoperability is so fundamental that we have to turn the concept of openness on its head - it's not just publishing an API. Everyone's software has to work together because no single company can control what platform its customers will use.
OVERVIEW
As proposed, Eco System will consist of an extensible object-oriented framework (class libraries, application programming interfaces, and shared services) from which developers can assemble applications quickly from existing components. These applications could subsequently be reused in other applications.
We are also developing a Common Business Language (CBL) that lets application agents communicate using messages and objects that model communications in the real business world. A network services architecture (protocols, APIs, and data formats) will insulate application agents from each other and from platform dependencies, while facilitating their interoperation.
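The passage above describes the architecture only in outline. Purely as an illustration, the sketch below shows what a CBL-style message between two application agents might look like; the AgentMessage class, its field names, and the request_quote helper are invented for this example and are not part of the actual Eco System or CBL specification.

```python
# Hypothetical illustration of a CBL-style message exchanged between
# application agents; the classes and field names are invented for this
# sketch and do not reflect the real Eco System / CBL specification.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AgentMessage:
    """A message modeled on a real-world business communication."""
    performative: str              # what the sender wants, e.g. "request-quote"
    sender: str                    # identity of the sending agent
    receiver: str                  # identity of the receiving agent
    body: Dict[str, str] = field(default_factory=dict)  # business content


def request_quote(catalog_agent: str, item: str, quantity: int) -> AgentMessage:
    """Build a quote request; a network services layer (not shown) would handle
    transport, formats, and platform differences, as the article describes."""
    return AgentMessage(
        performative="request-quote",
        sender="buyer-agent-001",
        receiver=catalog_agent,
        body={"item": item, "quantity": str(quantity)},
    )


if __name__ == "__main__":
    msg = request_quote("supplier-agent-042", item="10BaseT hub", quantity=25)
    print(msg)
```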
NOTES:
- to challenge (v.) — 1) требовать усилий, быть трудной задачей 2) бросать вызов, вызывать.
- to emerge (v.) — проявляться, возникать.
- to hinder (v.) — мешать, препятствовать, служить помехой.
- robust (adj.) — надежный, живучий.
II. Make up 5 problem questions to cover the information.
CHAPTER 3
In this chapter you will gain practice in summarizing and annotating the original professional texts. Before starting to work on this chapter, read the recommendations and advice on pp. 5-8 very attentively.
Unit 1
Statistical Versus Knowledge-based Machine Translation
The problem of translating between languages is both ancient, illustrated by the Tower of Babel in biblical times, and widespread, with over 1,000 different languages in use today. Researchers have been working on machine translation of languages for almost 50 years. While there have been some successes and a few commercial systems, high-quality, fully automatic machine translation remains an elusive goal. Not surprisingly, there is some disagreement about how best to proceed. On one side, researchers working on knowledge-based approaches argue that obtaining high-quality translation requires considerable linguistic knowledge and large knowledge bases. On the other side, researchers working on statistical approaches argue that it is impractical to build large enough knowledge bases to make this feasible, but large corpora of translated text do exist that can be used to train a statistics-based system. In the middle are the hybrid approaches that attempt to combine the strengths of both.
Machine translation: a hybrid view.
After only 35 years of effective machine translation R&D, I feel about its condition somewhat the way Mao Tse-Tung is said to have felt about the significance of the French Revolution after nearly 200 years: it's too early to tell. The broad facts are apparent to anyone who reads the newspapers, and are therefore a potentially inconsistent set: MT works, in the sense that everyday MT systems at the Federal Translation Division in Dayton, Ohio, and at the European Commission in Luxembourg produce fully automatic translations that many people use with apparent benefit. Moreover, more than 6,000 MT systems have been sold in Japan alone. But the failure of intellectual breakthroughs to produce indisputably high-quality, fully automatic MT is also apparent, which has led some to say it is impossible, a claim inconsistent with the first observations.
These simple statements could have been made 10 years ago. What has changed since then is twofold: first, the irruption into MT of a range of techniques from speech research, pioneered by IBM Laboratories, that claimed the way out of the deadlock was empirical, in particular statistical, methods that took as data very large text corpora.
With these techniques, IBM argued that high-quality MT would be possible without recourse to linguistics, artificial intelligence, or even foreign language speakers. It was not a new claim, for King had made it in the fifties, but IBM reapplied speech algorithms (in particular, hidden Markov models) to execute the program.
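The statistical program referred to here is usually summarized as a noisy-channel model: for a foreign sentence f, choose the English sentence e that maximizes P(e)·P(f | e), where P(e) is a language model and P(f | e) a translation model, both estimated from bilingual corpora. The toy sketch below illustrates only that decision rule; the probability tables are invented for the example, whereas a real system would train such models (for instance, with HMM-style alignment) on millions of sentence pairs.

```python
# Toy noisy-channel decoder: pick the candidate translation e that maximizes
# P(e) * P(f | e).  The probability tables below are invented for illustration;
# in a real system both models would be estimated from very large corpora.
import math

# Hypothetical language-model scores P(e) for candidate English sentences.
LM = {
    "the house is small": 0.20,
    "the house is little": 0.10,
    "small the house is": 0.001,
}

# Hypothetical translation-model scores P(f | e) for one foreign sentence.
TM = {
    ("das haus ist klein", "the house is small"): 0.30,
    ("das haus ist klein", "the house is little"): 0.25,
    ("das haus ist klein", "small the house is"): 0.40,
}


def decode(foreign: str) -> str:
    """Return the candidate English sentence maximizing log P(e) + log P(f|e)."""
    best, best_score = None, -math.inf
    for english, p_lm in LM.items():
        p_tm = TM.get((foreign, english), 0.0)
        if p_tm == 0.0:
            continue
        score = math.log(p_lm) + math.log(p_tm)
        if score > best_score:
            best, best_score = english, score
    return best


print(decode("das haus ist klein"))   # -> "the house is small"
```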
The second response, one championed at the time by Martin Kay, was to argue that no theory, linguistic or otherwise, would deliver MT in the foreseeable future. So, the escape from the very same deadlock was to move to machine-assisted MT, which then spawned a score of systems, many now available, that would help users create a translation but that involved, or required, no large claims about automatic MT.
Both developments agreed that linguistic theory was not going to deliver a solution, nor was artificial intelligence. AI had argued since the mid-seventies that knowledge-based systems were the key to MT, as to everything else. They had failed, however, to deliver knowledge bases of sufficient size, and had left us with only plausible examples, as in "The soldiers fired at the women and I saw several fall," where we understand that "several" refers to the women — not because of any linguistic selection rules or statistical regularities, but because of our knowledge of how the world works. But the knowledge banks did not appear. Doug Lenat with the CYC project at the Microelectronics and Computer Technology Corp. (MCC) is building a large formal-knowledge base, as is Sergei Nirenburg at New Mexico State University (NMSU), with an ontology of conceptual facts. However, these have not yet been brought into contact with large-scale problems, which was why some people took the statistical claims seriously.
Linguistics was in a far worse position than AI to weather the statistical onslaught. Noam Chomsky's only argument against early statistical claims was that "I saw a triangular whale" was enormously improbable, as a sentence, but nevertheless well formed. For some reason no one can now remember, arguments of that quality succeeded in repressing empirical methods for 30 years, which explains in part why IBM's pioneering claims were a little like the fall of an intellectual Berlin Wall.
AI researchers who were hostile to linguistics, myself included, perhaps should have been more positive about the IBM claims when they emerged: some of us had espoused symbolic theories of language that rested on quantifiable notions of the coherence or preference of linguistic items for each other. So, perhaps the statistical view was simply offering a data-gathering method for what we had claimed all along?
But IBM, and its imitators, did better than many expected. Its researchers could produce 50-plus percent of correctly translated sentences from unseen sentences in a trained corpus. To many onlookers that was a striking achievement. But they could not regularly beat Systran, the oldest and tiredest piece of MT software, the one that produces the daily translations at Dayton and Luxembourg.
The IBM researchers then backed away and began to argue that, even if they did need linguistic/AI information of a classic type to improve MT performance (such as lexicons, grammar rule sets, and morphologies), these too could be produced by empirical data-gathering methods and not intuition. In that, they were surely right. That fact constitutes my main argument for the future of hybrid systems for MT, ones that optimize by fusing the best of symbolic and statistical methods and data.
A moment's pause is in order to consider the Systran system, still the world's best performer on unseen text, despised by linguists and AI researchers alike until they needed it as a champion against the statisticians. The truth, of course, is that by dint of 30 years' hard labor the Systran teams had produced by hand the large coded knowledge base needed for the symbolic AI approach to work!
Why did the statistical approach do as well as it did so quickly? The best explanation I know is revealing, and also cheering for the future of hybrid systems. Evaluation methods clearly showed that translation fidelity closely correlates with the intelligibility of the output text. Statistical models created a plausible model of generation intelligibility, based on n-gram models of plausible text sequences, and, through that correlation, dragged a substantial amount of MT fidelity along with the intelligibility.
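The intelligibility model mentioned here is essentially an n-gram language model: estimate from a corpus how likely each word is after the preceding one(s), then score how plausible a candidate output reads as English. The minimal bigram sketch below uses a three-sentence toy corpus and a crude floor probability in place of the large corpora and proper smoothing a real system would need.

```python
# Minimal bigram language model: estimates P(word | previous word) from a
# tiny toy corpus and uses it to score how plausible a word sequence is.
# Real systems train on millions of sentences and smooth unseen bigrams;
# this sketch just assigns them a small floor probability.
import math
from collections import Counter

corpus = [
    "the house is small",
    "the house is old",
    "the old house is small",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))


def bigram_prob(prev: str, word: str, floor: float = 1e-4) -> float:
    if unigrams[prev] == 0 or bigrams[(prev, word)] == 0:
        return floor
    return bigrams[(prev, word)] / unigrams[prev]


def log_score(sentence: str) -> float:
    """Higher scores mean more 'intelligible' relative to the training text."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))


print(log_score("the house is small"))   # fluent word order scores higher
print(log_score("small the house is"))   # scrambled order scores much lower
```

A fluent word order scores far better than a scrambled one, which is exactly the correlation the author says dragged translation fidelity along with intelligibility.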
The moral here is clear: MT, like prophecy and necromancy, is easy, not hard. One can do some MT on any theory whatsoever, including word-for-word substitution. So, do not be seduced by the claims of theory - only by results. We now have two competing paradigms, symbolic and statistical, each armed with a set of rock-solid examples and arguments, but neither able to beat Systran unaided.
The mass of active MT work in Japan has also, I believe, come up with a cookbook of useful heuristic hints: work with lexicon structures not syntax; preprocess difficult structures in advance of MT input; do not think of MT as an absolute self-contained task but as a component technology that links into and subdivides into a range of related office tasks such as information extraction, word processing, and teaching systems.
The last seems too simple. It is correct but ignores the historic position of MT, the oldest linguistic and AI task, one with a substantial evaluation methodology, so that any natural language processor (NLP) or linguistic theory can still be reliably tested within it. The consequence of these observations is that hybrid cooperative methods are the only way forward in MT, even though for now they may be pursued separately as grammars are extracted empirically from texts and texts are automatically sense-tagged. Work also progresses in parallel on the development of ontology and knowledge bases. They will meet up again, for neither can do without the other, and all attempts to prove the self-sufficiency or autonomy of each have failed and will probably continue to do so.
NOTES:
- I feel about its condition somewhat the way Mao Tse-Tung is said to have felt about the significance of the French Revolution after nearly 200 years: it's too early to tell. — Его состояние в какой-то степени напоминает мне высказывание Мао Цзэдуна, которое, как говорят, он сделал по поводу значения Французской революции почти 200 лет спустя: "Слишком рано говорить".
- However, these have not yet been brought into contact with large-scale problems, which was why some people took the statistical claims seriously. — Однако они не были связаны с крупномасштабными задачами, что было причиной тому, что заявления статистиков некоторыми принимались всерьез.
- to weather the statistical onslaught — выдержать натиск статистиков.
- ... some of us had espoused symbolic theories of language that rested on quantifiable notions of the coherence or preference of linguistic items for each other. — некоторые из нас поддержали символьные языковые теории, которые опирались на поддающиеся количественному определению понятия согласования или выбора лингвистических единиц.
- Evaluation methods clearly showed that translation fidelity closely correlates with the intelligibility of the output text. — Методы оценки отчетливо показали, что точность перевода тесно связана с ясностью итогового текста.
- ... texts are automatically sense-tagged. — ... тексты автоматически размечаются по значениям слов.
I. Answer the following questions:
- What can existing machine translation systems still not do?
- Can you describe the methods for escaping the deadlock in machine translation?
- IBM researchers could produce 50-plus percent of correctly translated sentences from unseen sentences in a trained corpus, couldn't they?
- What can IBM researchers do to improve machine translation performance?
- What did the evaluation methods clearly show?
- Can you comment on the author's view of the active MT work in Japan?
- Is it correct?
II. Write a summary in English.
Stage A
1) Look through the text (skimming),