WS3: Beyond Translation Memories Workshop

Saturday, August 29, 2009 - MT Summit Conference
Chateau Laurier, Ottawa, Ontario, Canada
9:00 Opening remarks
9:30 How translators use tools and resources to resolve translation difficulties: an ethnographic study
     A. Désilets (National Research Council, Canada), C. Melançon (University of Quebec in Outaouais, Canada), G. Patenaude (National Research Council, Canada) and L. Brunette (University of Quebec in Outaouais, Canada)

Research in Translation Technology is often carried out by people and teams that have little knowledge of how translators actually work. The present paper aims at partially filling that knowledge gap by providing a concise, yet useful portrait of how human translators use linguistic resources and technology to resolve translation problems (e.g., terminology, phraseology). It is based on quantitative and qualitative data gathered through a Contextual Inquiry study conducted with 8 professional translators in Canada. The paper sheds light on questions such as: (a) What kinds of translation problems do translators encounter? (b) When, in the translation process, do they try to find solutions to these problems? (c) What kinds of resources do they include in their toolbox? (d) Do they tend to use certain types of resources more than others? (e) How do they assess solutions proposed by various tools, and choose the one that seems most appropriate for their needs? (f) To what extent do translators contribute information to those resources? Wherever applicable, we discuss possible implications of those findings for people involved in the research, development, teaching or administration of tools and linguistic resources for translators.

10:00 Translation editing environments
     E. Lagoudaki (University of Wolverhampton, United Kingdom)

This paper reports on feedback received from translation professionals regarding the translation editing environments they use within or in conjunction with their Translation Memory system. Four types of environment emerge, and their respective advantages and disadvantages are discussed. It is shown how these environments affect the translation process that their users follow and how the process could be improved. A number of needs concerning the functionality of translation editors, as well as the user interface, are presented with reference to particular use contexts.

10:30 Break
11:00 Keynote speakers invited by MT Summit
11:30
12:00
12:30 Lunch
13:00
13:30
14:00 ReEscreve: A translator-friendly multi-purpose paraphrasing software tool
     A. Barreiro (Linguateca, Portugal) and L. M. Cabral (Linguateca, Norway)

This paper describes ReEscreve, a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for pre-editing and useful for human and machine translation. At the current stage, ReEscreve transforms support verb constructions into verbs or similar expressions with 93.4% precision, and it is being progressively extended to paraphrase other linguistic phenomena, enabling it to be used as an authoring and stylistic aid in word processing applications. ReEscreve is freely available on the Internet at http://www.linguateca.pt/Reescreve/.

14:30 The Web as a source of informative background knowledge
     C. Barrière (National Research Council, Canada)

In this paper, we present how a tool called TerminoWeb can be used to help translators find background information on the Web about a domain, or more specifically about terms found in a text to be translated. TerminoWeb contains four modules working together to achieve this goal: (1) a search engine specifically tuned for informative texts and glossaries where background knowledge is likely to be found, (2) a term extractor able to automatically discover the important terms of a source text, and (3) a query generator able to automatically launch multiple queries on the Web from a set of extracted terms. The result of these first three steps is a background corpus which can then be explored by (4) a corpus exploration module in search of definitional sentences and concordances. In this article, an in-depth example is used to provide a proof of concept of TerminoWeb's background information search and exploration capability.

15:00 Bitextor, a free/open-source software to harvest translation memories from multilingual websites
     M. Esplà-Gomis (Universitat d’Alacant, Spain)

Bitextor is a free/open-source application aimed at harvesting translation memories from multilingual websites. It downloads all the HTML files in a website, preprocesses them into a coherent and suitable format and, finally, applies a set of heuristics to pair files that are likely to contain the same text in two different languages (bitexts). From these parallel texts, translation memories are generated in TMX format using the library LibTagAligner, which uses the HTML tags and the length of text chunks to perform the alignment.
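
The idea of comparing pages by their HTML tags and text lengths can be illustrated with a minimal sketch. This is not Bitextor's or LibTagAligner's actual code: the names `fingerprint`, `pair_score` and `best_candidate`, and the way the two similarities are multiplied together, are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records the sequence of opening tags and the total length of visible text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_length = 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_length += len(data.strip())


def fingerprint(html):
    """Return (list of opening tags, total text length) for one HTML page."""
    collector = TagAndTextCollector()
    collector.feed(html)
    collector.close()
    return collector.tags, collector.text_length


def pair_score(html_a, html_b):
    """Hypothetical heuristic: tag-sequence similarity times text-length ratio."""
    tags_a, len_a = fingerprint(html_a)
    tags_b, len_b = fingerprint(html_b)
    tag_similarity = SequenceMatcher(None, tags_a, tags_b).ratio()
    longest = max(len_a, len_b)
    length_similarity = min(len_a, len_b) / longest if longest else 1.0
    return tag_similarity * length_similarity


def best_candidate(page, candidates):
    """Pick the name of the candidate page that scores highest against `page`."""
    return max(candidates, key=lambda name: pair_score(page, candidates[name]))
```

Two pages that share the same markup skeleton and have similar amounts of text score close to 1, while structurally unrelated pages score close to 0. A real harvester would presumably combine several such signals (URLs, language identification, and so on), which this sketch leaves out.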

15:30 A Web service enabling gradable post-edition of pre-translations produced by existing translation tools: practical use to provide high-quality translation of an on-line encyclopedia
     H. Blanchon (University of Grenoble, France), C. Boitet (University of Grenoble, France) and C.-P. Huynh (University of Grenoble, France)

SECTra_w is a Web-based system offering several services, such as supporting MT evaluation campaigns and online post-editing of MT results, to produce reference translations adapted to classical MT systems not built by machine learning from a parallel corpus. The service we are interested in here is the possibility for its users to import a document, or a set of documents (not only a list of preprocessed segments), and achieve high-quality translation by applying on-line human post-edition to the results of Machine Translation systems. The on-line human post-edition may be carried out by a community of contributing post-editors. In this paper, we describe the use of SECTra_w to translate into French a set of 25 HTML documents (about 220,000 words) on water and ecology from the on-line Encyclopedia of Life Support Systems (EOLSS) using a contributive on-line human post-edition framework.

16:00 Break
16:30 Grounding translation tools in translator's activity data
     M. Carl (Copenhagen Business School, Denmark)

This paper presents a technology and a representation for gathering and analyzing user activity data from translation experiments. New technologies and novel ways of using existing technology could emerge with enhanced knowledge about translators' behavior and a tight integration into translation models.

17:00 Productivity and quality in MT post-editing
     A. Guerberof (Universitat Rovira i Virgili, Spain)

Machine-translated segments are increasingly included as fuzzy matches within the translation-memory systems in the localization workflow. This study presents preliminary results on the correlation between these two types of segments in terms of productivity and final quality. In order to test these variables, we set up an experiment with a group of eight professional translators using an on-line post-editing tool and a statistical-based machine translation engine. The translators were asked to translate new, machine-translated and translation-memory segments in the 80-90 percent fuzzy-match range using a post-editing tool without actually knowing the origin of each segment, and to complete a questionnaire. The findings suggest that translators have higher productivity and quality when using machine-translated output than when processing fuzzy matches from translation memories. Furthermore, translators' technical experience seems to have an impact on productivity but not on quality. Finally, we offer an overview of our current research.
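
For readers unfamiliar with fuzzy-match percentages, the following sketch shows one simple way such a score could be computed and checked against an 80-90 percent band. It is an illustration only: commercial translation-memory systems use their own scoring methods, and the word-level `difflib` similarity and the function names `fuzzy_match_percent` and `in_band` are assumptions, not the measure used in the study.

```python
from difflib import SequenceMatcher


def fuzzy_match_percent(new_segment, memory_segment):
    """Word-level similarity between two segments, as a percentage (assumed metric)."""
    new_words = new_segment.lower().split()
    memory_words = memory_segment.lower().split()
    return 100.0 * SequenceMatcher(None, new_words, memory_words).ratio()


def in_band(score, low=80.0, high=90.0):
    """True if the score falls inside the given fuzzy-match band (inclusive)."""
    return low <= score <= high


score = fuzzy_match_percent(
    "Click the Save button to store the file",
    "Click the Save button to store the document",
)
print(score, in_band(score))
```

Here the two segments share seven of their eight words, so the stored translation would be offered to the translator as a fuzzy match to be edited rather than reused verbatim.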

Last updated: Mon Aug 17 10:59:10 PDT 2009