| August 2009 |
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
| | | | Tutorials | Main Conference |
| | | | 26 | 27 | 28 | 29 |
| Workshops | NIST Workshop | |
| 30 | 31 | |
| September 2009 |
| | NIST Workshop | | | | |
| | 1 | | | | |
Wednesday 26-Aug
|
|
Tutorials –
Pre-Conference sessions will be held at the UQO Campus, with transportation provided from the Château Laurier. Note that all tutorials are 3 hours long with a 30-minute break in the middle.
WS1: 3rd Workshop on Computational Approaches to Arabic-Script-based Languages. Location: UQO Campus, transportation provided
|
| 8:00 |
Registration and Breakfast
|
| 8:30 |
| 9:00 |
| 9:30 |
Morning Tutorials
T1: Introduction to Machine Translation – Mike Dillinger, Translation Optimization Partners
T2: Machine Learning Approaches for Dealing with Limited Bilingual Data in Statistical Machine Translation – Gholamreza Haffari, Simon Fraser University
T3: Tools for Translation: Current Practices, State of the Art, and Capability Gaps – Jennifer DeCamp, MITRE Corporation
|
| 10:00 |
| 10:30 |
| 11:00 |
| 11:30 |
| 12:00 |
| 12:30 |
Lunch |
| 13:00 |
13:30 |
Afternoon Tutorials
T4: The Business Case for Machine Translation
Donald A. DePalma, Chief Research Officer, Common Sense Advisory, Inc.
T5: Postediting Machine Translation
Sharon O'Brien and Giselle de Almeida, Dublin City University; Johann Roturier, Symantec
T6: Up close and personal with a Translator - How Translators Really Work –
Alain Desilets, UQO, Geneviève Patenaude, National Research Council of Canada
|
| 14:00 |
| 14:30 |
| 15:00 |
| 15:30 |
| 16:00 |
| 16:30 |
| 17:00 |
| 17:30 |
Welcome Reception – Château Laurier
|
| 18:00 |
| 18:30 |
| 19:00 |
All remaining conference sessions will be at the Château Laurier
| Thursday 27-Aug |
| 8:30 |
Breakfast |
| 9:00 |
Welcome - Conference Overview |
| 9:30 |
Keynote: 9:30-10:15 Johann Roturier, Principal Research Engineer, Symantec
Deploying novel MT technology to raise the bar for quality at Symantec: Key advantages and challenges
Keynote: 10:15-11:00 Marco Trombetti, CEO, Translated.net
Human Translation is a $12 billion a year market of which professional translators get a significant share. Helping translators to be 10% more productive can create a potential market bigger than the current MT market. Is this a realistic direction? What are the requirements of the human translation market? This presentation will provide a live demo of MyMemory, the world's largest translation memory archive - a practical approach to combining machine translation and human translation.
Getting a share of the human translation market with the world's largest Translation Memory
|
| 10:00 |
| 10:30 |
| 11:00 |
Break |
| 11:30 |
Special Plenary Session: Pierre Isabelle & Roland Kuhn, National Research Council, Canada
MT: The Current Research Landscape |
| 12:00 |
| 12:30 |
Lunch |
| 13:00 |
| 13:30 |
| |
Research Papers |
Government Users |
Commercial Users |
| 14:00 |
1 |
Nadi Tomeh, Nicola Cancedda, Marc Dymetman
We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced in (Moore, 2004). While previous work by (Johnson et al., 2007) also addressed the question of phrase table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion, and show that when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of pruning of the phrase tables.
Complexity-Based Phrase-Table Filtering for Statistical Machine Translation |
1 |
Vema McIntosh
Canadian Job Bank Automated Translation Memory Tool |
1 |
MT WORKFLOW
|
Ray Flournoy, Christine Duran (Adobe)
Adobe recently began exploring the integration of Machine Translation technology into its localization workflow. The exploration occurred in two stages: a small pilot followed by a larger project localizing real Adobe documentation. This presentation will discuss the testing process, and in particular the difficulties of moving from a very small pilot to a localization task on real company data. The initial pilot project used two MT engines (one statistical and one traditional) both of which were trained with Adobe data and lexicons. A small test set of 800-2000 words of documentation was machine translated and post-edited, and based on the metrics gathered the second phase was approved. In the second phase, actual product documentation including about 200,000 words of new text was localized with MT. The second phase is still underway, but early results indicate that the project will complete successfully, but with some complications. We will explore those complications in the presentation, including:
- The editing rate is slightly lower than expected;
- The raw MT output quality is lower than expected;
- The technical integration of MT with the localization management system has been much more difficult than expected.
MT and Document Localization at Adobe: from pilot to production |
| 14:30 |
2 |
Alexandre Patry, Philippe Langlais
We propose to estimate the probability that a target word appears in the translation of a given source sentence using a multilayer perceptron. At the expense of ignoring word order and repetition, our model does not assume word alignments and considers all source words jointly when evaluating the probability of a target word. We compared our model against ibmone, which does not consider word order either. Our model was competitive with ibmone when predicting the target words that should be in the translation of a source sentence. When our model was extended to include alignment information, it surpassed ibmone on all the metrics we evaluated.
Prediction of Words in Statistical Machine Translation using a Multilayer Perceptron |
2 |
Elizabeth McGrath
On-line Multimedia Parallel Corpus |
2 |
Declan Groves (Traslan), Dag Schmidtke (Microsoft)
Post-editing -- the correction and perfection of content already automatically translated -- is essential in ensuring high-quality translation output, but it is a costly process. The true cost of translation falls not on MT but instead on the pre- and post-processing necessary to produce high-quality output translations in real-world situations. Therefore, to maximize the value of MT and decrease translation costs, we need to understand the front- and back-end of the translation workflow. We carried out a number of linguistic and statistical analysis experiments comparing raw MT output produced by Microsoft's MSR-MT engine (Quirk et al., 2005) with its human post-edited counterpart, for English-Spanish and English-German translation. We identify a number of interesting post-editing patterns, both textual (string-based) and structural. In this presentation, we discuss our analysis methodologies, present some of our results and show how this type of analysis can benefit translation systems and post-editors. We will also discuss results from Microsoft MT post-editing pilots for a number of different language pairs.
Identification and analysis of Post-Editing patterns for MT |
| 15:00 |
3 |
Francisco Guzman, Qin Gao, Stephan Vogel
In this paper we study in detail the relation between word alignment and phrase extraction. First, we analyze word alignments according to several characteristics and compare them to hand-aligned data. Second, we analyze the phrase pairs generated by these alignments. We observe that the unaligned words in the extracted phrase pairs follow the distribution of unaligned words in the alignments from which they were extracted. A manual evaluation of phrase-pair quality shows that the more unaligned words a phrase pair contains, the lower its quality. Finally, we present translation results obtained using different phrase tables built upon the different alignments.
Reassessment of the Role of Phrase Extraction in SMT |
3 |
Carol Van Ess-Dykema
Translation Memory (TM) is a computer-aided translation technology gaining wide acceptance in translation service environments. The technology, which finds identical strings (or near-matches) of source language and replaces them with stored target language translations, has proven useful in a number of domains and document types; for example, genres with repetition across the totality of the text (e.g., journal articles) and unchanged text passages (e.g., user manual updates). The benefit of TM is less well known in settings where such recurrent text is not as likely. There needs to be an understanding of the potential for TM in these production environments. The National Virtual Translation Center (NVTC) produces high-quality translation for the Department of Defense and the Intelligence Community. The Center has translated in more than 50 languages, in many critical subject domains, over more than 80 genres. While the Center translates documents typically amenable to TM, a great deal of the work is of a non-recurring nature. The mission of the NVTC creates a rich field for exploiting TM, and finding how TM can be optimized for non-recurrent translation content. The Naval Research Laboratory and National Institute of Standards and Technology have joined with NVTC to conduct a series of experiments to this end. This paper reports on a first, pilot experiment designed to determine the contribution of a translation memory to a series of controlled translation and revision tasks in Russian-, Arabic-, and Chinese-English. The goals of this experiment include:
- Determining the parameters of language and genre relevant to judging the suitability of TM; and
- Discovering and baselining the metrics for evaluating and optimizing TM performance in the context of human translation.
The experimental process includes examination of the roles of translating and quality control in a TM-aided environment, with measurements of coverage, time, error rate, and subjective assessments of quality. The results of this experiment will facilitate adjusting the parameters and metrics for future experiments, and ultimately determine the potential of TM in a translation service environment like that of NVTC.
Metrics to Assess Translation Memory Technology |
3 |
Julia Aymerich, Hermes Camelo (PAHO)
The MT Maturity Model at PAHO |
| 15:30 |
4 |
Boxing Chen, George Foster, Roland Kuhn
In this paper, we propose to enhance the phrase translation model with association measures as new feature functions. These features are estimated on counts of phrase pair co-occurrence and their marginal counts. Four feature functions, namely, Dice coefficient, log-likelihood-ratio, hyper-geometric distribution and link probability are exploited and compared. Experimental results demonstrate that the performance of the phrase translation model can be improved by enhancing it with these association based feature functions. Moreover, we study the correlation between the features to predict the usefulness of a new association feature given the existing features.
Phrase Translation Model Enhanced with Association based Features |
4 |
Chuck Simmons
The Foreign Media Collaboration Framework (FMCF) is a service-oriented architecture (SOA) that integrates, manages, and exposes foreign language processing tools and leverages Machine Translation (MT), Optical Character Recognition (OCR), and Automatic Speech Recognition (ASR) software available on the market today. The FMCF includes a robust set of collaboration tools to support business processes involving human translation and machine translation workflows and is available via an intuitive portal.
Foreign Media Collaboration Framework Program |
4 |
Atefeh Farzindar (NLP Technologies)
TransLI (Translation of Legal Information) was developed for automatic translation of Canadian Court judgments from English to French and from French to English. TransLI came about in response to a frightening reality confronting the Canadian court system: the growing demand for translations in contrast to a lack of available resources. NLP's solution addresses our clientele's volume and time management issues. TransLI has been trained on a large corpus of legal documents so as to significantly reduce production time and costs. Throughout the developmental stages, NLP has determined the importance of review and post-editing mechanisms for judicial translations as part of an SMT-integrated workflow: reviewers with subject knowledge require access to the translation process in order to provide a feedback loop to the SMT training process. This, in addition to key components developed by NLP, combines a translation management system with an integrated machine translation engine to create a generic multilingual post-editing content management system providing a cost- and time-effective alternative to traditional methodologies.
An automatic translation management system for legal texts |
| 16:00 |
Break |
| 16:30 |
1 |
Qibo Zhu, Inkpen, Ash Asudeh
Mining officially published web pages can be an invaluable undertaking for translators in government departments who are producing the translations, and for machine translation researchers who are studying how those translations are produced. In this paper, we present the StatCan Daily Translation Extraction System (SDTES) and demonstrate how it is used to induce translations from officially published bilingual materials from government websites in Canada. New evaluation results show that SDTES is a very effective system for identifying and extracting sentences that are translation pairs from most of the federal government web pages which are currently under the CLF2 (Common Look and Feel for the Internet 2.0) framework.
Inducing translations from officially published materials in Canadian government websites |
1 |
Panel 1: Translation in Government: Nick Bemish (Moderator), Dan Scott, Kathi Taylor, Lisa Harper, Chuck Simmons, Donald Barabe, RMK Sinha.
|
1 |
MT QUALITY |
Hannah Grap (Language Weaver)
Quality has always been the number one argument against machine translation. Not because translations are bad, but because quality cannot be defined in a meaningful way. In this presentation, we will present a new method for ranking quality that focuses on the impact the translated output has on the brand of a commercial organization. Using new ranking algorithms, it is now possible to determine whether end users will accept the automated translations as useful while ensuring that the brand value is protected. The ranking process will be outlined and presented in the case of a real customer.
Ranking MT Quality: Focus on the brand |
| 17:00 |
2 |
Masao Utiyama, Daisuke Kawahara, Keiji Yasuda, Eiichiro Sumita
We propose to mine parallel texts from mixed-language Web pages. We define a mixed-language Web page as a Web page consisting of (at least) two languages. We mined Japanese-English parallel texts from mixed-language Web pages. We presented the statistics for extracted parallel texts and conducted machine translation experiments. These statistics and experiments showed that mixed-language Web pages are rich sources of parallel texts.
Mining Parallel Texts from Mixed-Language Web Pages |
2 |
2 |
Larry Rogers (LexiTech)
Translation quality: No longer in the eye of the beholder |
| 17:30 |
3 |
David Kurokawa, Cyril Goutte, Pierre Isabelle
We investigate the possibility of automatically detecting whether a piece of text is an original or a translation. On a large parallel English-French corpus where reference information is available, we find that this is possible with around 90% accuracy. We further study the implication this has on Machine Translation performance. By separating the corpus according to translation direction and training separate Phrase-Based MT systems, we show that reliably detecting translation direction yields improved translation performance. This suggests that paying attention to the translation direction when building a parallel corpus for the purpose of training a statistical MT system may have an effect on the output quality.
Automatic Detection of Translated Text and its Impact on Machine Translation |
3 |
3 |
Jordi Carrera, Alex Yanishevsky (ProMT)
Initially, translation memory was met with skepticism. However, currently it is a vital part of the translator's toolkit. Machine translation seems to be following the same trajectory in the world of translators. For machine translation to become a mainstay in production, it needs to prove itself as a valuable addition to the translator's toolkit. The goal of the presentation is to begin to lay a foundation for such proof by focusing on:
- high level discussion of pros/cons of various algorithms for evaluating machine translation
- correlation between human evaluation of machine translated texts and algorithms for evaluating machine translation
- correlation between human evaluation, algorithm evaluation of machine translation and an increase in productivity on specific translation projects
Technology for Translators: What doesn't Kill You, Makes You Stronger |
| 18:00 |
IAMT General Membership Business Meeting |
| 18:30 |
| Friday 28-Aug |
| 8:30 |
Breakfast |
| |
Research Papers |
Government Users |
Commercial Users |
| 9:00 |
1 |
Nguyen Bach, Qin Gao, Stephan Vogel
We propose a novel source-side dependency tree reordering model for statistical machine translation, in which subtree movements and constraints are represented as reordering events associated with the widely used lexicalized reordering models. This model allows us to efficiently capture the subtree-to-subtree transitions observed not only in the source of word-aligned training data but also at decoding time. Using subtree movements and constraints as features in a log-linear model, we are able to help the reordering models make better selection. It also allows the subtle importance of monolingual syntactic movements to be learned alongside other reordering features. We show improvements in translation quality on English-Spanish and English-Iraqi translation tasks.
Source-side Dependency Tree Reordering Models with Subtree Movements and Constraints |
1 |
Kristen Summers
The EDEAL project seeks to identify, collect, evaluate, and enhance resources relevant to processing collected material in African languages. Its priority languages are Swahili, Hausa, Oromo, and Yoruba. Resources of interest include software for OCR, Machine Translation (MT), and Named Entity Extraction (NEE), as well as data resources for developing and evaluating tools for these languages, and approaches, whether automated or manual, for developing capabilities for languages that lack significant data resources and reference material. We have surveyed the available resources, finding useful existing solutions for OCR on Swahili, Hausa, and Oromo, and for MT on Swahili and Hausa. The project is now in its first execution phase, focused on providing end-to-end capabilities and solid data coverage for a single language; we have chosen Swahili since it has the best existing coverage to build on. In future phases, we will address the additional languages. We will describe our findings on tools, data, and approaches, and we will present our current work, including the development of a significant data collection to support OCR, NEE, and MT, which will be freely available to the U.S. Government community.
CACI - OCR, MT and Integration Efforts for African LCTLs |
1 |
BUSINESS STRATEGIES |
David Lubensky, Salim Roukos (IBM)
As IBM becomes a globally integrated enterprise, it needs globally oriented employees. IBM employs about 400,000 people in 160 countries. Not everyone speaks the same language, but when this language diversity is properly harnessed it becomes an incredible strength: employees can effectively communicate in their native language with colleagues and clients, mine and share business artifacts from documents written in multiple languages, browse and extract content from internal and external web pages and various enterprise knowledge sources, access multilingual support collateral, and leverage the power of a large multilingual workforce through crowdsourcing. At IBM, deployment of Real Time Translation Services (RTTS) reduces the language barrier. In this paper we'll describe our ongoing deployment of RTTS across the enterprise and the use cases and experiences we've encountered along the way.
Real Time Translation Services at IBM |
| 9:30 |
2 |
Sirvan Yahyaei, Christof Monz
In this paper we present an extension of a phrase-based decoder that dynamically chunks, reorders, and applies phrase translations in tandem. A maximum entropy classifier is trained based on the word alignments to find the best positions to chunk the source sentence. No language specific or syntactic information is used to build the chunking classifier. Words inside the chunks are moved together to enable the decoder to make long-distance reorderings to capture the word order differences between languages with different sentence structures. To keep the search space manageable, phrases inside the chunks are monotonically translated.
Decoding by Dynamic Chunking for Statistical Machine Translation |
2 |
Chris Wendt, Will Lewis (Microsoft)
The nature of statistical MT systems includes the ability to relatively quickly and easily customize them to a company's specific domain, by training them on the company's own parallel data.
We are presenting a case study of a customized statistical MT system, which has been trained with an organization's proprietary data, and show how, and by how much, we can improve the quality of this customized system by using additional training data from trusted sources outside the organization, for instance using data that other companies and organizations have shared in the TAUS Data Association. We will show the process, the criteria, the mechanisms, and the automatic and human evaluation results for each step in the process, enabling the audience to make deliberate choices about how to enhance the composition of training data for their SMT installation.
Pushing the quality of a customized SMT system using shared training data |
| 10:00 |
3 |
Vanh Nguyen, Shimazu, Le Nguyen and Phuong Nguyen
In this paper, we present a reordering model based on Maximum Entropy. This model extends the hierarchical reordering model for PBSMT (Galley and Manning, 2008) and integrates syntactic information directly in the decoder as features of a MaxEnt model. The advantages of this model are that (1) it maintains the strength of the phrase-based approach with a hierarchical reordering model, and (2) many kinds of linguistic information can be integrated in PBSMT as arbitrary features of the MaxEnt model. Experimental results on the English-Vietnamese pair show that our approach achieves improvements over a system which uses a lexical hierarchical reordering model (Galley and Manning, 2008).
Improving A Lexicalized Hierarchical Reordering Model Using Maximum Entropy |
2 |
R.M.K. Sinha, Indian Inst of Tech
The Government of India established the National Translation Mission (NTM) (http://www.ntm.org.in/ntm) to address the formidable challenges of multilingual information sharing in India. The primary goal of NTM is "to make knowledge-based texts accessible in all Indian languages through translation. The idea of NTM stemmed from a statement of the Prime Minister of India stressing how vital is the access to translated material, for increasing access to knowledge in many critical areas." In this presentation, I present a framework aimed at integrating MT in the human translation process for enhancing translation throughput, bootstrapping MT capability, translators' training, tool deployment, employment generation, and finally providing access to knowledge in native language(s) to socially deprived masses.
Indian framework for integrating MT and HT |
|
Sven Christian Andrä (Andrä AG), Jörg Schütz (bioloom group)
This presentation focuses on emerging language ecosystems with new collaborative approaches and business models based on cloud computing technology platforms, smart marketplaces with aggregation, recombination and hyperdistribution. We talk about ecosystems because this term adequately describes the evolutionary character of the emerging online landscape with various niche markets and their demands and requirements besides the industrial mainstream developments.
Our discussion also includes succinct case studies on how a big German automotive manufacturer and an SME in the field of serious gaming are employing such services through a flexible Translation Management Framework within a three-dimensional translingual webservice-based information space. In this scenario, the first dimension represents the entire workflow. The second dimension depicts the different information sources. The third dimension reflects the management of the information's life cycle. The framework's components and modules work together in an adaptable and emergent fashion thus allowing for additional savings, revenues and even new market gains.
MT Bazaar: Translation Ecosystems in the Cloud
|
| 10:30 |
Break |
| 11:00 |
Panel 2: 11:00-12:30: Panel Discussion: Converging Technologies: What are the benefits for MT users?
Terry Lawlor/SDL, Daniel Gervais/Multicorpora, Olga Beregovaya/ProMT, Jaap van der Meer/TAUS Data Association
Discussants: Melissa Biggs/Sun Microsystems, Paul Bremer/Apptek |
| 12:00 |
Technology Showcase Noon - 17:00 |
| 12:30 |
Lunch |
| 13:00 |
Research Poster Sessions
13:00-15:00 Session A
15:30-17:30 Session B |
| 13:30 |
| 14:00 |
TAUS Data Association Roundtable (Workshop Separate Registration Fee) 14:00 - 17:00 |
Special Session: Translator Tools and Training
|
| 14:30 |
Panel 3: 14:00-15:00
Machine Translation (MT) and Computer Assisted Translation have seen remarkable advances in the last decade. However, current MT tools are not a serious choice for professional translation in any language combination. Moreover, translation practitioners are still looking for productive ways to merge the two approaches. Our belief is that greater attention should be given to the process of training translators in an era of technology. The community should also pay close attention to empirical studies of translation so that computational linguists will have a better idea of what really goes on in translation and develop tools that will be more useful for the end. Our panel will discuss how we prepare humanities students to succeed as translators in a technology-driven environment. Prof. Girju will discuss the interface between human and machine translation and the related pedagogical and research issues. Profs. Lowe and Phillips Batoma propose strategies for training language students to feel at ease with information technology in general and translation technology in particular. Prof. Minacori from the University of Paris will discuss the translation assessment tool that she has developed in collaboration with Ioan Roxin at the Université de Montbéliard, linking the issues of production and quality assessment of translation through technology.
Preparing Translators for the current technology landscape
Roxana Girju, Patricia Batoma and Elizabeth Lowe, Patricia Minacori
15:00 Pierrette Bouillon, Marianne Starlander
At the Translation and Interpretation School (ETI) of the University of Geneva, we have been teaching Machine translation classes for many years. The aim of the class "Machine translation and its applications" is to show how a controlled language improves machine translation. The work in class is divided into four main steps:
1. The students control a technical aeronautics text following the Simplified English rules.
2. They automatically translate the controlled and the uncontrolled versions of the text with a commercial MT tool.
3. They specialize the dictionaries in order to obtain a translation in conformity with the Français rationalisé rules.
4. They evaluate the results. They count the number of post-editions necessary to reach a perfect translation from the uncontrolled version compared to the controlled version with two different MT systems.
The aim of this class is to address four important aspects of MT: pre-editing, system specialization and dictionary interfaces, post-editing and evaluation of current MT quality. All the material necessary for this class is given on the e-learning platform Moodle.
Technology in Translator Training and Tools for Translators
15:30 Break
16:00 Cheryl McBride
Translation Memory (TM) systems are among the most aggressively marketed and widely used translation technologies. Previous studies have focused on when and how TMs are used, but there is significantly less information available relating to translators' perceptions of and attitudes towards them. This study explores online discussion forums (e.g. on TranslatorsCafe.com, ProZ.com), where the translation community discusses the issues affecting them and their work. An investigation of TM-related threads will reveal translators' unprompted opinions and perspectives, which are not typically captured in formal questionnaires. This paper will describe the corpus of TM-related postings, the methodology used to analyze it, and the results of this analysis, including answers to questions such as the following: Which tools are most popular? Which features are most appreciated/most problematic? Are there suggestions for improving tools? How does TM use affect the translator/client relationship? Such answers are relevant not only to translators, but also to tool developers, clients/employers and trainers. With a better understanding of different perspectives and attitudes, translators can evaluate and potentially alter their own perceptions, developers can respond more accurately to users' needs, and clients can better comprehend translators' needs and concerns.
IMHO: Translators' not-so-humble opinions on using TM systems
16:30 Achim Ruopp
When undertaking new translation projects, translators often face the following questions: What is the correct terminology for translating texts from the project domain? Am I using the most current terminology in the field? How were these terms translated by others in similar contexts? The BigTM.net system provides a solution. Using the text that is to be translated, the system searches the web for similar parallel pages, extracts and rates the discovered translations. An intuitive web interface is provided for the translator to search the resulting domain-specific parallel corpus. The discovered corpus can also be used to train domain-specific machine translation systems and extract initial terminology lists. Results of case studies for different language pairs will be presented, including sizes and similarity measures of the discovered corpora.
BigTM.net: Gaining a head start on translation projects
17:00 Jennifer DeCamp
What's missing in User-centric MT?
|
| 15:00 |
| 15:30 |
| 16:00 |
| 16:30 |
| 17:00 |
| 17:30 |
| 18:00 |
| 18:30 |
| 19:00 |
Banquet: IAMT Award of Honor |
| Saturday 29-Aug |
| 8:30 |
Breakfast |
| |
Research Papers |
Government Users |
Workshop: Beyond TMs |
| 9:00 |
1 |
Sherri Condon, Gregory Sanders, Dan Parvaz, Alan Rubenstein, Christy Doran, John Aberdeen and Beatrice Oshika
Evidence is presented to support the hypothesis that variation and inflection in Arabic has a negative impact on scores from automated measures of speech translation (e.g., WER, BLEU). Normalization operations improve correlation between BLEU scores and Likert-type judgments of semantic adequacy -- as well as between BLEU scores and human judgments of successful transfer of the meaning of individual content words from English to Arabic.
Normalization for Automated Metrics: English and Arabic Speech Translation |
1 |
Rod Holland, MITRE Clipper with Korean OCR and MT |
|
B E Y O N D T M s
|
Workshop: Beyond Translation Memories
Full day workshop Separate registration
Workshop Program |
| 9:30 |
2 |
Zhao Hongmei, Xie Jun, Liu Qun, Lu Yajuan, Zhang Dongdong and Li Mu
We present an introduction to the CWMT2008 evaluation and focus on its two new metrics: BLEU-SBP (Chiang et al., 2008) and the linguistic check-point method (Zhou et al., 2008). Our experiments validated BLEU-SBP's effectiveness in resolving the nondecomposability problem of both NIST-BLEU and IBM-BLEU at the sentence level. Our evaluation indicates that the linguistic check-point method is a valid metric for evaluating the capability of an MT system in translating various linguistic phenomena. With the aid of these metrics, we identified some performance differences between statistical MT systems and rule-based MT systems. We suggest that a high BLEU score doesn't necessarily mean high translation adequacy.
Introduction to China's CWMT2008 Machine Translation Evaluation |
2 |
Rod Holland, MITRE
The reader of a machine-translated text is actually forming hypotheses about the original text, based on multiple sources of evidence, of which the actual machine translation output is only one. Among other things, these sources of evidence include the original text, the reader's knowledge of the language of the original text, the reader's knowledge about the language and culture of the original text, the reader's understanding of how machine translation works, the reader's recourse to dictionaries and other linguistic resources, and the reader's world and domain knowledge. We will illustrate the influence of these sources of evidence in the reading of some machine-translated examples from Chinese web pages.
How to Read a Machine-Translated Text |
| 10:00 |
3 |
William Ogden, Ron Zacharski, Sieun An and Yuki Ishikawa
A method for evaluating MT performance embedded in Cross-Language Instant Messaging (CLIM) systems is presented. A web interface that provided concurrent real-time translation for instant messaging from multiple MT services was developed and used by paid participants to collaborate on a photo identification task. The method showed a task performance benefit due to the availability of multiple translation alternatives. The method also provides a new evaluation metric for MT systems based on users' task-motivated choices. This method was used to compare two English-Japanese online translation systems, one from Google and one from Excite/Japan.
User choice as an evaluation metric for web translation services in cross language instant messaging applications |
3 |
Susumu Bani
The Japan Patent Office (JPO) has two types of systems in which machine translations are utilized: 1) IPDL (Industrial Property Digital Library) which refers to a system that offers IP information in English to the public users all over the world free of charge; 2) AIPN (Advanced Industrial Property Network) which refers to a system that provides examination-related information in Japan, e.g., legal status information, cited document information, etc. to the examiners of foreign IP offices in order to contribute to work-sharing among IPOs.
The presentation will consist of the following subjects: 1) an explanation of how the machine translations are utilized in the systems; 2) an introduction to the JPO's current status and future plan, as well as to the JPO's approach to improving the accuracy of machine translation.
The Japan Patent Office's Use of MT |
| 10:30 |
Break |
| 11:00 |
Keynote: 11:00-11:45 - Dan Scott - US Office of the Director of National Intelligence, Foreign Language Program Office
The Director of the Foreign Language Program Office, Dan Scott, will present his viewpoints on users, machine translation needs, and measurement of success. He will present a range of user scenarios which present challenges to the MT community to create valid and replicable measures of success. These metrics must be meaningful within the context of various workflows and use cases. If there is no single "MT solution" nor a single metric, then the riddle for the MT community to answer is: For users, what exactly is the meaning of 94% success?
What is the Meaning of Reaching 94% Success?
Keynote: 11:45 - 12:30 - Doug Jones - MIT Lincoln Lab
This talk will discuss machine translation evaluation in the context of multilingual speech and text processing applications, including the following highlights: (1) creation of new machine translation evaluation methods linked with standard Defense Language Institute measures of human foreign language skills, to measure the utility of machine translation for government applications; (2) development of a testbed for machine translation algorithms, with a special focus on less commonly taught languages. The key idea is to use human translation practices as the control condition for a baseline, and to contrast test scores on that condition with conditions involving human interaction with machine translation systems. Higher standardized test scores mean better translation.
Can You Score Higher with MT?
|
| 11:30 |
| 12:00 |
| 12:30 |
Lunch |
| 13:00 |
| 13:30 |
| 14:00 |
1 |
Satoshi Kamatani, Tetsuro Chino and Kazuo Sumita
In this paper, we propose a hybrid spoken language translation method utilizing sentence segmentation. By partitioning the sentence using the result of syntax analysis, we can utilize rule-based control of the integration of subtranslations, each translated by a method suitable for its segment. We also report a preliminary experiment on the translation quality of our prototype Japanese-to-English translation system. We confirmed that our method achieved a 13.4% advantage in NIST score over the individual RBMT method, and a 6.0% advantage over the individual EBMT method.
Hybrid Spoken Language Translation Using Sentence Splitting Based on Syntax Structure |
1 |
Nicholas Bemish IT Challenges for MT deployments on USG networks |
|
B E Y O N D T M s
|
Workshop: Beyond Translation Memories
|
| 14:30 |
2 |
Stephen Soderland, Christopher Lim, Mausam Mausam, Bo Qin, Oren Etzioni and Jonathan Pool
Statistical MT is limited by reliance on large parallel corpora. We propose Lemmatic MT, a new paradigm that extends MT to a far broader set of languages, but requires manual encoding effort. The author encodes each sentence as a sequence of words drawn from a translation dictionary. We report on an experimental investigation of LEMUEL, a prototype Lemmatic MT system that outperforms Google Translate and also has high translation adequacy on language pairs not handled by Google Translate.
Lemmatic Machine Translation |
2 |
Vaugn Laganosky and/or Tracy Blocker
Since 2005, the US Army and the Joint Staff have validated requirements for machine foreign language translation capability. In the effort to develop a comprehensive machine foreign translation capability, the Army not only needs to enable software to handle one of the most complex systems that humans deal with, but also the architecture and processes to routinely produce and maintain this capability. The Army has made the initial effort, funding a machine foreign language translation program known as the Sequoyah Foreign Language Translation Program. It is intended to be the overarching Army Program with Department of Defense interest to provide machine foreign language translation capabilities that meet language translation gaps.
US Army MT Requirements and Capability
|
| 15:00 |
3 |
Dmitriy Genzel, Klaus Macherey and Jakob Uszkoreit
We introduce the first machine translation system for Yiddish-English and English-Yiddish. We discuss challenges presented by this language and their solutions, including an algorithm for cognate extraction.
Creating a High-Quality Machine Translation System for a Low-Resource Language: Yiddish |
3 |
Jennifer DeCamp USG Language Technology Resource Center |
| 15:30 |
4 |
Philipp Koehn, Alexandra Birch and Ralf Steinberger
We built 462 machine translation systems for all language pairs of the Acquis Communautaire corpus. We report and analyse the performance of these systems, and compare them against pivot translation and a number of system combination methods (multi-pivot, multi-source) that are possible due to the available systems.
462 Machine Translation Systems for Europe |
4 |
Michelle Vanni
MT of Names: It's more than you think! |
| 16:00 |
Break |
| 16:30 |
1 |
Philipp Koehn and Barry Haddow
We investigate novel types of assistance for human translators, based on statistical machine translation methods. We developed a tool that makes suggestions for sentence completion, shows word and phrase translation options, and allows post-editing of machine translation output. A user study validates the types of assistance and provides insight into the human translation process.
Interactive Assistance to Human Translators using Statistical Machine Translation Methods |
1 |
Lisa Harper
As suited as Statistical Machine Translation is to large volume translation of news corpora, manuals, and other texts for which there are large parallel corpora from which to train, optimizing the use and value of MT within the Intelligence Community (IC) remains a significant challenge. Customization methods, in general, include the development of custom dictionaries or incremental re-training of a Translation Model. While suitable source texts and their existing translations may be available to government organizations for MT training and customization, alignment of these text pairs at the sentence level is the crucial first step toward producing usable training data. Extracted terms may also be used to develop dictionary resources. But what is the relative cost and value of customization? How do we develop and sustain a customization program given the ebb and flow of interest across languages? How do we manage customization resources? We present these issues and propose a general roadmap for addressing them.
Sustainable MT Customization |
|
B E Y O N D T M s
|
Beyond Translation Memories Workshop
|
| 17:00 |
2 |
Aarthi Reddy, Richard Rose, Hani Safadi, Samuel Larkin and Gilles Boulianne
A method is presented which integrates target language automatic speech recognition (ASR) models with source language statistical machine translation (SMT) and named entity recognition (NER) information at the phonetic level. Information extracted from a source language document, including translation model probabilities and translated named entities, is combined with acoustic-phonetic information obtained from phone lattices produced by the ASR system. Phone-level integration allows the combined MAHT system to correctly decode words that are either not in the ASR vocabulary or would have been incorrectly decoded by the ASR system. It is shown that the combined MAHT system results in a decrease in word error rate on the dictated translations of up to 32% relative to a stand-alone baseline ASR system.
Incorporating Knowledge of Source Language Text in a System for Dictation of Document Translations |
2 |
Reginald Hobbs
Commercial off-the-shelf (COTS) machine translation engines and translation support tools such as translation memory (TM) have been developed primarily for use in translating grammatically well-formed, edited text. The real-world, foreign language (FL) document collections that our users work with consist instead of noisy and complex image files. We are currently conducting experiments that involve building and evaluating the effectiveness of different multi-component workflows for the automated processing and translation of these FL images into English text. The evaluation side of our experiments requires "ground truth" (GT) character files of the FL text and "reference translations" (RT) of the English text. The RTs are thus used to evaluate the individual MT engines, as well as their output at the end of different image processing workflows. In observing our team translator translate previously-produced GT texts into RTs, we noticed the range of translation-support resources and methods that he applies to the task. Given his experience with semi-automated and fully automated MT metrics and his involvement in our design of MT user-support and evaluation tools, we are now working, as we have in the past, in an extreme programming paradigm to develop a software framework that (i) streamlines access to, and later add-ons to, the different resources he already uses (e.g., MT, TM, dictionaries, morphological analyzers, LM), (ii) provides persistent data objects for later re-use, and (iii) enables him to set the tools’ options via a configuration page that in turn impacts the translation canvas page. In this paper, we present results of this development paradigm with rapid turnaround in software modifications and iterative testing, where the translator serves on the software design team and as the software’s main user and tester, while others develop the code base and test datasets, and analyze experimental results.
On beyond: When the Translator Leads the Design of a Translation Support Framework |
| 17:30 |
3 |
Michel Simard, Pierre Isabelle
We explore the problem of integrating a phrase-based MT system within a computer-assisted translation (CAT) environment. We argue that one way of achieving successful integration is to design an MT system that behaves more like the translation memory (TM) component of CAT systems. This implies producing MT output that is consistent with that of a TM when high-similarity material exists in the training data; it also implies providing the MT system with a component to filter out machine translations that are less likely to be useful. Our results indicate that the proposed approach leads to systems that produce better output than a TM, for a larger portion of the source text.
Phrase-based Machine Translation in a Computer-assisted Translation Environment |
| 18:00 |
4 |
Lucia Specia, Marco Turchi, Zhuoran Wang, John Shawe-Taylor and Craig Saunders
We investigate the problem of estimating the quality of the output of machine translation systems at the sentence level when reference translations are not available. The focus is on automatically identifying a threshold to map a continuous predicted score into "good" / "bad" categories for filtering out bad-quality cases in a translation post-edition task. We use the theory of Inductive Confidence Machines (ICM) to identify this threshold according to a confidence level that is expected for a given task. Experiments show that this approach gives improved estimates when compared to those based on classification or regression algorithms without ICM.
Improving the Confidence of Machine Translation Quality Estimates |
| 18:30 |
Closing Session |
| Sunday 30-Aug |
| Full Day Post-Conference Workshops at the Château Laurier |
| 7:30 |
Registration & Breakfast |
| 8:00 |
| 8:30 |
| 9:00 |
WS4: New Tools, Standards, and Possibilities for Dictionaries and Dictionary Exchange
| WS7: The 3rd Workshop on Patent Translation
|
| 9:30 |
| 10:00 |
| 10:30 |
Break
|
| 10:45 |
Organizer: Jennifer DeCamp (MITRE) |
Organizers: Terumasa Ehara (Yamanashi Eiwa College), Shoichi Yokoyama (Yamagata University) |
| 11:00 |
| 11:30 |
| 12:00 |
Lunch |
| 12:30 |
| 13:00 |
| 13:30 |
| 14:00 |
WS4: New Tools, Standards, and Possibilities for Dictionaries and Dictionary Exchange
| WS7: The 3rd Workshop on Patent Translation
|
| 14:30 |
| 15:00 |
| 15:30 |
Break |
| 15:45 |
Goal: Participants will explore means of leveraging dictionary, lexicon, and terminology research and tools development to benefit both MT and Computer Assisted Human Translation (CAHT). |
Goal: to foster research and development of the technology for patent translation by providing a forum in which researchers and practitioners can exchange their ideas, approaches, perspectives, and experiences from their work in progress. |
| 16:00 |
| 16:30 |
| 17:00 |
End of MT Summit XII |

Last updated: Fri Sep 4 09:44:32 PDT 2009
|