September 2010, Vol. 36, No. 3, Pages 481-504
Posted Online September 10, 2010.
(doi:10.1162/coli_a_00007)
© 2010 Association for Computational Linguistics
Learning Tractable Word Alignment Models with Complex Constraints
João V. Graça*, L2F INESC-ID
Kuzman Ganchev**, University of Pennsylvania
Ben Taskar†, University of Pennsylvania
*INESC-ID Lisboa, Spoken Language Systems Lab, R. Alves Redol 9, 1000-029 LISBOA, Portugal. E-mail: joao.graca@l2f.inesc-id.pt.
**University of Pennsylvania, Department of Computer and Information Science, Levine Hall, 3330 Walnut Street, Philadelphia, PA 19104-6309. E-mail: kuzman@cis.upenn.edu.
†University of Pennsylvania, Department of Computer and Information Science, 3330 Walnut Street, Philadelphia, PA 19104-6389. E-mail: taskar@cis.upenn.edu.
Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.