Common use of Agreement-based Learning Clause in Contracts

Agreement-based Learning. ▇▇▇▇▇ et al. [2006] first introduce agreement-based learning into word alignment: encouraging asymmetric IBM models to agree on word alignment, which is a latent structure in word-based translation models [▇▇▇▇▇ et al., 1993]. This strategy significantly improves alignment quality across many language pairs. They extend the idea to other latent-variable models, including grammar induction and predicting missing nucleotides in DNA sequences [▇▇▇▇▇ et al., 2007]. ▇▇▇ et al. [2015] propose generalized agreement for word alignment, a general framework that allows arbitrary loss functions measuring the disagreement between asymmetric alignments. These loss functions can be defined not only between asymmetric alignments but also between alignments and other latent structures, such as phrase segmentations. In attention-based NMT, word alignment is treated as a parametrized function instead of a latent variable. This makes word alignment differentiable, which is important for training attention-based NMT models. Although alignment matrices in attention-based NMT are in principle “symmetric” in that they allow many-to-many soft alignments, we find that unidirectional modeling captures only partial aspects of the structural mapping between languages. Our contribution is to adapt agreement-based learning to attention-based NMT, which significantly improves both alignment and translation quality.
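To make the agreement idea concrete, the sketch below computes a squared-error disagreement between source-to-target and target-to-source soft alignment (attention) matrices and folds it into a joint training objective. This is a minimal illustration under assumed conventions, not the exact formulation of any of the cited papers; the names `disagreement_loss`, `joint_objective`, and the weight `lam` are hypothetical.

```python
import numpy as np

def disagreement_loss(a_s2t: np.ndarray, a_t2s: np.ndarray) -> float:
    """Squared-error disagreement between two directional soft alignments.

    a_s2t: (I, J) attention weights from the source-to-target model
           (row i is a distribution over target positions j).
    a_t2s: (J, I) attention weights from the target-to-source model.

    Zero exactly when the two directions assign identical weight
    to every soft link (i, j).
    """
    return float(np.sum((a_s2t - a_t2s.T) ** 2))

def joint_objective(loglik_s2t: float, loglik_t2s: float,
                    a_s2t: np.ndarray, a_t2s: np.ndarray,
                    lam: float = 1.0) -> float:
    """Both directional log-likelihoods minus a weighted agreement
    penalty (lam is a hypothetical trade-off hyperparameter)."""
    return loglik_s2t + loglik_t2s - lam * disagreement_loss(a_s2t, a_t2s)

# Toy example: 2 source words, 3 target words.
a_s2t = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])      # each row sums to 1
a_t2s = np.array([[0.8, 0.2],
                  [0.3, 0.7],
                  [0.1, 0.9]])           # each row sums to 1
print(disagreement_loss(a_s2t, a_t2s))   # 0.28: the directions disagree
```

Because attention weights are a differentiable function of the model parameters, a penalty like this can be minimized jointly with the two translation losses by gradient descent, which is precisely what distinguishes the attention setting from agreement over discrete latent alignments.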
