Common use of Frequency Filter Clause in Contracts

Frequency Filter. As the system does not create alignments (i.e. translation candidates) itself, it must rely on the quality of the statistical alignment tools from which it receives the aligned candidates. The first step is therefore to identify the best translation proposals, in terms of recall (as many terms as possible) and precision (as accurate translations as possible). Two factors influence the translation quality of the P2G tool: the choice of alignment tool, and the choice of thresholds for frequency and translation probability.

For the alignment tool, GIZA++ alone is clearly insufficient: it finds no multi-word entries, which make up nearly 50% of a lexicon or term list, especially in narrow domains. The focus was therefore on phrase alignment tools, which also give superior translation quality (Och and Ney, 2004). To create phrase alignments, two methods were tried out [1]:

- GIZA++ and Moses (▇▇▇▇▇ et al., 2007), producing phrase tables. From the LT_automotive input data (cf. below), a phrase table with about 7.97 million entries was built.
- Phrases as produced by Anymalign (Lardilleux and Lepage, 2009), which created about 3.14 million word/phrase pairs from the same input data.

It soon turned out that if frequency is not taken into account, the output contains too much noise. Therefore a frequency threshold (on both the source and the target side) is applied and set to > 1. For the translation probability, tests were run to find the optimal recall/precision combination: the two alignment systems were compared using different values for the translation probability. For the evaluation, a random set of term candidates was manually inspected [2] and the errors in alignment/translation were counted [3]. The results are given in Table 1.

Table 1: Alignment quality by tool and translation probability threshold

  Tool        Threshold        Entries    Error rate
  Moses       p > 0.8          12,000      5.54%
  Moses       0.6 < p < 0.8     3,900      5.42%
  Moses       0.4 < p < 0.6    20,000     55.11%
  Anymalign   p > 0.7          12,600     46.91%
  Anymalign   p > 0.8          10,900     47.56%

The Moses alignments have much better quality and are within reach of being usable; Anymalign error rates are approximately ten times higher. For Anymalign, a higher threshold (0.8 instead of 0.7) does not improve alignment quality. Overall, Moses input with a threshold of 0.6 for P(f|e) seems to give the best results for term extraction at this phrase-table size [4], with an overall error rate of about 5.5%: it increases recall without reducing precision. It should be noted that alignment errors originate in the external phrase alignment components and are merely 'inherited' by the current extraction system. They nevertheless count in the overall workflow evaluation: incorrect translation proposals lead to significantly higher human reviewing effort.

[1] Input from PEXACC (▇▇▇ et al., 2011) for comparable corpora is also supported.
[2] Entries starting with the letters C, F, and S.
[3] There are always unclear cases among translations (e.g. transfers usable only in certain contexts); these were not counted as errors. Only clearly wrong translations were counted as errors, though a degree of subjectivity remains.
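The filtering step described above — discard candidate pairs whose source or target frequency is 1, and keep only pairs within a chosen translation-probability band — can be sketched as follows. The tuple layout, the example pairs, and the function name are illustrative assumptions, not the actual P2G or phrase-table format.

```python
# Illustrative sketch of the frequency + translation-probability filter.
# Assumed tuple layout: (source phrase, target phrase, source frequency,
# target frequency, P(f|e)) -- an assumption for demonstration only.

def filter_candidates(pairs, min_freq=1, p_low=0.6, p_high=1.0):
    """Keep pairs with source and target frequency > min_freq and
    translation probability within [p_low, p_high]."""
    kept = []
    for src, tgt, src_freq, tgt_freq, p in pairs:
        if src_freq > min_freq and tgt_freq > min_freq and p_low <= p <= p_high:
            kept.append((src, tgt, p))
    return kept

pairs = [
    ("brake disc", "Bremsscheibe", 14, 12, 0.91),           # kept
    ("brake disc", "Scheibe", 14, 30, 0.35),                # probability too low
    ("torque wrench", "Drehmomentschluessel", 1, 1, 0.88),  # frequency 1: noise
]
print(filter_candidates(pairs))  # [('brake disc', 'Bremsscheibe', 0.91)]
```

The default p_low of 0.6 mirrors the P(f|e) threshold that, per the evaluation above, increased recall without reducing precision.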

Appears in 1 contract

Sources: Grant Agreement
