Common use of Alignment Clause in Contracts

Alignment. The standard alignment tool provided in the PANACEA toolbox is Hunalign, and it was used for the evaluation. Hunalign produces a score for each alignment; in the PANACEA experiments, the best strategy proved to be keeping only segments with a score higher than 0.4. With this threshold, the results shown in Table 4-3 are obtained:

Table 4-3
Europarl     37,600   19,399   51.4%
LT-SSplit    52,600   28,900   54.9%

This shows that only about 50% of the texts can really be used for parallel corpora. The results of LT-SSplit are slightly better (by about 3 percentage points) than the baseline Europarl results. However, even in documents initially considered parallel, many segments are not usable for parallel training. To determine how correct the alignment is, 1000 sentence pairs of the resulting corpus were manually inspected; the results are given in Table 4-4:

Table 4-4
Europarl     817   81.7%   183
LT-SSplit    866   86.6%   134

The result is that roughly 15 to 20 out of 100 alignments are incorrect, which may negatively influence the creation of SMT resources. Again, LT-SSplit performs slightly better than Europarl. Hunalign is a standard tool used in SMT production; no work on alignment was planned in PANACEA, and the tools were simply integrated into the toolbox. However, the result shows that there is room for improvement.
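The threshold step described above can be sketched as follows. This is a minimal illustration, not the PANACEA implementation: it assumes the aligner output has already been written as tab-separated lines of source segment, target segment and score, and the names filter_aligned_segments and retained_share are hypothetical helpers introduced here. Only the 0.4 threshold comes from the text.

```python
from typing import Iterable, List, Tuple

# Threshold reported in the text; everything else in this sketch is an assumption.
SCORE_THRESHOLD = 0.4


def filter_aligned_segments(
    lines: Iterable[str], threshold: float = SCORE_THRESHOLD
) -> List[Tuple[str, str, float]]:
    """Keep segment pairs whose alignment score is strictly above the threshold.

    Each input line is assumed to hold three tab-separated fields:
    source segment, target segment, score. Malformed lines are skipped.
    """
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # not a source/target/score triple
        source, target, raw_score = parts
        try:
            score = float(raw_score)
        except ValueError:
            continue  # score field is not a number
        if score > threshold:
            kept.append((source, target, score))
    return kept


def retained_share(kept_count: int, total_count: int) -> float:
    """Return the percentage of items kept, or 0.0 for an empty input."""
    if total_count == 0:
        return 0.0
    return 100.0 * kept_count / total_count
```

The same percentage helper reproduces the manual-inspection figures: retained_share(866, 1000) gives the 86.6% reported for LT-SSplit. Whether a score of exactly 0.4 should be kept is a design choice; the sketch follows the wording "higher than 0.4" and drops it.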

Sources: Grant Agreement, Grant Agreement