Corpus Collection Sample Clauses
Corpus Collection. The corpus used for the LT-Xfr experiments consists of parallel sentences collected from different domains; details are given in Tab. 3-1:

Domain           Sentences
automotive          47,485
dgt                530,760
europarl         1,739,154
health&safety       57,155
jrc-acquis       1,239,731
e-books             82,635
statmt_dev          15,134
statmt_news        136,227
total            3,848,281

Overall, 3.8 million German-English parallel sentences were used for the experiment.
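The total reported in Tab. 3-1 can be cross-checked by summing the per-domain counts. The following is a minimal sketch; the names `DOMAIN_COUNTS`, `corpus_total`, and `domain_shares` are hypothetical and chosen for illustration only, and the figures are copied from the table.

```python
# Sentence counts per domain, copied from Tab. 3-1.
DOMAIN_COUNTS = {
    "automotive": 47_485,
    "dgt": 530_760,
    "europarl": 1_739_154,
    "health&safety": 57_155,
    "jrc-acquis": 1_239_731,
    "e-books": 82_635,
    "statmt_dev": 15_134,
    "statmt_news": 136_227,
}


def corpus_total(counts):
    """Return the number of parallel sentences across all domains."""
    return sum(counts.values())


def domain_shares(counts):
    """Return each domain's fraction of the whole corpus."""
    total = corpus_total(counts)
    return {name: n / total for name, n in counts.items()}


total = corpus_total(DOMAIN_COUNTS)
shares = domain_shares(DOMAIN_COUNTS)

print(f"total: {total:,}")  # total: 3,848,281
# List domains from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {share:6.1%}")
```

The sum matches the stated total of 3,848,281 sentences, and the shares show that europarl and jrc-acquis together account for most of the corpus.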
Corpus Collection. Two bilingual corpora will be collected using the parallel web crawler. They should cover a specialized domain, such as software manuals. Each corpus should be large enough to meet the requirements of the different extraction tools, so that the tools can show their full capacity. The languages will be chosen depending on the progress of the different PANACEA tools, but two different language pairs will be involved.
