Sentence level evaluation.
a. The corpus will be processed by the PANACEA tools for normalisation, sentence segmentation, and tokenisation. For each of the languages, a monolingual test sample of 10 times 50 tokenised sentences will be randomly collected.
b. The monolingual parts of the aligned sentences will be manually checked for normalisation, tokenisation and sentence segmentation errors; the errors will be classified according to the component that produced them, and an error rate will be computed. The challenge here is to define what counts as such an error, e.g. what a tokenisation error is. This definition will be worked out in cooperation with the tool evaluation in WP7.
c. The monolingual corpora will undergo sentence alignment. From the result, a random sample of 10 times 50 aligned segments will be collected for both language pairs.
d. The alignment is evaluated. The criterion is "alignment precision" (▇▇▇▇▇ 2002, following ▇▇▇▇▇ et al. 1991), i.e. the number of correct 1-1 sentence alignments. Manual evaluation of the alignments of the test sentences will add a level of correctness to the overall error rate for the alignment used.
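The sampling and scoring described in steps a. to d. can be sketched as follows. This is a minimal illustration only, not part of the PANACEA tools: the function names are hypothetical, the 10 samples are assumed to be disjoint, error rates are assumed to be computed per sampled sentence, and alignment precision is reported here as the proportion of sampled 1-1 alignments judged correct.

```python
import random
from collections import Counter


def draw_samples(items, n_samples=10, sample_size=50, seed=0):
    """Draw n_samples random samples of sample_size items each.

    Assumption: the samples are disjoint (drawn without replacement).
    """
    needed = n_samples * sample_size
    if len(items) < needed:
        raise ValueError("corpus too small for the requested samples")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    picked = rng.sample(items, needed)
    return [picked[i:i + sample_size] for i in range(0, needed, sample_size)]


def error_rates(error_labels, n_checked):
    """Error rate per component.

    error_labels holds one component name per error found by the manual
    check, e.g. "normalisation", "segmentation" or "tokenisation".
    n_checked is the number of sentences that were checked.
    """
    counts = Counter(error_labels)
    return {component: n / n_checked for component, n in counts.items()}


def alignment_precision(judgements):
    """Share of sampled 1-1 alignments that were judged correct.

    judgements holds one boolean per sampled aligned segment.
    """
    if not judgements:
        raise ValueError("no judgements given")
    return sum(judgements) / len(judgements)
```

For example, `draw_samples(sentences)` would return 10 lists of 50 sentences for manual checking, and the labels recorded during that check would then be passed to `error_rates` together with the total of 500 checked sentences.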