Common use of XXXX Preprocess and XXXX Tokenizer Clause in Contracts

XXXX Preprocess and XXXX Tokenizer. The XXXX Preprocess and the XXXX Tokenizer services provide preprocessing functionalities for Spanish. XXXX Preprocess segments text into minor structural units (titles, paragraphs, sentences, etc.); detects entities usually not found in dictionaries (numbers, abbreviations, URLs, emails, proper nouns, etc.); and makes sure that sequences of two or more words (in dates, phrases, proper nouns, etc.) are kept together in a single block. The XXXX Tokenizer service delivers the same results vertically tokenized, one word per line. Both services accept input and return output encoded in UTF-8 or ISO-8859-1/-15.

Both services employ the XXXX Processing Tool (IPT), developed by Xxxxxxxx and Xxxxxxx (2010). IPT is rule-based, and its rules rely on a series of resources to improve results: a grammatical phrase list, a foreign expression list, a follow-up abbreviation list, a word-form lexical database (which is also used by the XXXX POS-tagger described in the following subsection), and a stop-list to increase lexical-lookup efficiency.

IPT has been evaluated against a hand-tagged corpus used as a Gold Standard, divided into two domain-specific collections (Press and Genomics). Xxxxxxxx et al. (2010) report accuracies of 99.39% and 91.55% for sentence splitting in the two collections. The respective results for named-entity recognition (NER) are 95.43% and 99.76%.

Web form:
  xxxx://xxxxxxxx.xxx.xxx/soaplab2-axis/#chunking_segmentation.iula_preprocess_row
  xxxx://xxxxxxxx.xxx.xxx/soaplab2-axis/#tokenization.iula_tokenizer_row
WSDL:
  xxxx://xxxxxxxx.xxx.xxx/soaplab2-axis/services/chunking_segmentation.iula_preprocess?wsdl
  xxxx://xxxxxxxx.xxx.xxx/soaplab2-axis/services/tokenization.iula_tokenizer?wsdl
PANACEA Catalogue Entry:
  xxxx://xxxxxxxx.xxxx.org/services/124
  xxxx://xxxxxxxx.xxxx.org/services/119

Table 5: WS Details for XXXX Preprocess and XXXX Tokenizer
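The behaviour described above (detecting URLs, emails and numbers, keeping multiword sequences in a single block, and delivering one token per line) can be illustrated with a small sketch. This is not IPT and does not reproduce its rules or resources: the `MULTIWORD_UNITS` list, the regular expression, the function names and the use of an underscore to join multiword blocks are all assumptions made purely for illustration.

```python
import re

# Hypothetical stand-in for a phrase list; the real IPT resources are not shown here.
MULTIWORD_UNITS = ["sin embargo", "a pesar de"]

# Alternatives are tried in order, so URLs and emails win over plain words.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+[\w/]              # URLs (trailing punctuation is left out)
  | [\w.+-]+@[\w-]+(?:\.[\w-]+)+   # email addresses
  | \d+(?:[.,]\d+)*                # numbers, with optional separators
  | \w+                            # ordinary words
  | [^\w\s]                        # any single punctuation mark
    """,
    re.VERBOSE,
)


def merge_multiword(tokens, units):
    """Join token runs that match a known multiword unit (case-insensitive)."""
    # Longest units first, so the longest possible match is preferred.
    unit_tuples = sorted(
        (tuple(u.lower().split()) for u in units), key=len, reverse=True
    )
    merged = []
    i = 0
    while i < len(tokens):
        for unit in unit_tuples:
            n = len(unit)
            if tuple(t.lower() for t in tokens[i:i + n]) == unit:
                # Assumption: an underscore marks the block; IPT may differ.
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def tokenize_vertical(text, units=MULTIWORD_UNITS):
    """Return the text as one token per line."""
    tokens = TOKEN_PATTERN.findall(text)
    return "\n".join(merge_multiword(tokens, units))


if __name__ == "__main__":
    print(tokenize_vertical("Sin embargo, escribe a ana@example.org."))
```

A list-driven merge step is used here only because the text mentions phrase lists as one of the resources; a real system would combine such lists with many more rules.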


