Common use of Normalizer Clause in Contracts

Normalizer. The Normalizer module uses the Apache Tika toolkit to parse the structure of each fetched web page and extract its metadata. The extracted metadata are exported at a later stage (see subsection 2.1.8) if the web document is considered relevant to the collection being constructed. The text encoding of the web page is also detected, based on the HTTP Content-Encoding header and the charset part of the Content-Type header, and the content is converted to UTF-8 if needed. Besides the default conversion, special care is taken to normalize specific characters such as no-break space, narrow no-break space, three-per-em space, etc.
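
The encoding detection and character normalization described in this clause can be illustrated with a minimal Python sketch. It is not the module's actual implementation (which relies on Apache Tika) and uses only the Python standard library. The function names, the fallback to UTF-8, and the choice to map each special space to a plain ASCII space are assumptions made for illustration; the clause names the three characters but does not say what they are normalized to. The sketch reads only the charset part of the Content-Type header.

```python
from email.message import Message

# Hypothetical mapping: the clause names these three characters, but the
# replacement (a plain ASCII space) is an assumption of this sketch.
SPACE_LIKE = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u2004": " ",  # THREE-PER-EM SPACE
}
SPACE_TABLE = str.maketrans(SPACE_LIKE)


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` (an assumption) when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def normalize_page(raw: bytes, content_type: str) -> str:
    """Decode a fetched page to a Unicode string and normalize special spaces."""
    charset = charset_from_content_type(content_type)
    try:
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode as UTF-8, replacing bad bytes.
        text = raw.decode("utf-8", errors="replace")
    return text.translate(SPACE_TABLE)


if __name__ == "__main__":
    page = "caf\u00e9\u00a0au lait".encode("iso-8859-1")
    text = normalize_page(page, "text/html; charset=ISO-8859-1")
    utf8_bytes = text.encode("utf-8")  # content is now ready to store as UTF-8
    print(text, len(utf8_bytes))
```

A production normalizer would likely also inspect the page body (for example a meta charset declaration or byte-level detection) when the headers are missing or wrong; that step is omitted here.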

Appears in 2 contracts

Sources: Grant Agreement, Grant Agreement