Common use of Topic Classifier Clause in Contracts

Topic Classifier. The aim of this module is to determine whether a page that has been normalized and is written in the targeted language contains data relevant to the targeted domain. To this end, the page content is compared with the domain definition provided by the user (see parameter termList in subsection 2.1.10), using the string-matching method adopted by the Combine web crawler (footnote 11). A naive stemmer included in the org.apache.lucene library is used to stem both the user-provided terms and the document content. A page relevance score p is then calculated from three factors: the number of occurrences of each term, where those occurrences appear in the web page (i.e., the title, keywords, and/or body), and the weights of the terms found. The score is computed as follows:
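The exact formula for p is not reproduced in this excerpt. The Python sketch below therefore only illustrates the general idea of stemmed, location-weighted term matching, under several assumptions:

- The score is taken to be a plain weighted sum, p = Σ over terms and locations of (hits × term weight × location weight). This is a placeholder, not the Combine formula.
- termList is modelled as a dict mapping each term to its weight.
- The location weights in `LOCATION_WEIGHTS` are invented.
- `naive_stem` is a simplified plural stripper standing in for the Lucene stemmer. It is not Lucene's API.

All names (`naive_stem`, `relevance_score`, `LOCATION_WEIGHTS`) are hypothetical.

```python
import re

# Hypothetical location weights (assumption): matches in the title count
# more than matches in keywords, which count more than matches in the body.
LOCATION_WEIGHTS = {"title": 10.0, "keywords": 5.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word: str) -> str:
    """Simplified plural stripper used as a stand-in for a Lucene stemmer."""
    if len(word) < 3 or not word.endswith("s"):
        return word
    if word.endswith(("ss", "us")):          # glass, status: leave unchanged
        return word
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"               # policies -> policy
    return word[:-1]                         # panels -> panel


def count_occurrences(term_stems: list[str], tokens: list[str]) -> int:
    """Count exact matches of a (possibly multi-word) stemmed term in tokens."""
    n = len(term_stems)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_stems)


def relevance_score(page: dict[str, str], term_list: dict[str, float]) -> float:
    """Placeholder score: sum of hits * term weight * location weight."""
    score = 0.0
    for location, text in page.items():
        tokens = [naive_stem(t) for t in tokenize(text)]
        loc_weight = LOCATION_WEIGHTS.get(location, 1.0)
        for term, term_weight in term_list.items():
            stems = [naive_stem(t) for t in tokenize(term)]
            score += count_occurrences(stems, tokens) * term_weight * loc_weight
    return score


if __name__ == "__main__":
    page = {
        "title": "Renewable energy policies",
        "keywords": "energy, solar",
        "body": "Solar panels convert solar energy into electricity.",
    }
    term_list = {"energy": 1.0, "solar panel": 2.0}
    print(relevance_score(page, term_list))  # 10 + 5 + 1 + 2 = 18.0
```

In this example, "energy" is found once in each location, contributing 10 + 5 + 1. The two-word term "solar panel" matches "Solar panels" in the body after stemming, contributing 2. Stemming both sides is what lets plural forms in the page match singular terms in the list.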

Appears in 2 contracts

Sources: Grant Agreement, Grant Agreement