Annotation method Clause Samples
Annotation method. The corpus was annotated by three annotators, although only some fragments were doubly annotated. Two methods can be distinguished for the semantic annotation of a corpus based on word senses. The first is the linear (or “textual”) method [8], in which the human annotator marks the sentences token by token up to the end of the corpus. With this strategy, the annotator must read and analyze the senses of each word every time it appears in the corpus. The second is the transversal (or “lexical”) method [8], in which the annotator proceeds word type by word type, annotating all the occurrences of each word in the corpus one after another. With this method, the annotator must read and analyze all the senses of a word only once.
In Cast3LB we followed the transversal process (“lexical” method), in which all the occurrences of each word are annotated at the same time by the same annotator. The main advantage of this method is that the annotator can focus on the sense structure of one word and deal with its specific semantic problems: its main sense or senses and its more specific senses. The annotator then checks the context of that single word each time it appears in the corpus and selects the corresponding sense. With this approach, the semantic features of each word are considered only once, and the whole corpus achieves greater consistency. With the linear process (“textual” method), by contrast, the annotator must recall the sense structure of each word and its specific problems each time the word appears in the corpus, which makes the annotation process much more complex and increases the risk of low consistency and of disagreement between annotators.
Nevertheless, the transversal or lexical method has a disadvantage when annotating a large corpus: no fragment of the corpus is available until the whole corpus is completed.
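The difference between the two annotation orders can be illustrated with a minimal sketch. This is not the Cast3LB tooling: the function names (`linear_order`, `transversal_order`, `sense_inventory_switches`) and the toy token list are hypothetical, and the sketch assumes the corpus is already reduced to a flat list of lemmas. It only shows in which order occurrences are visited and how often the annotator has to turn to a different word's sense inventory.

```python
from collections import defaultdict


def linear_order(tokens):
    """Linear ("textual") method: visit occurrences in corpus order."""
    return [(position, word) for position, word in enumerate(tokens)]


def transversal_order(tokens):
    """Transversal ("lexical") method: visit all occurrences of one
    word type before moving on to the next word type."""
    positions_by_word = defaultdict(list)
    for position, word in enumerate(tokens):
        positions_by_word[word].append(position)
    # Word types are visited in order of first appearance (dicts keep
    # insertion order in Python 3.7+).
    return [
        (position, word)
        for word, positions in positions_by_word.items()
        for position in positions
    ]


def sense_inventory_switches(order):
    """Count how many times the annotator must turn to a different
    word's sense inventory while following the given order."""
    switches = 0
    previous_word = None
    for _, word in order:
        if word != previous_word:
            switches += 1
            previous_word = word
    return switches


# Hypothetical toy corpus of lemmas, invented for illustration only.
tokens = ["banco", "río", "banco", "dinero", "río", "banco"]

linear_switches = sense_inventory_switches(linear_order(tokens))
transversal_switches = sense_inventory_switches(transversal_order(tokens))
```

In this toy corpus the transversal order needs one switch per word type, whereas the linear order needs a switch whenever the next token differs from the previous one. The sketch also makes the drawback visible: in the transversal order, no contiguous stretch of text is fully annotated until the last word type has been processed.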
To avoid this drawback of the transversal method, we selected a fragment of the whole corpus and annotated it by means of the linear process. Semantic annotation is widely regarded as a tedious and difficult task. From a general point of view, the main problems in semantic annotation based on WordNet senses are the following. The subjectivity of the human annotator in selecting the correct sense: there is usually more than one sense for a word. Due to WordNet’s granularity [14], more than one sense could be correct for a given word. The vagueness of many words. Not all nouns, verbs and adjectives are contained in Spanish WordNe...
