Annotation Process Sample Clauses
Annotation Process. To carry out the annotation work, we recruited five Dutch archaeology students at the Bachelor level. We specifically selected second- and third-year students, as some basic knowledge of archaeology is extremely helpful in determining whether a word is a specific entity or not. The students were asked to annotate for a total of 16 hours each over a two-week period, during which they could come and work at times that suited them, a few hours at a time. We opted not to have the students work a whole day on this task, as the annotation process is tedious and monotonous, which makes it hard to maintain concentration. Loss of concentration can cause mislabelling, so having them work only in short stretches might help prevent this. The students were first asked to read the guidelines carefully and ask any questions. During annotation, we were always present to resolve difficult sentences and entities and to explain to the students how to handle them. The students reported this to be very helpful, and they learned from each other's problems. Most of these issues were relatively rare edge cases, though, and the original annotation guidelines covered most encountered entities sufficiently.
Annotation Process. Our annotation task consisted of annotating persuasion techniques in a corpus of circa 1,600 news articles revolving around various globally discussed topics in six languages: English, French, German, Italian, Polish, and Russian, using the taxonomy introduced earlier. A balanced mix of mainstream media and "alternative" media sources that could potentially spread mis/disinformation was considered for the sake of creating the dataset. Furthermore, sources with different political orientations were covered as well.
The pool of annotators consisted of circa 40 persons, all native or near-native speakers of the language they annotated. Most of the annotators were either media analysts or researchers and experts in (computational) linguistics, and approximately 80% of the annotators had prior experience in performing linguistic annotations of news-like texts. Thorough training was provided to all annotators, consisting of: (a) reading the 60-page annotation guidelines (▇▇▇▇▇▇▇▇▇ et al., 2023a; an excerpt thereof is provided in Appendix C), (b) participating in online multiple-choice training, (c) carrying out pilot annotations on sample documents, and (d) jointly sharing experience with other annotators and holding discussions with the organisers of the annotation task.
Subsequently, each document was annotated by at least two annotators independently. On a weekly basis, reports were sent to annotator pairs highlighting complementary and potentially conflicting annotations in order to converge to a common understanding of the task, and regular meetings were held with all annotators to align and to discuss specific annotation cases.
Annotations were curated in two steps. In the first step (document-level curation), the independent annotations were jointly discussed by the annotators and a curator, where the latter was a more experienced annotator whose role was to facilitate making a decision about the final annotations, including: (a) merging the complementary annotations (tagged by only one annotator), and (b) resolving the identified potential label conflicts. In the second step (corpus-level curation), a global consistency analysis was carried out. The rationale behind this second step was to identify inconsistencies that are difficult to spot in a single-document annotation view and to make comparisons at the corpus level, e.g., comparing whether identical or near-identical text snippets were tagged with the same or a different label.
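As a rough illustration of the comparison behind those weekly reports, the sketch below splits two annotators' span annotations for one document into the two categories named above: complementary annotations (tagged by only one annotator) and potential label conflicts (overlapping spans with different labels). The Span layout, the character-offset representation, and the label names are assumptions made for illustration, not the task's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # character offset where the tagged snippet begins
    end: int     # character offset where it ends (exclusive)
    label: str   # persuasion-technique label (illustrative names below)

def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end

def compare(ann_a: list[Span], ann_b: list[Span]):
    """Split one document's double annotations into the two report
    categories: complementary (seen by only one annotator) and
    potentially conflicting (overlapping spans, different labels)."""
    complementary, conflicts = [], []
    for a in ann_a:
        partners = [b for b in ann_b if overlaps(a, b)]
        if not partners:
            complementary.append(("A only", a))
        conflicts.extend((a, b) for b in partners if b.label != a.label)
    for b in ann_b:
        if not any(overlaps(a, b) for a in ann_a):
            complementary.append(("B only", b))
    return complementary, conflicts

# Annotator B tags a wider, differently labelled span plus an extra one.
comp, conflicting = compare(
    [Span(0, 40, "Loaded_Language")],
    [Span(10, 60, "Appeal_to_Fear"), Span(100, 120, "Repetition")],
)
print(comp)         # [('B only', Span(start=100, end=120, label='Repetition'))]
print(conflicting)  # one conflicting pair: Loaded_Language vs. Appeal_to_Fear
```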
Annotation Process. Prior to the annotation process, the annotators were trained with the annotation guideline. The annotation process pursued for the TDB can be described in the following four steps (▇▇▇▇▇▇ et al., 2009):
1- Annotations for a particular connective were performed by two or three annotators, who annotated that discourse connective and its arguments file by file across all corpus files. In this step, each annotator worked individually.
2- Afterwards, the individual annotations were compared. The disagreements found were then discussed and resolved by the project group.
3- The annotation guideline was revised according to the discussions.
4- The agreed annotations were checked to verify that they fully complied with the annotation guideline. The above annotation process was repeated for all discourse connectives. However, in later phases of the annotation effort, the TDB group decided that inter-annotator reliability had stabilized, and they switched to a more rapid annotation strategy. Under the new strategy, the TDB group kept the annotation process the same except for the annotator configuration: a pair of annotators and an individual annotator (practically, two annotator teams) performed the annotations (▇▇▇▇▇▇▇▇▇▇, ▇▇▇▇▇▇▇▇▇▇, & ▇▇▇▇▇▇, 2010). In the process defined above, inter-annotator reliability should be measured right after the first step, because by the second step the annotators' individual decisions have been judged and corrected. Thanks to the version control software that the TDB group uses, the annotation data produced at each step of the annotation process can be retrieved easily.
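Since reliability is to be measured right after step 1, before any disagreements are discussed, a minimal sketch of such a measurement is given below, using Cohen's kappa over per-token binary decisions (token inside vs. outside the connective's argument spans). This token-level representation is an assumption made for illustration; the TDB group's actual reliability metric may differ.

```python
from collections import Counter

def cohen_kappa(ann1: list[int], ann2: list[int]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(ann1) == len(ann2) and ann1
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Chance agreement under independence, from marginal label frequencies.
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Step-1 decisions of two annotators for one connective in one file:
# 1 = token inside the connective's argument span, 0 = outside.
a1 = [0, 1, 1, 1, 0, 0, 1, 0]
a2 = [0, 1, 1, 0, 0, 0, 1, 0]
print(f"kappa = {cohen_kappa(a1, a2):.3f}")  # kappa = 0.750
```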
Annotation Process. Before ▇▇▇▇ et al. (2003), ▇▇▇▇▇, Amorrortu, & ▇▇▇▇▇▇ (1999) performed an RST corpus annotation that led to the development of an annotation protocol which ▇▇▇▇ et al. (2003) followed. This earlier work is summarized first.
