Common use of Inter Annotator Agreement Clause in Contracts

Inter Annotator Agreement. For most tasks, ▇▇▇▇▇’▇ Kappa is reported as a measure of IAA and is considered the standard measure (▇▇▇▇▇▇, 2012). For Named Entity Recognition, however, Kappa is not the most relevant measure, as noted in multiple studies (▇▇▇▇▇▇▇▇ & ▇▇▇▇▇▇▇▇▇▇, 2005; ▇▇▇▇▇▇ et al., 2011). This is because Kappa requires the number of negative cases, which is not known for named entities: there is no fixed number of items to consider when annotating entities, as each entity is a sequence of tokens. A solution is to calculate Kappa at the token level, but this has two associated problems. Firstly, annotators do not annotate words individually but look at sequences of one or more tokens, so this method does not reflect the annotation task very well. Secondly, the data is extremely unbalanced: the un-annotated tokens (labelled "O") vastly outnumber the actual entities, unfairly inflating the Kappa score. A solution is to calculate Kappa only for tokens where at least one annotator has made an annotation, but this tends to underestimate the IAA. Because of these issues, the pairwise F1 score calculated without the O label is usually seen as a better measure of IAA in Named Entity Recognition (▇▇▇▇▇▇▇ et al., 2012). However, as the token-level Kappa scores can also provide some insight, we provide all three measures but focus on the F1 score. The scores are provided in Table 3.4; they are calculated by averaging the results of pairwise comparisons across all annotators. We also calculated these scores by comparing all the annotators against the annotations we did ourselves, and obtained the same F1 score and a slightly lower Kappa (-0.02).

    ▇▇▇▇▇’▇ Kappa on all tokens               0.82
    ▇▇▇▇▇’▇ Kappa on annotated tokens only    0.67
    F1 score                                  0.95

Table 3.4: Inter-annotator agreement measures on a 100-sentence test document, calculated by doing pairwise comparisons between all combinations of annotators and averaging the results.
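The three measures described above can be sketched as follows. This is an illustrative reconstruction, not the source's actual implementation: the label sequences, function names, and the choice to count a token as a true positive only when both annotators assign the same non-O label are all assumptions.

```python
from itertools import combinations
from collections import Counter

def cohens_kappa(a, b):
    """Kappa between two equal-length token label sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in ca.keys() | cb.keys()) / (n * n)  # chance agreement
    if pe == 1.0:                                        # both annotators used one label only
        return 1.0
    return (po - pe) / (1 - pe)

def pairwise_f1(a, b, ignore="O"):
    """Token-level F1 between two annotators, ignoring the un-annotated O label."""
    tp = sum(x == y != ignore for x, y in zip(a, b))     # same non-O label from both
    a_pos = sum(x != ignore for x in a)
    b_pos = sum(y != ignore for y in b)
    if tp == 0 or a_pos == 0 or b_pos == 0:
        return 0.0
    p, r = tp / a_pos, tp / b_pos
    return 2 * p * r / (p + r)                           # symmetric: swapping a/b swaps p and r

def averaged_measures(annotations, ignore="O"):
    """Average all three measures over every pair of annotators."""
    kappas_all, kappas_ann, f1s = [], [], []
    for a, b in combinations(annotations, 2):
        kappas_all.append(cohens_kappa(a, b))
        # restrict to tokens where at least one annotator made an annotation
        idx = [i for i in range(len(a)) if a[i] != ignore or b[i] != ignore]
        kappas_ann.append(cohens_kappa([a[i] for i in idx], [b[i] for i in idx]))
        f1s.append(pairwise_f1(a, b, ignore))
    avg = lambda xs: sum(xs) / len(xs)
    return avg(kappas_all), avg(kappas_ann), avg(f1s)
```

Note how the restricted Kappa drops tokens both annotators left as O, which removes the inflation from the dominant O class but also discards all of the easy agreement, which is why it tends to underestimate the IAA.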

Appears in 1 contract

Sources: License Agreement
