Annotation Adjustment. By examining the annotations we collected, we noticed two things. First, some speakers are identified as a target main speaker only because they have many short utterances in conversations, such as “Oh” and “Yeah”. According to the Lexical Hypothesis [10], we need enough language input to analyze one's personality, but such utterances contain very few linguistic cues and can lead to very low agreement. Therefore, we delete sub-scenes whose target speaker has too little language input, which leaves 3448 useful sub-scenes out of 3545 annotations. The statistics of our Friends dataset compared against the other two datasets can be seen in Table 3.2. Second, the inter-rater agreement for the task is low despite the annotators' hard work, so we sum the three annotations for each task to obtain a final score between -3 and 3; this way, we can make use of all the annotations. After plotting the distribution of the seven classes for each personality trait, we notice that the -3 and 3 classes are both very small, around 1 to 2%, meaning that strong agreement is rare in the annotations. In a statistical model, classes as small as -3 and 3 are too small to ever be predicted. To make the class distribution closer to normal while still using the two small classes, we merge class 3 into class 2 and class -3 into class -2, reducing the number of classes from 7 to 5. The resulting distributions are closer to normal.
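To make the adjustment pipeline concrete, the following is a minimal Python sketch, not the thesis's actual code. It assumes each of the three annotators gives a per-trait label in {-1, 0, +1} (so the sum falls in [-3, 3], giving the seven classes above), and it uses an assumed cutoff of 20 tokens to decide "too little language input"; the threshold, data layout, and function names are all hypothetical.

# Assumed minimum language input for the target speaker; the thesis
# does not state the exact cutoff used to filter sub-scenes.
MIN_TOKENS = 20

def aggregate_score(annotations):
    """Sum three per-annotator labels (-1, 0, or +1) into a score in [-3, 3]."""
    assert len(annotations) == 3
    return sum(annotations)

def merge_classes(score):
    """Collapse the 7-point scale to 5 classes by folding 3 into 2 and -3 into -2."""
    return max(-2, min(2, score))

def filter_and_label(sub_scenes):
    """Drop sub-scenes with too little speaker input, then label the rest.

    Each sub-scene is assumed to be a dict with 'target_tokens' (token
    count of the target speaker's utterances) and 'annotations' (a dict
    mapping each personality trait to its three annotator labels).
    """
    labeled = []
    for scene in sub_scenes:
        if scene["target_tokens"] < MIN_TOKENS:
            continue  # too few linguistic cues to judge personality reliably
        labels = {
            trait: merge_classes(aggregate_score(votes))
            for trait, votes in scene["annotations"].items()
        }
        labeled.append({**scene, "labels": labels})
    return labeled

# Example: three annotators all rate a target speaker's agreeableness as +1.
scene = {"target_tokens": 57, "annotations": {"agreeableness": [1, 1, 1]}}
print(filter_and_label([scene])[0]["labels"])  # {'agreeableness': 2}

Note how the rare perfect-agreement score of 3 is folded into class 2 by merge_classes, matching the 7-to-5 class reduction described above.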