Corpus Creation Clause Samples

Corpus Creation. As mentioned before, we will develop our model to fit two types of datasets: monologue data and multiparty dialogue data. The Essays dataset is large enough to develop neural network models on monologue data. The EAR dataset, however, contains only one target speaker’s utterances rather than multiparty utterances, for privacy reasons, and it does not have sufficient annotations for our task. Therefore, it is both novel and necessary for us to create a new corpus for the task. Our new Friends corpus is published and publicly available online. This work also introduces a systematic framework for annotating personality traits in order to obtain a large-scale dataset for personality prediction.