Task Feasibility

Previous studies have shown that automatic personality prediction is feasible on monologue text datasets. However, no one has shown that the task is also feasible on a dialogue text corpus. To evaluate the feasibility of the task, we will implement state-of-the-art models and see how they perform on our new corpus. It would not be surprising if the models performed worse on the dialogue corpus, because our corpus is more challenging than previous dialogue datasets and even human annotators show little consensus. If the state-of-the-art models are not successful on the new corpus, we should start designing new models adapted to the different structure of dialogue data.

After extracting LIWC features using ▇▇▇▇▇▇▇▇▇▇’▇ LIWC tool, we feed the dataset into Weka to build the two best-performing classification models (SMO and SimpleLogistic) mentioned in the paper [1], without any feature reduction.

Table 5.1: Comparison between LIWC features and word embeddings. Multilayer Perceptron, the baseline neural network, is used in both cases. FastText is used to train our word embeddings on Friends and other large-scale datasets.

5.1.1 LIWC vs word embeddings

The large number of misspellings in both datasets poses a serious challenge to the application of pre-trained word vectors, because misspelled words are unlikely to appear in the pre-trained word embeddings [2]. This problem can be addressed by character-level word embeddings, which compose embeddings for misspelled or irregular words that are close to those of their standard spellings. Specifically, we use fastText [36] character n-gram embeddings trained on a dataset that combines the New York Times corpus, the Wikipedia text dump, the Amazon Book Reviews, and transcripts from several TV shows, including Friends.
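The mechanism by which character n-gram embeddings tolerate misspellings can be sketched as follows. This is a toy illustration of the fastText idea, not trained fastText: the hash table of random n-gram vectors is an illustrative stand-in for learned subword vectors, and all names (`SubwordEmbeddings`, `char_ngrams`) are hypothetical. A word's vector is the mean of its character n-gram vectors, so a misspelling that shares most n-grams with the standard spelling receives a nearby vector.

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    # fastText wraps each word in boundary markers before extracting n-grams
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

class SubwordEmbeddings:
    """Toy fastText-style composition: word vector = mean of n-gram vectors."""

    def __init__(self, dim=64, buckets=4096, seed=0):
        # one random vector per hash bucket (untrained; real fastText learns these)
        self.table = np.random.default_rng(seed).normal(size=(buckets, dim))
        self.buckets = buckets

    def vector(self, word):
        # hash each n-gram into a bucket deterministically, then average
        idx = [zlib.crc32(g.encode()) % self.buckets for g in char_ngrams(word)]
        return self.table[idx].mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = SubwordEmbeddings()
# the misspelling "personallity" shares most n-grams with "personality",
# so their composed vectors are far closer than those of unrelated words
print(cosine(emb.vector("personality"), emb.vector("personallity")))
print(cosine(emb.vector("personality"), emb.vector("table")))
```

Because even out-of-vocabulary words decompose into known n-grams, every token in a noisy transcript receives some vector, which is exactly the property that pre-trained whole-word vectors lack.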
To see whether word embeddings also contain the linguistic cues necessary for automatic personality prediction, we design a small experiment on the essays dataset. Specifically, we feed the essays dataset into the same MLP model using LIWC features and word embeddings, respectively. The results (Table 5.1) show that the same model achieves better accuracy with word embeddings on 3 out of 5 personality traits on the same dataset. As a result, we can confirm that word embeddings are effective linguistic features for automatic personality prediction, and in the following experiments we continue to use pre-trained word vectors for our task.
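The shape of this comparison experiment can be sketched as below. The data is synthetic: the essays dataset, real LIWC output, and the trained fastText vectors are not reproduced here, and scikit-learn's `MLPClassifier` stands in for the baseline MLP, so the numbers it prints say nothing about the results in Table 5.1 — only the protocol (same model, two feature sets, one trait at a time) matches.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_docs = 300
y = rng.integers(0, 2, n_docs)            # binary label for one Big Five trait

# stand-ins: ~90 LIWC category proportions vs. a 50-dim averaged word vector,
# each given a small class-dependent shift so there is signal to learn
X_liwc = rng.random((n_docs, 90)) + 0.3 * y[:, None]
X_emb = rng.normal(size=(n_docs, 50)) + 0.3 * y[:, None]

def mlp_accuracy(X, y):
    # identical model and split for both feature sets, as in the experiment
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
    return clf.fit(X_tr, y_tr).score(X_te, y_te)

acc_liwc = mlp_accuracy(X_liwc, y)
acc_emb = mlp_accuracy(X_emb, y)
print(f"LIWC features:   {acc_liwc:.3f}")
print(f"word embeddings: {acc_emb:.3f}")
```

In the real experiment this loop would be repeated once per personality trait, yielding the per-trait accuracies compared in Table 5.1.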