Dataset Statistics Sample Clauses

Dataset Statistics. To get an idea of the distribution of atoms in the datasets, the numbers of unique atoms in the subject, predicate, and object positions were computed and are tabulated in Table 2.1. Additionally, the number of atoms that could take part in each of the three possible joins between different positions was calculated and is tabulated in Table 2.2. The abbreviations S, P, and O denote subject, predicate, and object, respectively.

Dataset    Uniq Subj   Uniq Pred   Uniq Obj
DBpedia    136108      8878        282199
Uniprot    592639      79          294676
SP2Bench   31629       61          81919
Table 2.1: Subject, Predicate, Object statistics

Dataset    SP   SO       PO
DBpedia    0    48085    0
Uniprot    0    105560   8
SP2Bench   0    14816    0
Table 2.2: Join statistics
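As a minimal sketch of how such statistics could be computed, the Python snippet below counts the unique atoms per position and the atoms shared between two positions. It assumes that the triples are available as (subject, predicate, object) string tuples and that a join count is the size of the intersection of the two position sets. The function name `position_stats` and the toy triples are hypothetical and are not taken from the original work.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def position_stats(triples: Iterable[Triple]) -> Dict[str, int]:
    """Count unique atoms per position and atoms shared between positions.

    Assumption: an atom "could take part in a join" between two positions
    if it occurs in both of them, so each join count is an intersection size.
    """
    subjects, predicates, objects = set(), set(), set()
    for s, p, o in triples:
        subjects.add(s)
        predicates.add(p)
        objects.add(o)
    return {
        "uniq_subj": len(subjects),
        "uniq_pred": len(predicates),
        "uniq_obj": len(objects),
        "SP": len(subjects & predicates),  # atoms seen as subject and predicate
        "SO": len(subjects & objects),     # atoms seen as subject and object
        "PO": len(predicates & objects),   # atoms seen as predicate and object
    }


# Toy data, for illustration only.
toy_triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "type", "Person"),
]
toy_stats = position_stats(toy_triples)
```

On the toy data, `b` and `c` occur both as subjects and as objects, so only the SO count is non-zero. That is the same pattern as in Table 2.2, where SO joins dominate.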
Dataset Statistics. The new dataset was created from a pool of 6512 application resumes from the School of Nursing at Emory University. All of these resumes were submitted for one specific job, Clinical Research Coordinator (CRC), which is divided into four levels: CRC I, CRC II, CRC III, and CRC IV. Each level may include multiple distinct CRC jobs: there are 108 jobs for CRC I, 88 jobs for CRC II, 29 jobs for CRC III, and 6 jobs for CRC IV. Each of the 6512 unique resumes corresponds to one applicant, and some applicants applied for multiple jobs at the same level or across levels, giving 25027 applications in total. Because one applicant may have submitted multiple applications, a further cleaning step was needed: applicants were grouped by the highest level they applied for. For example, an applicant who applied for both CRC I and CRC II is grouped as a CRC II applicant. In this way, the ratio of applicants across the four levels was 28:12:8:2. Then, 2025 resumes were randomly selected to form the dataset: 1134 CRC I applicants' resumes, 486 CRC II applicants' resumes, 324 CRC III applicants' resumes, and 81 CRC IV applicants' resumes. The annotation of these 2025 resumes is explained in more detail in Section 3.2.
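The grouping and selection steps can be illustrated with a short Python sketch. It assumes applications are given as (applicant_id, level) pairs and that the 2025 resumes were drawn per level in proportion to the 28:12:8:2 ratio, which is consistent with the reported counts but is not stated explicitly in the text. The names `highest_level`, `quota`, and `stratified_sample`, the seed, and the toy applicants are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Ordering of the four CRC levels, lowest to highest.
LEVEL_RANK = {"CRC I": 1, "CRC II": 2, "CRC III": 3, "CRC IV": 4}


def highest_level(applications: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each applicant to the highest CRC level they applied for."""
    best: Dict[str, str] = {}
    for applicant, level in applications:
        if applicant not in best or LEVEL_RANK[level] > LEVEL_RANK[best[applicant]]:
            best[applicant] = level
    return best


def quota(ratio: Dict[str, int], total: int) -> Dict[str, int]:
    """Split `total` across levels in proportion to `ratio`.

    Uses integer division, so it is exact only when the shares divide evenly
    (as they do for 2025 and 28:12:8:2); other inputs would need rounding.
    """
    weight = sum(ratio.values())
    return {level: total * share // weight for level, share in ratio.items()}


def stratified_sample(best: Dict[str, str], quotas: Dict[str, int],
                      seed: int = 0) -> Dict[str, List[str]]:
    """Randomly pick `quotas[level]` applicants from each level group."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    groups: Dict[str, List[str]] = defaultdict(list)
    for applicant, level in best.items():
        groups[level].append(applicant)
    return {level: rng.sample(sorted(groups[level]), n)
            for level, n in quotas.items()}


# Quotas implied by the reported ratio and sample size.
quotas_2025 = quota({"CRC I": 28, "CRC II": 12, "CRC III": 8, "CRC IV": 2}, 2025)

# Toy applications, for illustration only.
toy_apps = [("u1", "CRC I"), ("u1", "CRC II"), ("u2", "CRC I"),
            ("u3", "CRC IV"), ("u3", "CRC I")]
toy_best = highest_level(toy_apps)
toy_pick = stratified_sample(toy_best, {"CRC I": 1, "CRC II": 1})
```

With the reported ratio and sample size, `quota` yields 1134, 486, 324, and 81 resumes for CRC I through CRC IV, matching the counts given above.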