Experiment Settings. All experiments began with an empty model, with no prior training and no predefined constraints. We tested the extraction of personal information, such as age and gender, and of the most frequently queried medical information, such as diagnoses, genetic markers, and therapies/procedures (see Table 6.5 for details). To support extraction, we used a seed vocabulary consisting of diagnosis, genetic marker (both gene and protein), and procedure lexicons, loaded from the Human Disease Ontology [85], the Cell Cycle Ontology [86], and the NCI Thesaurus [80], respectively.
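A seed vocabulary of this kind can be pictured as a lexicon that maps each ontology term to its category, which a dictionary matcher then uses to tag mentions in a report. The following is a minimal sketch under stated assumptions, not the system's actual extractor. `SEED_TERMS`, `build_lexicon`, and `match_terms` are hypothetical names, the listed terms are illustrative placeholders rather than entries actually loaded from the three ontologies, and the matching strategy (case-insensitive, whole-word, longest term first, no overlapping matches) is our own choice.

```python
import re
from collections import defaultdict

# Hypothetical placeholder terms. In the described setup the diagnosis,
# genetic marker and procedure lexicons would be loaded from the Human Disease
# Ontology, the Cell Cycle Ontology and the NCI Thesaurus respectively.
SEED_TERMS = {
    "diagnosis": ["breast carcinoma", "melanoma"],
    "genetic_marker": ["BRCA1", "HER2", "TP53"],
    "procedure": ["mastectomy", "biopsy"],
}


def build_lexicon(seed_terms):
    """Map each lower-cased term to the set of categories it belongs to."""
    lexicon = defaultdict(set)
    for category, terms in seed_terms.items():
        for term in terms:
            lexicon[term.lower()].add(category)
    return dict(lexicon)


def match_terms(text, lexicon):
    """Return (category, matched text) pairs in order of appearance.

    Longer terms are tried first, and a match is skipped if it overlaps
    characters already claimed by an earlier (longer) match.
    """
    lowered = text.lower()
    taken = [False] * len(text)
    found = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, lowered):
            if any(taken[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                taken[i] = True
            for category in sorted(lexicon[term]):
                found.append((m.start(), category, text[m.start():m.end()]))
    return [(category, span) for _, category, span in sorted(found)]


if __name__ == "__main__":
    lexicon = build_lexicon(SEED_TERMS)
    report = "Biopsy confirmed breast carcinoma; HER2 positive."
    print(match_terms(report, lexicon))
```

Matching the longest terms first keeps a multi-word diagnosis such as "breast carcinoma" from being split if a shorter term like "carcinoma" were also in the lexicon.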
Experiment Settings. We aimed to evaluate the effectiveness of the system when using online learning and controlled vocabularies, and to understand how applicable each is to different report forms. Our analysis of the report styles and vocabularies suggested that online learning is better suited to semi-structured or template-based narrative reports, whereas controlled-vocabulary-guided extraction is more effective on complex narratives drawn from a finite vocabulary. We therefore designed three experiments:
1) Online-learning-based data extraction without controlled vocabularies, on Dataset 1 (semi-structured) and Dataset 2 (template-based narrative);
2) Controlled-vocabulary-based data extraction without online learning, on Dataset 3 (complex narrative);
3) Controlled-vocabulary-guided data extraction combined with online learning, on Dataset 3.
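The three experiments above form a small configuration matrix over two switches, online learning and controlled vocabularies. The sketch below encodes that matrix so it can be expanded into individual runs. `ExperimentConfig`, `EXPERIMENTS`, `expand_runs`, and `online_learning_ablations` are hypothetical names introduced only for illustration; they are not part of the described system. The last helper makes one point of the design explicit: experiments 2 and 3 share Dataset 3 and the controlled vocabulary, so comparing them isolates the contribution of online learning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical description of one experiment in the design."""
    name: str
    online_learning: bool
    controlled_vocabulary: bool
    datasets: tuple


EXPERIMENTS = (
    ExperimentConfig("online_learning_only", True, False, ("Dataset 1", "Dataset 2")),
    ExperimentConfig("controlled_vocabulary_only", False, True, ("Dataset 3",)),
    ExperimentConfig("combined", True, True, ("Dataset 3",)),
)


def expand_runs(experiments):
    """Expand each experiment into one (experiment name, dataset) run per dataset."""
    return [(e.name, d) for e in experiments for d in e.datasets]


def online_learning_ablations(experiments):
    """Find experiment pairs that share a dataset and vocabulary setting
    and differ only in whether online learning is enabled."""
    pairs = []
    for without in experiments:
        for with_ol in experiments:
            if (not without.online_learning and with_ol.online_learning
                    and without.controlled_vocabulary == with_ol.controlled_vocabulary):
                for d in sorted(set(without.datasets) & set(with_ol.datasets)):
                    pairs.append((without.name, with_ol.name, d))
    return pairs


if __name__ == "__main__":
    for run in expand_runs(EXPERIMENTS):
        print(run)
    print(online_learning_ablations(EXPERIMENTS))
```

Since every experiment starts from an empty model, each expanded run would be executed against a freshly initialized model rather than one carried over from a previous run.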
