Text2KB: Augmenting Knowledge Base Question Answering with External Text Data
Existing relation extraction tools are not perfect: recall losses leave a lot of information behind, and precision losses introduce a certain amount of incorrect information into the extractions. Therefore, by applying relation extraction we lower the upper bound on the performance of an underlying question answering system. An alternative approach is to keep the information in its raw unstructured format and design a way to use it along with the KB. In this section, I describe a novel factoid question answering system that utilizes available textual resources to improve different stages of knowledge base question answering (KBQA). This work was presented as a full paper at the SIGIR 2016 conference [166].

KBQA systems must address three challenges, namely: question entity identification (to anchor the query process); candidate answer generation; and candidate ranking. We will show that these challenges can be alleviated by the appropriate use of external textual data. Entity identification seeds the answer search process, and therefore the performance of the whole system greatly depends on this stage [234]. The question text is often quite short and may contain typos and other problems that complicate entity linking. Existing approaches are usually based on dictionaries that contain entity names, aliases, and other phrases used to refer to the entities [182]. These dictionaries are noisy and incomplete; e.g., to answer the question “what year did tut became king?” a system needs to detect the mention “tut”, which refers to the entity Tutankhamun. If a dictionary does not contain the mapping “tut” → Tutankhamun, as happens for one of the state-of-the-art systems, it will not be able to answer the question correctly. Such less popular name variations are often used along with full names inside text documents, for example, to avoid repetitions.
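The dictionary-based linking step and its failure mode can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the alias dictionary, entity ids, and search snippets are all invented for the example, and a real system would use a much larger dictionary and live web search results.

```python
# Hypothetical sketch: a noisy alias dictionary maps surface forms to KB
# entities; when a question mention ("tut") is missing from the dictionary,
# name variations co-occurring with it in (simulated) web search snippets
# can recover the link. All data below is illustrative.
import re

# Toy alias dictionary; note the missing "tut" -> Tutankhamun mapping.
ALIAS_DICT = {
    "tutankhamun": "m.tutankhamun",
    "king tut": "m.tutankhamun",
    "egypt": "m.egypt",
}

def link_mention(mention, alias_dict):
    """Return the KB id for a mention, or None if it is not in the dictionary."""
    return alias_dict.get(mention.lower())

def expand_with_snippets(mention, snippets, alias_dict):
    """Vote for entities whose known aliases co-occur with the mention
    in search result snippets, mimicking the name-variation idea."""
    votes = {}
    for snippet in snippets:
        text = snippet.lower()
        if mention.lower() not in text:
            continue
        for alias, entity in alias_dict.items():
            if re.search(r"\b" + re.escape(alias) + r"\b", text):
                votes[entity] = votes.get(entity, 0) + 1
    return max(votes, key=votes.get) if votes else None

# Direct lookup fails for the short form...
assert link_mention("tut", ALIAS_DICT) is None

# ...but snippets retrieved for the question recover the full name.
snippets = [
    "Tutankhamun, commonly known as King Tut, became pharaoh of Egypt.",
    "What year did Tut become king? Tutankhamun ascended the throne young.",
]
print(expand_with_snippets("tut", snippets, ALIAS_DICT))  # m.tutankhamun
```

The co-occurrence voting here is a stand-in for whatever evidence a full system would aggregate; the point is only that search results surface full-name variations that the dictionary does cover.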
Therefore, we propose to look into web search results to find variations of question entity names, which can be easier to link to a KB (Figure 3.3). This idea has been shown effective in entity linking for web search queries [57]. After question entities have been identified, answer candidates need to be generated and ranked to select the best answer. A candidate query includes one or multiple triple patterns with predicates corresponding to words and phrases in the question. Existing knowledge base question answering approaches [20, 22, 23, 24, 35, 235] rely on a lexicon, learned f...
