Nikita Bhutani, Yoshihiko Suhara, Wang-Chiew Tan, Alon Halevy, H. V. Jagadish
Open Information Extraction (OPENIE) extracts meaningful structured tuples from freeform text. Most previous work on OPENIE
considers extracting data from one sentence at
a time. We describe NEURON, a system for
extracting tuples from question-answer pairs.
Since real questions and answers often contain precisely the information that users care
about, such information is particularly desirable to extend a knowledge base with.
NEURON addresses several challenges. First,
an answer text is often hard to understand
without knowing the question, and second, relevant information can span multiple sentences.
To address these, NEURON formulates extraction as a multi-source sequence-to-sequence
learning task, wherein it combines distributed
representations of a question and an answer to
generate knowledge facts. We describe experiments on two real-world datasets that demonstrate that NEURON can find a significant number of new and interesting facts to extend a
knowledge base compared to state-of-the-art
OPENIE methods.