Nikita Bhutani, Aaron Taylor, Chen Chen, Xiaolan Wang, Behzad Golshan, Wang-Chiew Tan
Knowledge bases (KBs) have long been the backbone of many real-world applications and
services. Many KB construction (KBC) methods can extract factual information in which
relationships between entities are stated explicitly in text. However, these methods cannot model
implications between opinions: opinions are abundant in user-generated text such as reviews, but
the implications between them are often implicit and have to be mined. Our goal is to develop a
technique for building KBs that capture both opinions and their implications. Because obtaining
training data for learning to extract implications can be expensive for each new review domain,
we propose SAMPO, an unsupervised KBC system based on matrix factorization techniques. SAMPO is
tailored to building KBs for domains in which many reviews are available. We use SAMPO to
generate KBs for 20 different domains and manually evaluate the KBs for 6 of them. Our
experiments show that KBs generated by SAMPO capture information that other KBC methods miss.
In particular, we show that our KBs can provide additional training data for fine-tuning
language models used in downstream tasks such as review comprehension.
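The abstract does not specify SAMPO's factorization model, so the following is only a generic illustration of how matrix factorization can surface directional relationships between opinions without labeled data; it is not SAMPO's method. It assumes a binary review-by-opinion matrix (1 if a review expresses an opinion), fits a logistic low-rank factorization with plain SGD, and scores a hypothetical implication "a implies b" as an estimate of P(b | a) from the reconstructed matrix. The toy opinions and reviews, and the names `factorize`, `implication_score`, and `log_loss`, are hypothetical.

```python
import math
import random

# Hypothetical toy data (not from the paper): rows are reviews, columns are
# opinions; 1 means the review expresses that opinion.
OPINIONS = ["clean room", "spotless bathroom", "noisy street", "thin walls"]
REVIEWS = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def factorize(matrix, rank=2, epochs=500, lr=0.1, reg=0.01, seed=0):
    """Generic logistic matrix factorization (an assumption, not SAMPO's model):
    matrix[r][o] is modeled as sigmoid(U[r] . V[o]) and fit by SGD on the
    log loss with an L2 penalty, over every cell of the matrix."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(r, o) for r in range(n_rows) for o in range(n_cols)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for r, o in cells:
            # Gradient of the log-likelihood with respect to the logit.
            err = matrix[r][o] - sigmoid(dot(U[r], V[o]))
            for k in range(rank):
                u, v = U[r][k], V[o][k]
                U[r][k] += lr * (err * v - reg * u)
                V[o][k] += lr * (err * u - reg * v)
    return U, V


def log_loss(matrix, U, V):
    """Total binary cross-entropy of the reconstruction."""
    total = 0.0
    for r, row in enumerate(matrix):
        for o, y in enumerate(row):
            p = min(max(sigmoid(dot(U[r], V[o])), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def implication_score(U, V, a, b):
    """Hypothetical directional score for 'opinion a implies opinion b':
    an estimate of P(b | a) summed over reviews of the reconstructed matrix."""
    num = den = 0.0
    for u in U:
        p_a = sigmoid(dot(u, V[a]))
        p_b = sigmoid(dot(u, V[b]))
        num += p_a * p_b
        den += p_a
    return num / den


if __name__ == "__main__":
    U, V = factorize(REVIEWS)
    for a, name_a in enumerate(OPINIONS):
        for b, name_b in enumerate(OPINIONS):
            if a != b:
                print(f"{name_a} -> {name_b}: {implication_score(U, V, a, b):.2f}")
```

Because the score divides by the expected number of reviews expressing `a`, it is asymmetric: on this toy data, "spotless bathroom -> clean room" tends to score higher than the reverse, since one review praises the room without mentioning the bathroom.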