Reviews heavily influence customer purchasing decisions. But an overwhelming abundance of reviews makes it difficult to understand the prevalent public opinions of a product or service. Existing opinion mining and review summarization techniques unfortunately suffer from a number of limitations. In particular, they may produce duplicate opinions and ignore relationships between opinions. To address these limitations, we developed ExplainIt, a review summarization system that extracts and organizes opinions into an Opinion Graph. In this blog post, we will discuss how we represent subjective information from reviews with Opinion Graphs. We’ll also explore how ExplainIt’s novel pipeline facilitates Opinion Graph construction from reviews. To the best of our knowledge, ExplainIt is the first pipeline that can extract and organize both opinions and their explanation relationships from reviews.
Objective Data vs. Subjective Data
The internet is a major source of both factual and subjective information. For example, Wikipedia (Figure 1a) and similar websites contain colossal quantities of factual or objective information. On the other hand, online service providers (Figure 1b) like TripAdvisor, Amazon, or Indeed hold a substantial amount of subjective information from customer reviews.
There has been significant progress in extracting facts in the form of subject-predicate-object triples (e.g., (Mars, member of, Solar System)) and constructing knowledge bases from this data. These advancements are instrumental for a wide variety of technology applications such as search engines and question-answering (QA) systems. But much less machine learning and natural language processing research has been conducted on organizing opinions into a structured format. Not only are opinions abundant in subjective data, but they are integral to understanding nuances in sentiment, which is a key component of activities like market or product research.
To fill this gap, we created ExplainIt. At the heart of this review summarization system is an Opinion Graph, a representation that can help immensely with downstream applications like generating explainable review summaries and searching over opinion phrases. Let’s discuss why a better way of organizing opinions is needed by examining the limitations of current opinion mining techniques.
Addressing Opinion Mining Limitations
Existing opinion mining research efforts largely focus on two factors:
- Improving opinion extraction accuracy.
- Aspect-based sentiment analysis of extracted opinions over a set of predefined aspects.
However, such techniques cannot be directly used to organize extracted opinions due to two limitations:
1. No Relationship Between Opinions
Current opinion mining techniques cannot be used to determine the relationships between opinions. They can determine the sentiment of an extracted opinion like “good location.” But they cannot explain, for example, that the location is good because of its proximity to the park or the nightlife.
2. Too Many Redundancies
Simply collecting all extracted opinions leads to too much redundancy and could also cause incorrect conclusions. For example, pretend we have a list of the following extracted opinions: “quiet room”, “very noisy street”, “loud neighborhood”, “horrible city noise”, and “quiet room”. We may erroneously conclude that “quiet room” is the most popular opinion if the opinions are not organized according to similarity.
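The pitfall is easy to reproduce in a few lines of Python. This is only an illustrative sketch: the grouping of phrases into "quiet" and "noisy" is written by hand here as an assumption, and is not the output of ExplainIt.

```python
from collections import Counter

# The extracted opinions from the example above.
opinions = [
    "quiet room",
    "very noisy street",
    "loud neighborhood",
    "horrible city noise",
    "quiet room",
]

# Counting raw strings makes "quiet room" look like the dominant opinion.
raw_counts = Counter(opinions)

# Hypothetical hand-written grouping by meaning (an assumption for illustration).
group_of = {
    "quiet room": "quiet",
    "very noisy street": "noisy",
    "loud neighborhood": "noisy",
    "horrible city noise": "noisy",
}
grouped_counts = Counter(group_of[phrase] for phrase in opinions)

print(raw_counts.most_common(1))      # [('quiet room', 2)]
print(grouped_counts.most_common(1))  # [('noisy', 3)]
```

Once similar phrases are counted together, the noise complaints outnumber the "quiet room" mentions three to two.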
So, what’s the best way to organize opinions into a knowledge base? To answer this, we analyzed the properties of subjective information in reviews through a series of annotation tasks. This allowed us to confirm the following:
Opinion Phrases Are the Most Common Representation of Subjective Information
Opinion phrases are pairs of the form (opinion term, aspect term). An example is (very good, location). These types of pairs account for 84.75% of subjective information.
Explanation Is the Most Common Relationship Between Opinions
Among 40,000 randomly sampled review sentences from TripAdvisor, 12.3% of the expressed opinions are correlated under some relationship. In other words, these opinions explain, contradict, or paraphrase each other. Among these opinions, 74.2% are related under the explanation relationship.
Opinions and Explanations Are Entity-Specific
Many opinions and their explanations are specific to the reviewed entities, and do not reflect a general explanation relationship across multiple entities. For instance, the opinion “close to main street” explains “very noisy room” for a specific hotel in the review, “Our room was very noisy because it is close to the main street.” But this explanation may not be true for arbitrary hotels.
Introducing the Opinion Graph
Based on our analysis, we propose a graph representation for organizing opinions that we simply call an Opinion Graph. This organizes opinions around the explanation relationship based on reviews specific to an entity. As shown in Figure 2c below, each Opinion Graph node is a set of opinion phrases that are semantically similar to each other. An edge (u, v) between two nodes u and v denotes that u explains v.
We found Opinion Graphs to be a versatile structure for organizing review opinions for several reasons:
- The Opinion Graph is a concise, structured representation of opinions over numerous reviews.
- The nodes can aggregate and represent opinions at different levels of granularity.
- The edges explain opinions based on other opinions that appear in the reviews.
- Opinion provenance in nodes can be traced back to the input reviews they are extracted from.
- The Opinion Graph is a useful abstraction that can support several downstream applications like explainable review summary generation and opinion phrase searching.
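One way to picture this structure is as a small data type. The sketch below is a minimal Python illustration, not ExplainIt's implementation: the class names, method names, node ids, review ids, and the phrase "right on main street" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpinionPhrase:
    opinion: str    # opinion term, e.g. "very good"
    aspect: str     # aspect term, e.g. "location"
    review_id: str  # provenance: the review the phrase was extracted from


@dataclass
class OpinionGraph:
    # node id -> semantically similar opinion phrases grouped in that node
    nodes: dict[str, list[OpinionPhrase]] = field(default_factory=dict)
    # directed edges (u, v), read as "u explains v"
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_node(self, node_id, phrases):
        self.nodes[node_id] = list(phrases)

    def add_explanation(self, u, v):
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both nodes must exist before adding an edge")
        self.edges.add((u, v))

    def explanations_of(self, v):
        # All nodes u with an edge (u, v).
        return sorted(u for (u, w) in self.edges if w == v)

    def provenance(self, node_id):
        # Reviews that contributed at least one phrase to this node.
        return sorted({p.review_id for p in self.nodes[node_id]})


graph = OpinionGraph()
graph.add_node("noisy_room", [OpinionPhrase("very noisy", "room", "r1")])
graph.add_node(
    "near_main_street",
    [
        OpinionPhrase("close to", "main street", "r1"),
        OpinionPhrase("right on", "main street", "r2"),  # made-up phrase
    ],
)
graph.add_explanation("near_main_street", "noisy_room")
```

Keeping the review id on each phrase is what lets a node be traced back to its source reviews, and storing edges as ordered pairs preserves the direction of the explanation.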
Opinion Graph Construction
To build Opinion Graphs, we developed ExplainIt. This pipeline is inspired by methods used in knowledge base construction. We’ve broken down the process of Opinion Graph construction into four components, as illustrated in Figure 3 below.
Step 1: Opinion Mining
In the first step of our pipeline, we mine opinion phrases from a set of reviews about an entity. To accomplish this, we can leverage existing aspect-based sentiment analysis models. In our pipeline, we specifically use Snippext, an opinion mining system that extracts opinion phrases from reviews. It also predicts the aspect category and sentiment associated with each extraction. We exploit these additional signals to improve opinion phrase canonicalization.
Step 2: Explanation Mining
Next, ExplainIt discovers any explanation relationships between pairs of opinion phrases extracted from reviews. We utilized crowdsourcing to obtain domain-specific labeled data and developed a classifier to discover the explanation relationship between opinion phrases. The key concept is to use two different tasks for training a single model to build a better and more robust classifier. This technique is known as multi-task learning. As shown in Figure 4 below, our model contains two classification tasks:
- Review Classification: Determine whether the review contains explanations.
- Explanation Classification: Determine whether the first opinion phrase explains the second one.
We want our multi-task classification model to intuitively capture signals from the context and the opinion phrases. To achieve this, our model accounts for both the context surrounding the opinion phrases and the word-by-word alignments between opinion phrases. It is a departure from prior methods in open-domain textual entailment and entity relation classification; these do not consider both types of information at the same time.
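To illustrate the multi-task idea, the following Python sketch combines the losses of the two classification tasks into a single training objective. It makes several assumptions that this post does not specify: both tasks are treated as binary, each uses a cross-entropy loss, and the two are combined as a weighted sum with a hypothetical `review_weight` parameter.

```python
import math


def binary_cross_entropy(prob, label):
    # prob: predicted probability of the positive class; label: 0 or 1.
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multi_task_loss(review_prob, review_label, expl_prob, expl_label,
                    review_weight=0.5):
    # Task 1: does the review contain explanations?
    review_loss = binary_cross_entropy(review_prob, review_label)
    # Task 2: does the first opinion phrase explain the second one?
    expl_loss = binary_cross_entropy(expl_prob, expl_label)
    # Weighted sum is an assumed combination rule, not the published one.
    return expl_loss + review_weight * review_loss


joint = multi_task_loss(0.9, 1, 0.8, 1)
explanation_only = multi_task_loss(0.9, 1, 0.8, 1, review_weight=0.0)
```

Because both terms are computed from the same shared model, gradients from the review-level task also shape the representation used by the explanation classifier. Setting `review_weight` to 0 leaves only the explanation objective.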
Step 3: Opinion Phrase Canonicalization
After we mine explanations between opinion phrases, we group semantically similar opinion phrases together (e.g., “not far away from Fisherman’s Wharf” and “close to the wharf”) to form a node in the Opinion Graph. This step is necessary because reviews overlap significantly in content and, thus, contain many similar opinion phrases.
To canonicalize opinion phrases, we developed a novel opinion phrase representation learning framework that learns opinion phrase embeddings. This framework has two key properties:
- Conventional embedding techniques usually prepare a single embedding vector for each opinion phrase. In contrast, this framework uses separate embeddings for opinion terms and aspect terms. It then merges them into an opinion phrase embedding. This helps distinguish lexically similar but semantically different opinion phrases (e.g., “very close to the trams” vs. “very close to the ocean”).
- This framework uses weak supervision to incorporate additional signals that help capture the semantic meaning of opinion phrases in the opinion phrase embeddings without additional human annotation cost. In particular, it includes additional objectives in the loss function to ensure the learned representations are good for predicting aspect category, sentiment label, and the explanation relationships.
With the additional loss functions based on signals extracted in the previous steps, our model can incorporate sentiment information into opinion phrase embeddings while retaining the explanation relationship between opinion phrases in the embedding space.
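The effect of embedding opinion and aspect terms separately can be seen with toy numbers. In the Python sketch below, the vectors are made up and the merge step is plain concatenation; both are assumptions for illustration, since the actual merge operation and learned values are not described in this post.

```python
import math

# Toy vectors, made up for illustration (not learned embeddings).
opinion_vectors = {"very close to": [0.9, 0.1]}
aspect_vectors = {"the trams": [1.0, 0.0], "the ocean": [0.0, 1.0]}


def phrase_embedding(opinion, aspect):
    # Merging by concatenation is an assumption of this sketch.
    return opinion_vectors[opinion] + aspect_vectors[aspect]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


trams = phrase_embedding("very close to", "the trams")
ocean = phrase_embedding("very close to", "the ocean")
similarity = cosine(trams, ocean)
```

The two phrases share an identical opinion part, but the differing aspect parts pull the cosine similarity well below 1, so a clustering step has a signal for keeping them apart.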
After opinion phrase representation learning, we apply a clustering algorithm over the learned opinion phrase embeddings to obtain opinion clusters. Each opinion cluster is a node (i.e., canonicalized opinion) of the final Opinion Graph.
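As a rough illustration of this clustering step, here is a minimal k-means sketch in plain Python. The two-dimensional embeddings, the initial centroids, and the phrase "loud room" are made up for the example, and k-means is used only as one possible clustering algorithm.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def k_means(embeddings, initial_centroids, iterations=10):
    # embeddings: phrase -> vector. Returns phrase -> cluster index.
    centroids = [list(c) for c in initial_centroids]
    assignment = {}
    for _ in range(iterations):
        # Assign every phrase to its nearest centroid.
        assignment = {
            phrase: min(
                range(len(centroids)),
                key=lambda i: squared_distance(vec, centroids[i]),
            )
            for phrase, vec in embeddings.items()
        }
        # Move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [embeddings[p] for p, c in assignment.items() if c == i]
            if members:
                centroids[i] = mean(members)
    return assignment


# Toy embeddings, made up for illustration.
embeddings = {
    "not far away from Fisherman's Wharf": [0.9, 0.1],
    "close to the wharf": [0.8, 0.2],
    "very noisy room": [0.1, 0.9],
    "loud room": [0.2, 0.8],
}
assignment = k_means(embeddings, initial_centroids=[[1.0, 0.0], [0.0, 1.0]])

# Each cluster becomes one node: cluster index -> sorted list of phrases.
nodes = {}
for phrase, cluster in assignment.items():
    nodes.setdefault(cluster, []).append(phrase)
nodes = {cluster: sorted(phrases) for cluster, phrases in nodes.items()}
```

With these toy values, the two wharf phrases land in one cluster and the two noise phrases in the other.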
Step 4: Opinion Graph Generation
For the fourth and final step, we apply an intuitive algorithm to construct the final Opinion Graph. It connects graph nodes according to the aggregated explanation predictions between the opinion phrases in the respective nodes.
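The aggregation idea can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm ExplainIt uses: phrase-level predictions are averaged per ordered node pair, an edge is added when the mean exceeds a hypothetical threshold of 0.5, and all phrases, node ids, and probabilities are made up.

```python
from collections import defaultdict


def build_edges(cluster_of, predictions, threshold=0.5):
    # cluster_of: phrase -> node id.
    # predictions: (phrase_u, phrase_v) -> probability that u explains v.
    scores = defaultdict(list)
    for (phrase_u, phrase_v), prob in predictions.items():
        u, v = cluster_of[phrase_u], cluster_of[phrase_v]
        if u != v:  # ignore pairs that fall inside the same node
            scores[(u, v)].append(prob)
    # Assumed aggregation rule: mean probability above the threshold.
    return {
        pair for pair, probs in scores.items()
        if sum(probs) / len(probs) > threshold
    }


cluster_of = {
    "close to main street": "street",
    "on a busy road": "street",
    "very noisy room": "noise",
    "loud room": "noise",
}
predictions = {
    ("close to main street", "very noisy room"): 0.9,
    ("on a busy road", "loud room"): 0.7,
    ("very noisy room", "close to main street"): 0.2,
}
edges = build_edges(cluster_of, predictions)
print(edges)  # {('street', 'noise')}
```

Aggregating over all phrase pairs between two nodes means a single noisy prediction is less likely to create or remove an edge on its own.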
We conducted three types of experiments to evaluate ExplainIt. Let’s delve into each one:
For explanation mining, we performed automatic evaluation over a dataset of 7,400 annotated examples and compared ExplainIt with three groups of alternative approaches. The first group consists of three different existing models for recognizing textual entailment (RTE). The second group comprises two models for relation classification. And the third group was used for an ablation study in which we disabled the review classification objective to test how much the multi-task learning framework improves overall performance.
As shown in Table 1 above, our proposed model (PROPOSED) is a substantial improvement over baseline approaches. We significantly outperform non-pre-trained RTE models and sentence classification models (RelCLS) by 5.94% to 7.42%. Models that consider contextual information tend to perform better in explanation mining. Leveraging a pre-trained model can further improve performance. For example, replacing the embedding layer (GloVe) with BERT increased the accuracy by 4%.
Canonicalizing Opinion Phrases
We also evaluated the learned opinion phrase representation over two annotated datasets in two domains: Hotel and Restaurant. To understand the benefits of our learned opinion phrase embeddings (WS-OPE), we evaluated whether WS-OPE can consistently improve the performance of existing clustering algorithms (k-means, GMM, and Correlation Clustering).
We used three metrics (homogeneity, completeness, and V-measure) in the same manner as precision, recall, and F1-score to evaluate performance. As shown in Table 2 above, our learned opinion phrase representations (WS-OPE) achieved the best performance in all settings. They also consistently boosted the performance of existing clustering algorithms compared to the baseline methods (AvgWE and ABAE) in both hotel and restaurant domains.
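For readers unfamiliar with these metrics, the Python sketch below computes them from their standard entropy-based definitions. Homogeneity is 1 − H(gold | predicted) / H(gold), completeness is 1 − H(predicted | gold) / H(predicted), and V-measure is their harmonic mean. The function names and the toy label lists are illustrative and are not taken from our evaluation code or data.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    # H(targets | given), estimated from label co-occurrence counts.
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(gold, predicted):
    h_gold, h_pred = entropy(gold), entropy(predicted)
    # Homogeneity: each predicted cluster contains a single gold class.
    homogeneity = (
        1.0 if h_gold == 0
        else 1.0 - conditional_entropy(gold, predicted) / h_gold
    )
    # Completeness: each gold class is kept within a single cluster.
    completeness = (
        1.0 if h_pred == 0
        else 1.0 - conditional_entropy(predicted, gold) / h_pred
    )
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy example: four phrases, two gold groups, every phrase in its own cluster.
h, c, v = v_measure(gold=[0, 0, 1, 1], predicted=[0, 1, 2, 3])
```

In this toy case the over-split clustering is perfectly homogeneous (1.0) but only half complete (0.5), giving a V-measure of about 0.67. This mirrors how high precision paired with low recall pulls down an F1-score.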
In addition to these findings, we confirmed that our model noticeably benefits from the weak supervision found in the previous opinion and explanation mining steps of our pipeline. It significantly outperforms ABAE, which does not utilize weak supervision.
Opinion Graph Quality
Besides the automatic evaluation of the explanation mining and opinion phrase canonicalization modules, we conducted a user study to verify the quality of ExplainIt’s Opinion Graphs. More specifically, we randomly sampled nodes from the graph and asked human annotators to judge each node’s quality as well as the accuracy of the explanation relationship between the sampled nodes. Figure 6 below depicts an example question.
We generated examples (i.e., constructed Opinion Graphs) for 10 hotels. This amounted to 166 node pairs (or questions) in total. Every question was shown to three judges. They agreed with our predicted relation in more than 77% of cases.
A Better Way to Represent Subjective Data from Reviews
We hope you’ve enjoyed this article about ExplainIt. With a unique approach to extracting and organizing the relationships between opinions, we believe this system has vast potential to improve how we understand subjective information and the nuances of public sentiment. In turn, this can transform how we approach tasks like market research and product development.
We’ve already expanded upon our work on ExplainIt by building ExtremeReader, a summary explorer that visualizes generated Opinion Graphs. ExplainIt also supports OpinionDigest, our abstractive summarization system that produces textual summaries.
Stay tuned for updates! We’ll be sure to share any exciting research developments for ExplainIt through our blog. In the meantime, we have created labeled datasets for explanation mining and opinion phrase canonicalization tasks. These are publicly available for future machine learning and natural language processing research. You can find them here.