With the rapid growth of online review platforms such as Yelp and Glassdoor, people now rely on customer reviews to make decisions on everything from dining to job searches. One study shows that over 94% of online customers read reviews before making a decision. However, customers post massive numbers of reviews every day, making it difficult to find the opinions you are looking for. Opinion summarization is a solution to this problem.
An opinion summarization system extracts representative opinions from reviews and presents them in a concise, easy-to-understand summary. This allows users to make decisions without reading through many reviews.
Normally, a text summarization system is built by training a neural network model on a large number of human-written summaries. This is difficult for opinion summarization because it is expensive, and often infeasible, to collect enough human-written summaries for customer reviews that contain such a wide variety of opinions. Consequently, the primary research focus in opinion summarization is to develop models using unsupervised approaches that do not require human-written summaries.
The previous blog articles presented a series of our efforts on opinion summarization, including OpinionDigest, a controllable opinion summarization system, ExtremeReader, an interactive explorer for customizable review summarization, and Snippext, a powerful aspect-based opinion extractor which enhances both of the summarization systems.
In this blog post, we’d like to introduce a new summarization framework, Coop, which helps unsupervised opinion summarization generate more specific summaries from customer reviews.
We found that existing unsupervised opinion summarization systems often generate overly generic summaries that are entity agnostic (e.g., not specific to a certain restaurant or product). This is undesirable, as users want specific information about each restaurant or product from the generated summary. We therefore developed Coop to improve unsupervised opinion summarization models: by generating more specific summaries, Coop broadens the applicability of these systems.
Unsupervised Opinion Summarization
Before diving into the Coop framework, let us explain the standard way to build an unsupervised opinion summarization system and the problem that leads to generating overly generic summaries.
Reconstruction Learning and Summary Generation
A common approach for unsupervised opinion summarization is to build an encoder-decoder model with the reconstruction objective. As shown in Fig 1, the model learns to embed review text into a vector representation in the latent space, which is then decoded back to the original review text. By learning to reconstruct a massive number of reviews, the model should learn to encode semantically similar reviews into similar vectors in the latent space.
To summarize multiple reviews, the model encodes each input review into a latent vector and then aggregates the latent vectors into a summary vector by taking their simple average. The expectation is that this summary vector will be decoded into a fluent summary containing the representative opinions from the input reviews.
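Concretely, simple-average aggregation is just a mean over latent vectors. Below is a minimal sketch in which made-up three-dimensional vectors stand in for the encoder's output (the values and dimensionality are purely illustrative):

```python
import numpy as np

# Hypothetical latent vectors for three reviews of one restaurant.
# In a real system these come from the trained encoder.
review_vectors = np.array([
    [0.9,  0.1, -0.3],
    [0.7,  0.4, -0.1],
    [0.8, -0.2, -0.5],
])

# Simple-average aggregation: the de facto standard summary vector,
# which the decoder would then turn into summary text.
summary_vector = review_vectors.mean(axis=0)
print(summary_vector)
```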
Summary Vector Degeneration
The simple average is a natural and intuitive way to calculate a summary vector in the latent space, and it has been the de facto standard for existing unsupervised opinion summarization models.
However, we found that summaries generated from the simple-average summary vector tend to be too generic. Figure 2 shows the latent vectors of two sets of three reviews about different restaurants. Although the original latent vectors are clearly distinct, both simple-average summary vectors are pulled toward the center of the latent space. As a result, they are decoded into semantically similar, overly generic summaries that could apply to almost any restaurant. We refer to this problem as summary vector degeneration. More than any other issue, it prevents opinion summarization models from producing specific summaries (i.e., summaries with representative and distinctive opinions).
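The geometry behind this degeneration is easy to demonstrate numerically: averaging several distinct unit-norm vectors always yields a vector of smaller norm, so the summary vectors for different entities all shrink toward the origin, and hence toward one another. A toy sketch, with random unit vectors standing in for latent codes:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vectors(n: int, dim: int) -> np.ndarray:
    """n random points on the unit sphere, a toy stand-in for
    review embeddings in the latent space."""
    v = rng.normal(size=(n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Three reviews each for two different restaurants.
reviews_a = random_unit_vectors(3, 64)
reviews_b = random_unit_vectors(3, 64)

avg_a = reviews_a.mean(axis=0)
avg_b = reviews_b.mean(axis=0)

# Each review vector has norm 1, but the simple averages shrink
# toward the origin -- the "center" of the latent space.
print(np.linalg.norm(avg_a), np.linalg.norm(avg_b))
```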
Finding Better Summary Vectors with Coop
To address this issue, we introduce Coop, a latent vector aggregation framework that helps opinion summarization models generate more specific summaries. Solving the summary vector degeneration problem is not straightforward. For instance, we tried re-scaling the summary vector by a constant factor to make it more distinctive. This helped somewhat but ended up producing “hallucinated” content (i.e., information not present in the input reviews). It is therefore essential, and challenging, to strike a balance between specificity and content support. To this end, we formulate unsupervised opinion summarization as an optimization problem: finding a summary vector that generates summaries better aligned with the input reviews.
To formulate an optimization problem, we need to define (1) the objective function (i.e., what criteria to maximize or minimize) and (2) the search space (i.e., which candidates to search for). We will describe each point below.
Objective: Input-Output Word Overlap
One fundamental issue in the simple average aggregation method is that the summary vector calculation is decoder agnostic. The simple average aggregation method does not take into account the decoder behavior, although it is the decoder that generates a summary from a summary vector. This leads us to consider a criterion that evaluates the alignment between a generated summary and the input reviews. Specifically, we choose word-level overlap between a generated summary and input reviews (which we refer to as input-output word overlap) as the objective to maximize.
Figure 4 shows an example of a generated summary and input reviews. We assume that a summary with higher input-output word overlap contains more input-review opinions. This also helps the decoder avoid generating hallucinated content to create more factual summaries. Furthermore, we can measure the input-output overlap without relying on manually written reference summaries, which enables Coop to explore summary vectors in a fully unsupervised fashion.
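As an illustration, a bag-of-words overlap score between a summary and its input reviews can be computed in a few lines. This is a simplified, ROUGE-1-style formulation written for exposition; it is not necessarily the exact scoring used in the paper:

```python
from collections import Counter

def word_overlap(summary: str, reviews: list[str]) -> float:
    """ROUGE-1-style F1 between a summary and the concatenated input reviews."""
    sum_counts = Counter(summary.lower().split())
    ref_counts = Counter(" ".join(reviews).lower().split())
    # Clipped count of words the summary shares with the reviews.
    hits = sum(min(c, ref_counts[w]) for w, c in sum_counts.items())
    if hits == 0:
        return 0.0
    precision = hits / sum(sum_counts.values())
    recall = hits / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

reviews = ["the dumplings were great", "great service and huge portions"]
print(word_overlap("great dumplings and huge portions", reviews))
```

Because this score needs only the summary and the input reviews themselves, it can be computed without any reference summaries.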
Search Space: Powerset of Reviews
With the input-output word overlap as the objective, the problem is now reduced to finding a summary vector that maximizes this criterion. Through extensive exploration, we find that simplifying the search space to either include or exclude an input review for the summary vector aggregation, which we call the Powerset, can produce better summary vectors more efficiently. Thus, we narrow down the search space to Powerset to find the summary vector that maximizes the input-output word overlap. Intuitively, Coop decides to use (or not) each of the input reviews and takes the average of the latent vectors of the selected reviews to make a summary vector.
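Putting the pieces together, the search can be sketched in plain Python. Everything here is a toy stand-in: `encode`, `decode`, and `score` are hypothetical placeholders for the real VAE encoder, decoder, and word-overlap objective:

```python
from itertools import combinations

def powerset_search(reviews, encode, decode, score):
    """Search all non-empty subsets of input reviews for the summary
    whose decoded text best scores against the full set of inputs."""
    vectors = [encode(r) for r in reviews]
    best_summary, best_score = None, float("-inf")
    for size in range(1, len(reviews) + 1):
        for subset in combinations(range(len(reviews)), size):
            # Average the latent vectors of the selected reviews only.
            avg = [sum(vectors[i][d] for i in subset) / len(subset)
                   for d in range(len(vectors[0]))]
            summary = decode(avg)
            s = score(summary, reviews)  # e.g., input-output word overlap
            if s > best_score:
                best_summary, best_score = summary, s
    return best_summary

# Toy usage with stand-in components (not the real model):
encode = lambda r: [float(len(r))]           # fake 1-d "latent" vector
decode = lambda z: f"summary@{z[0]:.1f}"     # fake decoder
score = lambda summary, reviews: -abs(float(summary.split("@")[1]) - 10)
print(powerset_search(["short", "a bit longer", "very very long"],
                      encode, decode, score))
```

With n input reviews there are 2^n - 1 non-empty subsets, which stays tractable for the small n typical of review summarization.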
Generating More Specific Summaries by Coop
You may now be wondering how good the summaries generated by Coop actually are. Let’s first look at the generated summaries for a Chinese restaurant and an American restaurant. As shown in the table below, with the conventional simple-average aggregation strategy, the two summaries convey almost identical information. With Coop, by contrast, the summaries contain more entity-specific information. For instance, the summary for the Chinese restaurant mentions the cuisine type and the portion sizes, while the one for the American restaurant lists a variety of dishes served by this restaurant!
The above examples show that Coop can produce more specific summaries than the simple average method. To further demonstrate this, we ask human annotators to manually judge the quality of generated summaries. More specifically, we ask them to judge the quality of the summaries with respect to two criteria:
- Informativeness: how much useful information a summary contains, on a scale from -100 (worst) to +100 (best);
- Content support: whether the summary aligns with the input reviews (i.e., the degree of hallucination), measured as the ratio of sentences that are fully, partially, or not supported by the input reviews.
As shown in Table 1 below, summaries generated by Coop are more informative than those from the simple average. Coop also performs well on content support, generating more sentences with full or partial support than the other methods. These results indicate that Coop generates more specific and informative summaries that are well supported by the input reviews.
Finally, we evaluated the performance of Coop on publicly available benchmarks using the standard ROUGE metrics. (R1, R2, and RL are scores based on 1-gram, 2-gram, and longest-common-subsequence matching, respectively.) As shown in Table 2 below, Coop combined with BiMeanVAE, a VAE-based model, established new state-of-the-art performance on both benchmarks, outperforming all unsupervised and weakly supervised baselines by a large margin. This further demonstrates that by adopting Coop, which solves a simple optimization problem on top of an existing unsupervised opinion summarization model, users can obtain much better and more specific review summaries.
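For readers unfamiliar with ROUGE, here is a simplified recall-oriented sketch of ROUGE-N; real evaluations use the full ROUGE toolkit, which adds stemming, stopword handling, and F-measure variants:

```python
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Recall-oriented ROUGE-N: fraction of reference n-grams
    that also appear in the candidate (with clipped counts)."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    hits = sum(min(c, cand[g]) for g, c in ref.items())
    return hits / total

ref = "the food was great and the service was friendly"
cand = "great food and friendly service"
print(rouge_n_recall(cand, ref, n=1), rouge_n_recall(cand, ref, n=2))
```

Note how the bigram score (R2) is far stricter than the unigram score (R1): it rewards matching word order, not just vocabulary.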
In this blog, we introduced Coop, an optimization framework that searches for better summary vectors for unsupervised opinion summarization solutions. We are also excited to release an easy-to-use Python library that supports pre-trained VAE-based opinion summarization models and Coop. You can run the state-of-the-art unsupervised opinion summarization systems with just a few lines of Python code! Check out the GitHub repo: https://github.com/megagonlabs/coop