Generating Specific Review Summaries with Coop

With the rapid growth of online review platforms such as Yelp and Glassdoor, people now rely on customer reviews to make decisions on everything from dining to job searches. One study shows that over 94% of online customers read reviews before making decisions. However, customers post a massive number of reviews on these platforms every day, which makes it difficult to find the useful opinions that you are looking for. Opinion summarization is a solution for those situations.

An opinion summarization system extracts representative opinions from reviews and presents them as a concise, easy-to-understand summary. This allows users to make decisions without reading many reviews.

Normally, a text summarization system is built by training a neural network model on a large number of human-written summaries. This is difficult for opinion summarization because it is expensive, and often infeasible, to collect enough human-written summaries for customer reviews that contain a wide variety of opinions. Consequently, the primary research focus of opinion summarization is to develop models with an unsupervised approach that does not require human-written summaries.

The previous blog articles presented a series of our efforts on opinion summarization, including OpinionDigest, a controllable opinion summarization system, ExtremeReader, an interactive explorer for customizable review summarization, and Snippext, a powerful aspect-based opinion extractor which enhances both of the summarization systems.

In this blog post, we’d like to introduce a new summarization framework, Coop, which helps unsupervised opinion summarization generate more specific summaries from customer reviews.

We found that existing unsupervised opinion summarization systems often generate overly generic summaries that are entity agnostic (e.g., not specific about a certain restaurant or product). This is undesirable because users want specific information about each restaurant or product from the generated summary. Thus, we developed Coop to improve unsupervised opinion summarization models. Because it generates more specific summaries, Coop makes these models applicable to a broader range of uses.

Unsupervised Opinion Summarization

Before diving into the Coop framework, let us explain the standard way to build an unsupervised opinion summarization system and the problem that leads to generating overly generic summaries.

Learning to Reconstruct and Summary Generation

A common approach for unsupervised opinion summarization is to build an encoder-decoder model with the reconstruction objective. As shown in Figure 1, the model learns to embed review text into a vector representation in the latent space, which is then decoded back to the original review text. By learning to reconstruct a massive number of reviews, the model should learn to encode semantically similar reviews into similar vectors in the latent space.

For summarizing multiple reviews, the model separately encodes input reviews into latent vectors and aggregates the latent vectors into a summary vector by taking the simple average. By doing so, we expect that the summary vector will be decoded into a fluent summary that contains representative opinions in the input reviews.
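
As a minimal sketch of this aggregation step, the snippet below averages a few toy latent vectors with NumPy. The vector values and the `simple_average` helper are made up for illustration; in a real system the vectors would come from the trained encoder.

```python
import numpy as np


def simple_average(latent_vectors: np.ndarray) -> np.ndarray:
    """Average per-review latent vectors (shape [num_reviews, dim]) into one summary vector."""
    return latent_vectors.mean(axis=0)


# Toy latent vectors standing in for encoder outputs (hypothetical values).
z_reviews = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [2.0, 2.0, 2.0],
])

z_summary = simple_average(z_reviews)
print(z_summary)  # [1. 1. 2.]
```

The resulting summary vector would then be passed to the decoder to generate the summary text.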

Figure 1: How encoder-decoder models are trained with the reconstruction objective. The model learns to reconstruct the original text from the latent vector.

Summary Vector Degeneration

The simple average is a natural and intuitive way to calculate a summary vector in the latent space; it has been the de facto standard for existing unsupervised opinion summarization models.

However, we found that summaries generated from the simple average summary vector tend to be too generic. Figure 2 shows the latent vectors of two sets of three reviews about different restaurants. Although the original latent vectors differ significantly, the simple average summary vectors end up close to each other near the center of the latent space. As a result, the summary vectors are decoded into semantically similar, overly generic summaries that could apply to any restaurant. We refer to this problem as summary vector degeneration. More than other issues, summary vector degeneration prevents opinion summarization models from producing specific summaries (i.e., summaries with representative and distinctive opinions).
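
The shrinking effect can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that review latent vectors are independent standard Gaussian samples; real encoder outputs are not distributed this way, and the dimension and review count are arbitrary. Under this assumption the average of n vectors has an expected squared norm n times smaller than that of a single vector, so averaged vectors sit closer to the origin and closer to each other.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_reviews = 32, 8  # arbitrary toy sizes

# Toy assumption: latent vectors for each entity are i.i.d. standard Gaussian.
z_entity_a = rng.standard_normal((num_reviews, dim))
z_entity_b = rng.standard_normal((num_reviews, dim))

# Simple average summary vectors for the two entities.
summary_a = z_entity_a.mean(axis=0)
summary_b = z_entity_b.mean(axis=0)

# Typical norm of a single review vector vs. norm of a summary vector.
mean_review_norm = np.linalg.norm(z_entity_a, axis=1).mean()
summary_norm = np.linalg.norm(summary_a)

# Distance between two individual reviews vs. between the two summary vectors.
review_distance = np.linalg.norm(z_entity_a[0] - z_entity_b[0])
summary_distance = np.linalg.norm(summary_a - summary_b)

print(f"review norm: {mean_review_norm:.2f}, summary norm: {summary_norm:.2f}")
print(f"review distance: {review_distance:.2f}, summary distance: {summary_distance:.2f}")
```

In this toy setting, both the norm of the summary vector and the distance between the two summary vectors come out noticeably smaller than the corresponding values for individual reviews.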

Figure 2: Existing unsupervised opinion summarization models with the simple average aggregation tend to generate overly similar summaries for any entity (i.e., they are entity agnostic).

Finding Better Summary Vectors with Coop

To address this issue, we introduce Coop, a latent vector aggregation framework that helps opinion summarization models generate more specific summaries. Solving the summary vector degeneration problem is not straightforward. For instance, we tried re-scaling the summary vector by multiplying it by a constant value to make it more distinctive. This helped slightly but ended up generating “hallucinated” content (i.e., information not in the input reviews). Thus, it is both essential and challenging to strike a balance between specificity and content support. To this end, we formulate the unsupervised opinion summarization task as an optimization problem: find a better summary vector that generates summaries more aligned with the input reviews.

To formulate an optimization problem, we need to define (1) the objective function (i.e., what criteria to maximize or minimize) and (2) the search space (i.e., which candidates to search for). We will describe each point below.

Figure 3: The Coop framework searches for the summary vector that maximizes the input-output overlap between a generated summary and the input reviews.

Objective: Input-Output Word Overlap

One fundamental issue in the simple average aggregation method is that the summary vector calculation is decoder agnostic. The simple average aggregation method does not take into account the decoder behavior, although it is the decoder that generates a summary from a summary vector. This leads us to consider a criterion that evaluates the alignment between a generated summary and the input reviews. Specifically, we choose word-level overlap between a generated summary and input reviews (which we refer to as input-output word overlap) as the objective to maximize.

Figure 4 shows an example of a generated summary and input reviews. We assume that a summary with higher input-output word overlap contains more input-review opinions. This also helps the decoder avoid generating hallucinated content to create more factual summaries. Furthermore, we can measure the input-output overlap without relying on manually written reference summaries, which enables Coop to explore summary vectors in a fully unsupervised fashion.
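
To make the objective concrete, here is a minimal sketch of one way to score input-output word overlap: a unigram-overlap F1 between the summary and the concatenated input reviews. The exact scoring function, the `tokenize` helper, and the example texts are assumptions made for illustration, not necessarily what Coop uses.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into simple word tokens (an illustrative tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(summary: str, reviews: List[str]) -> float:
    """Unigram-overlap F1 between a summary and the concatenated input reviews."""
    summary_counts = Counter(tokenize(summary))
    review_counts = Counter(tokenize(" ".join(reviews)))
    # Counter intersection keeps the minimum count of each shared word.
    matched = sum((summary_counts & review_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(summary_counts.values())
    recall = matched / sum(review_counts.values())
    return 2 * precision * recall / (precision + recall)


reviews = [
    "The noodles were great and the portions were huge.",
    "Great noodles, friendly staff.",
]
specific_score = word_overlap("Great noodles and huge portions.", reviews)
generic_score = word_overlap("This place is good.", reviews)
print(specific_score > generic_score)  # True
```

In this toy example the generic summary shares no words with the reviews, so it scores 0, while the specific summary is rewarded for reusing the reviewers' own words. No reference summary is needed at any point.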

Figure 4: Word-level overlap between input reviews and an output summary (input-output word overlap) is a good metric to evaluate if the output summary covers opinions in the input reviews.

Search Space: Powerset of Reviews

With the input-output word overlap as the objective, the problem is reduced to finding a summary vector that maximizes this criterion. Through extensive exploration, we found that restricting the search space so that each input review is either included in or excluded from the summary vector aggregation, which we call the Powerset, produces better summary vectors more efficiently. Thus, we narrow the search space down to the Powerset and look for the summary vector that maximizes the input-output word overlap. Intuitively, Coop decides whether or not to use each input review and takes the average of the latent vectors of the selected reviews to make a summary vector.
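
The search described above can be sketched as a brute-force loop over all non-empty subsets of the input reviews. Everything in the example is a toy stand-in chosen so the snippet runs on its own: the bag-of-words `encode`, the thresholding `decode`, the `overlap` score, and the `powerset_search` helper are hypothetical and are not the Coop library's API or its trained encoder-decoder.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Tuple

import numpy as np

VOCAB = ["great", "noodles", "huge", "portions", "slow", "service"]


def encode(text: str) -> np.ndarray:
    """Toy encoder: word counts over a tiny vocabulary."""
    counts = Counter(text.lower().split())
    return np.array([counts[word] for word in VOCAB], dtype=float)


def decode(z: np.ndarray) -> str:
    """Toy decoder: emit every vocabulary word whose weight exceeds 0.5."""
    return " ".join(word for word, value in zip(VOCAB, z) if value > 0.5)


def overlap(summary: str, reviews: List[str]) -> float:
    """Unigram-overlap F1 between the summary and the concatenated reviews."""
    s = Counter(summary.lower().split())
    r = Counter(" ".join(reviews).lower().split())
    matched = sum((s & r).values())
    if matched == 0:
        return 0.0
    precision, recall = matched / sum(s.values()), matched / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def powerset_search(
    latents: np.ndarray,
    reviews: List[str],
    decode_fn: Callable[[np.ndarray], str],
    overlap_fn: Callable[[str, List[str]], float],
) -> Tuple[float, Tuple[int, ...], str]:
    """Try every non-empty subset of reviews and keep the best-scoring summary."""
    best: Tuple[float, Tuple[int, ...], str] = (-1.0, (), "")
    for size in range(1, len(latents) + 1):
        for subset in combinations(range(len(latents)), size):
            z = latents[list(subset)].mean(axis=0)  # average the selected vectors
            summary = decode_fn(z)
            score = overlap_fn(summary, reviews)
            if score > best[0]:
                best = (score, subset, summary)
    return best


reviews = ["great noodles huge portions", "great noodles", "slow service"]
latents = np.array([encode(review) for review in reviews])

average_summary = decode(latents.mean(axis=0))
best_score, best_subset, best_summary = powerset_search(latents, reviews, decode, overlap)
print(average_summary)  # great noodles
print(best_summary)     # great noodles huge portions
```

Note that N input reviews yield 2^N - 1 non-empty subsets, each requiring one decoding pass, so an exhaustive search of this kind is only practical when the number of input reviews is small.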

Generating More Specific Summaries by Coop

Now you may wonder how good the summaries generated by Coop are. Let’s first take a look at the generated summaries for a Chinese restaurant and an American restaurant. As shown in the table below, when using the conventional simple average aggregation strategy, the two summaries convey almost identical information. In contrast, when using Coop, the summaries contain more entity-specific information. For instance, the summary for the Chinese restaurant talks about the cuisine type and portion sizes, while the one for the American restaurant mentions a variety of dishes served by that restaurant!

Human Evaluation

The above examples show that Coop can produce more specific summaries than the simple average method. To further demonstrate this, we ask human annotators to manually judge the quality of generated summaries. More specifically, we ask them to judge the quality of the summaries with respect to two criteria:

  • Informativeness: how much useful information a summary contains, ranging from -100 (worst) to +100 (best);
  • Content support: whether the summary aligns with the input reviews (i.e., degree of hallucination), measured as the ratios of sentences that are fully, partially, or not supported by the input reviews.

As shown in Table 1 below, summaries generated by Coop are more informative than those generated by the simple average. Coop also performs well on content support, as it generates more sentences with full or partial content support than the other methods. These results indicate that Coop is able to generate more specific and informative summaries that are well supported by the input reviews.

Automatic Evaluation

Finally, we evaluated the performance of Coop on publicly available benchmarks using ROUGE scores, the standard evaluation metric. (R1, R2, and RL are scores based on 1-gram, 2-gram, and longest common subsequence matching, respectively.) As shown in Table 2 below, Coop, combined with the VAE-based model BiMeanVAE, established new state-of-the-art performance on both benchmarks, outperforming all unsupervised and weakly supervised baselines by a large margin. This further demonstrates that by adopting Coop, which solves a simple optimization problem on top of an existing unsupervised opinion summarization model, the user can obtain much better and more specific review summaries.
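
For readers unfamiliar with ROUGE, the following is a simplified sketch of how R1, R2, and RL can be computed as F1 scores for a single candidate and reference. It assumes whitespace tokenization and omits details of official ROUGE implementations such as stemming and multi-reference handling, so its numbers will not exactly match a benchmark toolkit.

```python
from collections import Counter
from typing import List


def f1(matched: int, candidate_total: int, reference_total: int) -> float:
    """Harmonic mean of precision and recall, or 0.0 when nothing matches."""
    if matched == 0:
        return 0.0
    precision = matched / candidate_total
    recall = matched / reference_total
    return 2 * precision * recall / (precision + recall)


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int) -> float:
    """F1 over clipped n-gram matches."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    matched = sum((cand & ref).values())
    return f1(matched, sum(cand.values()), sum(ref.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, token_a in enumerate(a, start=1):
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l(candidate: str, reference: str) -> float:
    """F1 based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


candidate = "the food is great"
reference = "the food was great"
print(rouge_n(candidate, reference, 1))  # 0.75
print(rouge_l(candidate, reference))     # 0.75
```

In this example three of the four unigrams match, only one of the three bigrams matches, and the longest common subsequence is "the food great", which is why R2 is the strictest of the three scores.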


In this blog post, we introduced Coop, an optimization framework that searches for better summary vectors for unsupervised opinion summarization. We are also excited to release an easy-to-use Python library that supports pre-trained VAE-based opinion summarization models and Coop. You can run state-of-the-art unsupervised opinion summarization systems with just a few lines of Python code! Check out the GitHub repo:

Are you interested in learning more about Coop? Check out our upcoming Findings of EMNLP paper, which will be presented at EMNLP 2021 (November 7-11, 2021) and the NewSum Workshop 2021 (November 10, 2021)! Do you have any questions about how it works? Contact us today!

Written by Hayate Iso, Xiaolan Wang, Yoshihiko Suhara, and Megagon Labs

