CoCoSum: Summarizing Contrastive and Common Opinions from Reviews

Widely available online customer reviews help users with decision making in a variety of domains (e.g., hotel, restaurant, or job). After creating a list of candidate choices based on initial preferences (e.g., area, price range, and restaurant type), the user often has to compare a few choices in depth by carefully reading the reviews to make a final decision. However, this process is time consuming, and it is difficult for the user to detect differences and similarities between the candidates, as those pieces of information are often scattered across different reviews.

Figure 1: Existing Single Entity Opinion Summarization

The recent success of neural summarization techniques and the growth of online review platforms have made multi-document opinion summarization a recent focus of research. The goal of multi-document opinion summarization is to generate a summary that represents salient opinions in reviews of a particular hotel or product. In previous blog articles, we presented a series of our efforts on opinion summarization, including Coop, an entity-specific opinion summarization system, OpinionDigest, a controllable opinion summarization system, ExtremeReader, an interactive explorer for customizable review summarization, and Snippext, a powerful aspect-based opinion extractor which enhances the aforementioned summarization systems. While generated summaries offer general and concise information about a particular hotel or product, that information may be insufficient to help the user compare multiple choices. Thus, the user may still struggle with the question: “Which one should I pick?”

In this blog post, we take one step beyond the current scope of opinion summarization and propose CoCoSum, a framework which aims to generate contrastive and common summaries by comparing multiple entities. This framework consists of two base summarization models that jointly generate contrastive and common summaries.

Figure 2: Comparative Opinion Summarization

Comparative Opinion Summarization

The problem CoCoSum tackles is the comparative opinion summarization problem: given two sets of reviews for two entities such as hotels, we define contrastive opinions of a target entity A against a counterpart entity B as subjective information that is described only in the set of target entity’s reviews, but not in the counterpart entity’s reviews. We refer to the summary that contains such opinions as a contrastive summary. Similarly, we define common opinions of entities A and B as subjective information that is described in both sets of reviews. We refer to the summary that contains common opinions as a common summary.

We formalize comparative opinion summarization as a task that generates two contrastive summaries (one per entity) and one common summary from two sets of reviews for a pair of entities A and B (as shown in Figure 2).
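
To make these definitions concrete, here is a toy sketch that treats each review set as a set of already-extracted opinion phrases. This is only an illustration of the task definition: real reviews are free text, CoCoSum does not work by set operations, and the function name and example opinions are hypothetical.

```python
def comparative_opinions(opinions_a, opinions_b):
    """Split two opinion sets into contrastive and common parts.

    Toy illustration only: assumes opinions are already extracted and
    normalized so that identical opinions are identical strings.
    """
    a, b = set(opinions_a), set(opinions_b)
    return {
        "contrastive_a": a - b,  # described only in A's reviews
        "contrastive_b": b - a,  # described only in B's reviews
        "common": a & b,         # described in both review sets
    }


hotel_a = {"friendly staff", "rooftop pool", "close to the station"}
hotel_b = {"friendly staff", "free breakfast", "close to the station"}
result = comparative_opinions(hotel_a, hotel_b)
```

The three values in `result` correspond to the three outputs of the task: one contrastive summary per entity and a single common summary.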

On top of this formalization, we created the first comparative opinion summarization benchmark dataset, CoCoTrip, which includes 96 human-written contrastive summaries and 48 common summaries. The CoCoTrip dataset is available here:


In order to summarize contrastive and common opinions from two sets of reviews, the comparative opinion summarization task requires the model to compare and contrast two sets of reviews; however, existing single-entity opinion summarization models do not have such functionality. Therefore, we designed “collaborative” decoding, which characterizes the target summary distribution by leveraging two base summarization models.

Base Summarization Models

CoCoSum consists of two base summarization models. The base contrastive summarization model is a single-entity summarization model that takes only reviews of the target entity as input, while the base common summarization model takes reviews of both entities as input. In both cases, the input reviews are concatenated into a single sequence before encoding. To help the encoder tell which entity each part of the input belongs to, additional type embeddings are introduced into the input layer of the encoder. The base summarization models produce the contrastive summary token probability p_cont (Figure 3, a) and the common summary token probability p_comm (Figure 3, b).
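
As a rough sketch of the input construction described above, the snippet below concatenates reviews into one token sequence and records a type id per token. It assumes whitespace tokenization and one type id per entity; the function name is hypothetical, and in a real model the type ids would index a learned embedding table whose vectors are added to the token embeddings.

```python
def build_encoder_input(reviews_by_entity):
    """Concatenate reviews into one sequence and tag each token with a type id.

    reviews_by_entity: a list containing one list of review strings per entity.
    Assumption: the type id is simply the position of the entity in that list.
    """
    tokens, type_ids = [], []
    for entity_id, reviews in enumerate(reviews_by_entity):
        for review in reviews:
            words = review.split()  # whitespace tokenization, illustration only
            tokens.extend(words)
            type_ids.extend([entity_id] * len(words))
    return tokens, type_ids


reviews_a = ["great pool", "noisy street"]
reviews_b = ["tiny rooms"]

# Contrastive model input: reviews of the target entity only.
cont_tokens, cont_types = build_encoder_input([reviews_a])
# Common model input: reviews of both entities in a single sequence.
comm_tokens, comm_types = build_encoder_input([reviews_a, reviews_b])
```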

Figure 3: Base Summary Models

Collaborative Decoding

Collaborative decoding combines the predictions of the target and counterpart opinion summarization models (and of the common model, for common summary generation) at inference time.

The key idea of collaborative decoding is to aggregate the token probability distributions of the contrastive summarization model, p_cont, and the common summarization model, p_comm, at each decoding step. That way, the two models can collaboratively generate (1) contrastive summaries that contain distinctive opinions that do not appear in the counterpart review set and (2) common summaries that only contain common opinions that appear in both target and counterpart review sets.

Contrastive Summary Generation

Figure 4: Contrastive Summary Generation

To improve the distinctiveness of generated contrastive summaries, so that they contain only entity-specific opinions, we penalize tokens that are likely to appear in a summary of the counterpart entity. That is, we use two token probability distributions, one conditioned on the target entity's reviews and one on the counterpart's, and highlight tokens that are distinctive to the target by using the token ratio distribution between them.

The intuition behind this approach is that the token ratio distribution highlights tokens that are relatively unique to the target entity; combining it with the original token distribution emphasizes those tokens.
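
The reweighting described above can be sketched for a single decoding step as follows. This post does not spell out the exact combination rule, so the multiplicative form, the exponent `delta`, and the smoothing constant `eps` are assumptions made for illustration, and the function name is hypothetical.

```python
def contrastive_step(p_target, p_counterpart, delta=1.0, eps=1e-9):
    """Reweight the target token distribution by the target/counterpart ratio.

    p_target, p_counterpart: dicts mapping token -> probability, produced by
    the contrastive model for the target and counterpart reviews.
    delta (assumed hyperparameter) controls how strongly the ratio is applied.
    """
    scores = {}
    for tok, p in p_target.items():
        # Tokens that are also likely for the counterpart get a small ratio.
        ratio = p / (p_counterpart.get(tok, 0.0) + eps)
        scores[tok] = p * ratio ** delta
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}  # renormalize


p_target = {"pool": 0.4, "staff": 0.4, "location": 0.2}
p_counterpart = {"pool": 0.05, "staff": 0.6, "location": 0.35}
p_hat = contrastive_step(p_target, p_counterpart)
best = max(p_hat, key=p_hat.get)
```

In this toy example "pool" and "staff" are equally likely under the target distribution, but "staff" is also likely for the counterpart, so the reweighted distribution favors "pool".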

Common Summary Generation

Figure 5: Common Summary Generation

Common summaries should contain common opinions that are about a given pair of entities. However, we observed that simply fine-tuned summarization models tend to generate overly generic summaries that can be true for any entity pair.

To incorporate entity-specific information into the common summary, we designed collaborative decoding to combine the sum of the contrastive summarization model's token probability distributions for the two entities with the common summarization model's original token probability distribution.

The intuition behind this approach is that we should first identify salient tokens for the input entity pair by adding the token probability distributions of the contrastive summaries. Only then is this sum combined with the original distribution.
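
A single decoding step of this idea might look like the sketch below. As with contrastive generation, the exact way the summed distribution is combined with the original one is not given in this post; multiplying by the sum raised to an exponent `gamma` is an assumption, and the function name is hypothetical.

```python
def common_step(p_comm, p_cont_a, p_cont_b, gamma=1.0):
    """Reweight the common token distribution by entity-pair salience.

    p_comm: token -> probability from the common summarization model.
    p_cont_a, p_cont_b: token -> probability from the contrastive model for
    entities A and B. gamma is an assumed weighting hyperparameter.
    """
    scores = {}
    for tok, p in p_comm.items():
        # Salience: how likely the token is for either entity of the pair.
        salience = p_cont_a.get(tok, 0.0) + p_cont_b.get(tok, 0.0)
        scores[tok] = p * salience ** gamma
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}  # renormalize


p_comm = {"nice": 0.5, "beach": 0.3, "staff": 0.2}
p_cont_a = {"nice": 0.1, "beach": 0.6, "staff": 0.3}
p_cont_b = {"nice": 0.1, "beach": 0.5, "staff": 0.4}
p_hat = common_step(p_comm, p_cont_a, p_cont_b)
best = max(p_hat, key=p_hat.get)
```

Here the generic token "nice" is the most likely under the common model alone, while the reweighted distribution prefers "beach", which is salient for both entities.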

We would like to emphasize that collaborative decoding is a token probability distribution calculation method for comparative opinion summarization based on two summarization models; thus, it is independent of the base summarization model and the decoding algorithm.

Generating Comparative Summaries with or without Collaborative Decoding

Let’s illustrate how collaborative decoding improves the distinctiveness of the generated summaries. The following sections will elaborate on this topic.

Contrastive and Common Summaries

Let’s first take a look at the summaries that the base model and CoCoSum generate. We underline “bad” content, that is, content that includes incorrect opinions, hallucinations, or overly generic descriptions. As shown, CoCoSum generates summaries with much less “bad” content than the base model.

Contrastive and Common Summaries Table

Human Evaluation

To further confirm the effectiveness of CoCoSum, we asked human annotators to judge the quality of generated summaries in three different aspects:

(1) Content overlap: how much the content of the contrastive and common summaries overlaps. Less overlap is better.

(2) Content support: whether the summary is supported by the input reviews (i.e., the degree of hallucination), reported as the ratios of sentences that are fully, partially, or not supported by the input reviews.

(3) Quality: how coherent, informative, and non-redundant the summary is, rated on a scale from 1 to 5.

As shown in the table below, summaries generated by CoCoSum have less content overlap with one another and more content that is supported by the target hotel’s reviews. Compared to BiMeanVAE and OpinionDigest, CoCoSum shows much better performance on all three criteria.

Better Performance Criteria by CoCoSum

Automatic Evaluation

Last but not least, we evaluated summarization quality with BERTScore and with a distinctiveness score. BERTScore calculates the soft semantic alignment between the gold reference summaries and the generated summaries (the higher the better), and the distinctiveness score measures the ratio of non-overlapping unigrams in the contrastive and common summaries. As shown in the table below, the CoCoSum model performs the best among strong opinion summarization baselines.

CoCoSum performance
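
To give a feel for what a distinctiveness score captures, here is a minimal sketch of one plausible reading of "ratio of non-overlapping unigrams": one minus the share of unigrams that appear in every summary. The exact formula used in the evaluation is not stated in this post, so this definition, the whitespace tokenization, and the function name are assumptions.

```python
def distinctiveness(summaries):
    """Return 1 - |unigrams shared by all summaries| / |all unigrams|.

    summaries: a non-empty list of summary strings, e.g. the two contrastive
    summaries and the common summary for one entity pair.
    """
    vocab_sets = [set(s.lower().split()) for s in summaries]
    union = set.union(*vocab_sets)
    shared = set.intersection(*vocab_sets)
    if not union:
        return 0.0
    return 1.0 - len(shared) / len(union)


score = distinctiveness([
    "the rooftop pool is great",    # contrastive summary for A
    "the free breakfast is great",  # contrastive summary for B
    "the staff is friendly",        # common summary
])
```

Under this reading, identical summaries score 0 and summaries with no shared unigrams score 1.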


In this blog post, we introduced CoCoSum, a framework for comparative opinion summarization. We released the first benchmark dataset for comparative opinion summarization, CoCoTrip, and the codebase to reproduce this study. You can find them here:

Are you interested in learning more about CoCoSum? Check out our upcoming ACL Findings paper, which will be presented at ACL 2022 (May 22-27, 2022)! Do you have any questions about how it works? Contact us today!

Written by: Hayate Iso, Xiaolan Wang, and Megagon Labs

Follow us on LinkedIn and Twitter to stay up to date with new research and projects.

