With the rise of e-commerce, user reviews now play an integral role in the purchasing process. Their immense growth has spurred the need for review summarization systems. However, current approaches do little to alleviate the laborious task of decision-making. To address this, we’re proud to introduce ExtremeReader. This interactive, customizable system streamlines review summary exploration and explanation mining.
The Limitations of Current Review Summarization Systems
A recent study indicates that more than 90% of customers read user reviews before purchasing a service or product online. But, as we all know, reading reviews is often a tedious, time-consuming process. Not only can the volume of reviews be vast, but many of them tend to overlap in content.
Despite extensive research on this topic, existing review summarization systems fall short in two significant aspects. First, current systems generate generic summaries. This means that users cannot tailor the summary’s granularity to meet their specific needs.
Second, the majority of existing review summarization systems do not adequately perform explanation mining; they generate extractive summaries using only certain salient aspects of reviews. This incomplete depiction of overall sentiment offers no explanation, nor does it give users an effective way to explore the factors behind the produced summary and understand why it was produced.
To solve these limitations, we built ExtremeReader. This novel summarization system allows users to tailor summary content and interactively explore and understand summaries so they can quickly find their desired insights. ExtremeReader also generates abstractive summaries with an underlying structure that helps users trace explanations between opinions and understand why a given summary was generated.
How ExtremeReader Works
Imagine you’re planning a trip to San Francisco. You’ve found a promising hotel to stay at. But before you book it, you want to evaluate the place based on its reviews. More specifically, you want to ensure it has a few particular factors: nice service, easy parking, convenient location, and a safe neighborhood.
ExtremeReader's Structural and Textual Summaries
Rather than browse reviews for hours on end to assess these criteria, you turn to ExtremeReader. With ExtremeReader, you can easily tailor your review summarization by using the search function (Fig 2).
After receiving the search queries, ExtremeReader generates two types of customized summaries for you.
The first summary, or structured summary (Fig 3), provides a concise, structured view of the opinions mentioned in the reviews. Each node carries a representative label and denotes a group of semantically similar opinions. Every directed edge indicates an explanation between the source node opinion (cause) and the target node opinion (outcome).
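To make the structure concrete, here is a minimal sketch of such an opinion graph in Python. The class and method names are illustrative, not ExtremeReader's actual API; the point is only the shape of the data: labeled nodes grouping similar opinions, and directed cause-to-outcome edges.

```python
from collections import defaultdict

class OpinionGraph:
    """Toy structured summary: nodes group semantically similar opinions;
    directed edges encode cause -> outcome explanations."""

    def __init__(self):
        self.nodes = {}                 # label -> list of member opinions
        self.edges = defaultdict(set)   # cause label -> {outcome labels}

    def add_node(self, label, opinions):
        self.nodes[label] = list(opinions)

    def add_explanation(self, cause, outcome):
        # Directed edge: the cause opinion explains the outcome opinion.
        self.edges[cause].add(outcome)

    def explanations_for(self, outcome):
        # All cause nodes that explain a given outcome node.
        return sorted(c for c, outs in self.edges.items() if outcome in outs)

g = OpinionGraph()
g.add_node("great location", ["great location", "perfect spot"])
g.add_node("near downtown", ["near downtown"])
g.add_node("close to public transportation", ["close to the subway"])
g.add_explanation("near downtown", "great location")
g.add_explanation("close to public transportation", "great location")
print(g.explanations_for("great location"))
# -> ['close to public transportation', 'near downtown']
```

Querying `explanations_for` mirrors the edge-following a user does visually in Fig 3: given an outcome opinion, collect the cause opinions that explain it.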
In the case of our example, you’ll find that this hotel has “friendly staff,” a “great location,” and “adjoining parking.” But it doesn’t meet your last criterion; it’s in a “bad neighborhood.” This summary also allows you to quickly identify the factors that were accounted for in each outcome. For instance, the hotel is considered to be in a “great location” because it’s “near downtown,” “close to public transportation,” etc. (highlighted with a red circle in Fig 3).
The second summary (or textual summary) as shown in Fig 4 is an easy-to-understand text that articulates the main points of the first summary. While completely machine-generated, these textual summaries are often fluent and grammatically correct.
Besides these two summaries, ExtremeReader also enables users to browse original reviews interactively (Fig 5). When you click an opinion (i.e., a node) in the structured summary, ExtremeReader will show the original review sentences from which it was extracted. Similarly, ExtremeReader can also display the original review sentences that support the opinion explanations (i.e., the edges) in the structured summary.
Benefits of ExtremeReader's Structural and Textual Summaries
In contrast to existing review summarization systems, which offer little in the way of explanation, ExtremeReader provides explanations on two levels. Beyond the explanations between extracted opinions, ExtremeReader also explains how the summaries themselves are generated, making the generation process less opaque:
- By clicking nodes in the structured summary (Fig 3), it is easy to trace back the reviews that contribute to the nodes and therefore explain how the structured summary is generated.
- The structured opinion graph offers an explanation for the generated textual summary. Inserting this intermediate structure makes it easier to explain the downstream model used to generate the textual summary.
How ExtremeReader Streamlines Explainable and Customizable Summaries
ExtremeReader has two core review summarization tools to support its interactive functionality. The first tool is a structured review summary pipeline; this pre-computes the structured opinion summary. The second tool is a textual summary framework; this generates a controllable abstractive summary from the structured opinion sub-graph.
In addition to these two tools, we’ve also implemented a customizable search function over the structured opinion summary. With this capability, users can intuitively and efficiently tailor the summary.
Supporting Structured Summaries
The figure above depicts how our review pipeline supports the generation of structured summaries. Let’s walk through these steps in more detail to understand them better.
Given a corpus of reviews, we first extract or mine structured opinions from this unstructured text. For example, if a review contains the words “friendly and helpful staff,” we want to extract the opinions “friendly staff” and “helpful staff.”
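As a rough illustration of the “friendly and helpful staff” example, the snippet below distributes coordinated modifiers over a shared head noun. This is a toy heuristic, not Snippext’s actual model (which is a trained tagger); it only shows the intended input/output behavior of opinion extraction.

```python
import re

def expand_coordinated_opinion(phrase):
    """Split a coordinated opinion phrase like 'friendly and helpful staff'
    into one (modifier + head) opinion per modifier. A toy heuristic for
    illustration, not a trained extraction model."""
    m = re.match(r"(.+?)\s+(\w+)$", phrase.strip())
    if not m:
        return [phrase]
    modifiers, head = m.groups()
    # Split the modifier span on commas and 'and'.
    adjs = re.split(r",\s*|\s+and\s+", modifiers)
    return [f"{a.strip()} {head}" for a in adjs if a.strip()]

print(expand_coordinated_opinion("friendly and helpful staff"))
# -> ['friendly staff', 'helpful staff']
```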
To extract these opinions, ExtremeReader utilizes Snippext, our opinion mining system that lets you achieve state-of-the-art performance with 50% or less of the training data usually required.
After we’ve extracted opinions, we then conduct explanation mining on the relationships between them to make our summary explicable. Providing summary explanations can greatly aid users in gaining a more holistic understanding of the reviews.
For example, customers may assume a centrally located hotel is noisy due to heavy traffic outside. But other potential reasons include “paper-thin walls” or “noisy rooms.” These possibilities may not be expected, but they can certainly influence a user’s decision to reserve a room.
To properly mine explanations, we define a classification task in which the input includes two opinions and a review context. The objective of this task is to classify whether the first opinion explains the second one.
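The classification task above can be sketched as follows. The stub `toy_model` stands in for a trained classifier (in practice, something like a fine-tuned sentence-pair model); the names and the `[CLS]`/`[SEP]` encoding are assumptions for illustration, and only the input/output contract matters: two opinions plus a review context go in, a yes/no “explains” decision comes out.

```python
def build_classifier_input(opinion_a, opinion_b, context):
    # Standard sentence-pair style encoding of the two opinions and context.
    return f"[CLS] {opinion_a} [SEP] {opinion_b} [SEP] {context}"

def explains(opinion_a, opinion_b, context, model):
    """Return True if the model predicts that opinion_a explains opinion_b,
    given the review context."""
    features = build_classifier_input(opinion_a, opinion_b, context)
    return model(features) > 0.5   # binary decision from the model's score

# A stub standing in for the trained classifier.
toy_model = lambda text: 0.9 if "because" in text else 0.1

print(explains("near downtown", "great location",
               "great location because it is near downtown", toy_model))
# -> True
```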
Finally, we integrate semantically similar opinions and construct a concise opinion graph summary.
Often, reviews express similar opinions in different ways. For example, “loud street traffic” and “heavy traffic noise” are essentially the same. In this step, we eliminate redundant opinions so that semantically similar ones only appear once in our summary.
To accomplish this, we leverage an unsupervised integration approach that takes various types of additional information (e.g., aspects, sentiments, explanations) into account during the integration process.
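A minimal sketch of the integration step is shown below. ExtremeReader’s actual approach is unsupervised and draws on richer signals (aspects, sentiments, explanations); here, plain token-overlap (Jaccard) similarity with greedy single-pass clustering is a stand-in to show how redundant opinions collapse into one representative.

```python
def jaccard(a, b):
    # Token-overlap similarity between two opinion strings.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def integrate_opinions(opinions, threshold=0.3):
    """Greedy single-pass clustering: each opinion joins the first cluster
    whose representative it is similar enough to; otherwise it starts a
    new cluster. A stand-in for the real unsupervised integration."""
    clusters = []   # list of (representative, members)
    for op in opinions:
        for rep, members in clusters:
            if jaccard(rep, op) >= threshold:
                members.append(op)
                break
        else:
            clusters.append((op, [op]))
    return clusters

clusters = integrate_opinions(
    ["loud street traffic", "loud traffic", "friendly staff"])
print([rep for rep, _ in clusters])
# -> ['loud street traffic', 'friendly staff']
```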
Supporting Textual Summaries
The figure above shows how our textual review framework generates easy-to-understand summaries. It uses a seq2seq model to generate a textual summary based on input opinions. More specifically, ExtremeReader employs Transformer, a standard seq2seq model.
As we did before with the structured review summary pipeline, let’s walk through the steps in this diagram to understand them better.
On the left side, you’ll see our training phase. In this step, our goal is to train the seq2seq model to articulate a coherent piece of text from a sequence of opinions.
To achieve this objective, we tap into our abundant resource of reviews. For each review, we initially form an opinion sequence by concatenating its extracted opinions with the separator symbol [sep]. Then we train the model to reconstruct the original review from this opinion sequence. To train Transformer, we used its default settings.
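The construction of a training pair can be sketched in a few lines. The function name is illustrative, and the opinions below are hand-picked for the example (extraction happens upstream); the key point is that the source is the opinions joined by `[sep]` and the target is the original review the model learns to reconstruct.

```python
def make_training_pair(review_text, extracted_opinions):
    """Build one seq2seq training example: the source is the review's
    extracted opinions joined by the [sep] symbol; the target is the
    original review text to reconstruct."""
    source = " [sep] ".join(extracted_opinions)
    target = review_text
    return source, target

src, tgt = make_training_pair(
    "The staff were friendly and the rooms were clean.",
    ["friendly staff", "clean rooms"])
print(src)
# -> friendly staff [sep] clean rooms
```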
On the right side, you’ll see our generation phase. First, we apply opinion serialization on the opinion graph obtained from the structured review summary pipeline to collect an opinion sequence. Then, we use our trained seq2seq model to generate the textual summary from the serialized opinion sequence.
For opinion serialization, we employ the breadth-first search algorithm as our strategy. This ensures that correlated opinions are close to one another in the serialized opinion sequence.
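The serialization step above can be sketched with a standard breadth-first traversal. The graph encoding (a dict mapping each opinion to the opinions it explains) is an assumption for illustration, but the traversal itself is the textbook BFS the text describes, which keeps correlated opinions adjacent in the output sequence.

```python
from collections import deque

def serialize_opinions(graph, roots):
    """Breadth-first traversal of an opinion graph so that correlated
    opinions end up close together in the serialized sequence.
    `graph` maps an opinion to the opinions it explains."""
    order, seen = [], set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order

graph = {
    "near downtown": ["great location"],
    "close to public transportation": ["great location"],
}
print(serialize_opinions(graph, ["near downtown",
                                 "close to public transportation"]))
# -> ['near downtown', 'close to public transportation', 'great location']
```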
A Better Way to Understand Reviews
We hope you’ve enjoyed this brief overview of how ExtremeReader solves the current limitations of review summarization systems. To the best of our knowledge, this is the first system that is able to generate customizable and explainable summaries in both structured and abstractive textual formats.
By streamlining interactive summary exploration and explanation mining, ExtremeReader brings vast potential to the future of user reviews and e-commerce. We’re excited to continue our research and development on this system and hope to address more challenges facing review summarization systems. Stay tuned!
Written by Xiaolan Wang and Megagon Labs