Unlocking Real Value from User Reviews With Subjective Data and Experiential Search — Part 2

Welcome back to our series on how subjective data and experiential search can unlock more value from user reviews.

In our first post, we examined how regular search engines and review platforms can cause a misalignment between expectations and experience. We also covered the demand for experiential search and why it has never been done before. If you missed that article, we recommend reading it first.

For the second and final chapter of this series, we’ll take a look at how our subjective data system, OpineDB, solves the most common obstacles to experiential search. We’ll also explore how OpineDB’s performance stacks up against two of today’s most common search approaches.


Solving the Biggest Challenges of Experiential Search

As we discussed in our previous blog post, three substantial issues stand in the way of experiential search:

1. User reviews must be aggregated in a way that they can be queried efficiently and effectively.

2. The experiential search engine must systematically answer queries, regardless of their complexity.

3. The engine must also be able to handle searches using terms that don’t fit neatly into the subjective database schema.

To illustrate these challenges, let’s examine how OpineDB solves them in the context of searching for hotels.

A Complex (but Realistic) Search

Consider a user who’s looking for a hotel in London that costs less than 180 pounds per night, has really clean rooms, and is a romantic getaway. While the first predicate is objective and simple to satisfy, the second and third predicates are subjective and more nuanced.

To start solving this query, OpineDB extracts key experiences from a large set of user reviews with the help of sentiment analysis, opinion mining, and BERT, Google’s neural network-based technique for natural language processing pre-training. OpineDB also relies on two other ingredients: key phrases from the reviews, and markers, which are distinctions about the domain that the application designer thinks are important.

Designers select markers (or representative phrases) based on review data mining and the requirements of the specific application. For instance, a designer can decide if room cleanliness should be modeled by {clean, dirty} or if bathroom style is modeled by {old, standard, modern, luxurious}. These choices can have a significant impact on the quality of the query results.

How OpineDB Finds the Perfect Hotel for You

OpineDB extracts a number of linguistic phrases from reviews and organizes them into a schema, which is essentially a set of attributes with markers (i.e., representative phrases). All linguistic phrases about room cleanliness are mapped to the room cleanliness attribute, and specifically to the markers clean or dirty. Likewise, all linguistic phrases about bathroom styles are mapped to the bathroom style attribute, and specifically to the markers old, standard, modern, or luxurious.
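As a rough illustration of the mapping described above, the sketch below stores each attribute with its markers and tallies extracted phrases into per-marker counts. The `SCHEMA` layout, the `PHRASE_TO_MARKER` lookup table, and the `summarize` function are hypothetical names invented for this example; the hand-written dictionary merely stands in for the extraction models mentioned earlier.

```python
from collections import Counter

# Hypothetical schema: each subjective attribute lists its markers.
SCHEMA = {
    "room_cleanliness": ["clean", "dirty"],
    "bathroom_style": ["old", "standard", "modern", "luxurious"],
}

# Hypothetical lookup standing in for a learned phrase-to-marker model.
PHRASE_TO_MARKER = {
    "spotless room": ("room_cleanliness", "clean"),
    "very clean": ("room_cleanliness", "clean"),
    "stained sheets": ("room_cleanliness", "dirty"),
    "marble bathroom": ("bathroom_style", "luxurious"),
    "dated bathroom": ("bathroom_style", "old"),
}

def summarize(phrases):
    """Count how many extracted phrases fall under each marker."""
    summary = {
        attr: Counter({marker: 0 for marker in markers})
        for attr, markers in SCHEMA.items()
    }
    for phrase in phrases:
        hit = PHRASE_TO_MARKER.get(phrase)
        if hit is None:
            continue  # the phrase does not fit the schema
        attr, marker = hit
        summary[attr][marker] += 1
    return summary

hotel_summary = summarize(
    ["spotless room", "very clean", "stained sheets", "marble bathroom", "great view"]
)
```

Here "great view" is ignored because it matches no attribute in this toy schema, which is the situation the next paragraphs address.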

Depending on the user’s query, this can be a challenging task. For example, how do you identify which hotels are romantic getaways? The word “romantic” probably wouldn’t be in the hotel domain schema. OpineDB solves this problem by reformulating the query for a romantic getaway into a combination of attributes present in the schema.

In reviews for romantic occasions like wedding anniversaries or honeymoons, users may frequently mention other attributes such as exceptional staff or luxurious bathrooms. Since these subjective attributes are present in the schema, OpineDB can use them to rank which hotels are the most romantic. Conversely, if the user queries for something unrelated to anything in the subjective database schema, OpineDB will search through the text reviews to see if any of them mention this property.
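One simple way to picture this reformulation is to count which schema markers appear most often in reviews that mention the out-of-schema term. The sketch below does exactly that. Plain co-occurrence counting is our assumption for illustration, and the review data, marker labels, and `reformulate` function are all hypothetical.

```python
from collections import Counter

# Hypothetical reviews, each already tagged with the schema markers it mentions.
reviews = [
    {"text": "Romantic honeymoon, exceptional staff and a luxurious bathroom.",
     "markers": ["staff:exceptional", "bathroom:luxurious"]},
    {"text": "Our romantic anniversary: the staff went above and beyond.",
     "markers": ["staff:exceptional"]},
    {"text": "Romantic weekend, gorgeous bathroom, clean room, lovely staff.",
     "markers": ["staff:exceptional", "bathroom:luxurious", "room:clean"]},
    {"text": "Business trip, clean room, slow Wi-Fi.",
     "markers": ["room:clean"]},
]

def reformulate(term, reviews, top_k=2):
    """Return the schema markers that co-occur most often with `term`."""
    counts = Counter()
    for review in reviews:
        if term in review["text"].lower():
            counts.update(review["markers"])
    return [marker for marker, _ in counts.most_common(top_k)]

romantic_proxy = reformulate("romantic", reviews)
```

With this toy data, a query for "romantic" would be rewritten in terms of exceptional staff and luxurious bathrooms, mirroring the example in the text.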

Following this process for the above query, OpineDB ranks and outputs the search results according to the combination of the objective predicate and the two subjective predicates. It also includes relevant snippets from the reviews that elaborate on the searched terms, in case the user wants to dive deeper into any of the queried conditions.
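To make the combination of predicates concrete, here is a minimal sketch that filters on the objective price predicate and then ranks the remaining hotels by how well they satisfy the subjective ones. The hotel names, the per-predicate scores between 0 and 1, and the choice to multiply those scores are all assumptions made for this example, not a description of OpineDB’s internal scoring.

```python
# Hypothetical hotels with a nightly price (in pounds) and made-up
# degrees of satisfaction for each subjective predicate.
hotels = [
    {"name": "Thames View", "price": 170, "scores": {"clean": 0.9, "romantic": 0.8}},
    {"name": "City Budget", "price": 90, "scores": {"clean": 0.6, "romantic": 0.3}},
    {"name": "Grand Palace", "price": 320, "scores": {"clean": 0.95, "romantic": 0.9}},
    {"name": "Garden Inn", "price": 140, "scores": {"clean": 0.8, "romantic": 0.7}},
]

def search(hotels, max_price, subjective_terms):
    """Filter by the objective predicate, then rank by the subjective ones."""
    candidates = [h for h in hotels if h["price"] < max_price]

    def degree(hotel):
        # Assumption: combine predicates by multiplying their scores,
        # so a hotel must do well on every predicate to rank highly.
        score = 1.0
        for term in subjective_terms:
            score *= hotel["scores"].get(term, 0.0)
        return score

    return sorted(candidates, key=degree, reverse=True)

ranked_names = [h["name"] for h in search(hotels, 180, ["clean", "romantic"])]
```

In this toy run, Grand Palace is excluded by the price limit even though it scores highest on both subjective predicates.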

Subjective Search Can Close the Gap Between Expectations and Experience

To test OpineDB’s efficiency and effectiveness, we pitted it against an information retrieval-based search engine (IR) and an attribute-based query engine (AB). In our experiment, we used real subjective data from Booking.com and Yelp.

Examples of IR include popular Internet search engines like Google and Bing. The IR baseline used in this experiment is an implementation of Okapi BM25, a retrieval model that estimates the relevance of documents to a given search query. In this case, we use it to rank entities based on the opinions they receive.
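For readers unfamiliar with Okapi BM25, the sketch below scores a handful of documents against a query using the standard formula. It is not the baseline implementation from the experiment: the whitespace tokenization, the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the idea of treating each entity’s concatenated reviews as one document are all assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates with k1; b controls length normalization.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical "documents": one string of review text per hotel.
docs = [
    "very clean room and clean bathroom",
    "noisy street and dirty room",
    "friendly staff",
]
scores = bm25_scores("clean room", docs)
```

Note that BM25 only matches words. The second hotel still earns a positive score for "room" even though its reviews call the room dirty, which hints at why keyword relevance alone struggles with subjective queries.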

The AB baseline is what users typically encounter in online services such as Booking.com or Yelp. It’s a strong baseline to compare OpineDB with since it allows users to freely try different combinations of queryable attributes to obtain fine-tuned results.

Experimental Settings

To accurately compare the quality of OpineDB’s query results with the IR and AB baselines, we constructed subjective databases for hotels and restaurants using real-world datasets. We leveraged 515,739 user reviews for 1,493 hotels and 176,302 user reviews for 860 restaurants.

We also collected 190 subjective query predicates for hotels and 185 subjective query predicates for restaurants to construct conjunctions of these phrases. From there, we organized the subjective searches into categories: easy (2 conjuncts), medium (4 conjuncts), or hard (7 conjuncts). Each category contains 100 subjective queries.

To measure how well the search results satisfied the subjective query predicates, we used a metric based on Normalized Discounted Cumulative Gain (NDCG). Basically, we measure the quality of the search results by the total number of query predicates satisfied and the relevancy of the top results to these predicates. Irrelevant entities close to the top of the search results count as penalties to the score.
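The sketch below shows textbook NDCG, where each result’s relevance is taken to be the number of query predicates it satisfies. The exact variant used in our experiment is not spelled out in this post, so treat the relevance definition and the simplification of computing the ideal ranking from the returned list alone as assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: later positions contribute less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the best possible ordering of the same list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance lists: predicates satisfied by each result, in rank order.
good_ranking = ndcg([3, 2, 0])  # most relevant entities first
bad_ranking = ndcg([0, 2, 3])   # an irrelevant entity at the top
```

Because of the logarithmic discount, placing an irrelevant entity first lowers the score far more than placing it last, which is the penalty described above.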

We repeated this experiment with 10 different samples of query sets (1,000 queries per setting in total) to verify the integrity of the results.

The Results

Even when evaluated conservatively, OpineDB outperformed the IR and AB baselines by up to 15% for hotel queries and up to 10% for restaurant queries. Because OpineDB can accurately map query predicates to subjective attributes, it performed even better when we added more query predicates.

The results of this experiment also show that OpineDB adds more value to applications as the number of reviews grows. Its result quality in the hotel domain was markedly higher than in the restaurant domain because the former contained many more reviews than the latter. With this increase in information, we can refine OpineDB’s histogram summaries to be more representative and statistically significant.

OpineDB also excelled in terms of speed. Through its use of markers, our subjective data system was able to accelerate query processing by up to 660% without compromising on result quality. To learn more about how OpineDB works or about this experiment, check out our full research paper.

Experiential Search Is the Future for All Products and Services

Whether users are looking for a new apartment in a quiet neighborhood, a restaurant with a lively bar scene, or a new job focused on social good, they dedicate countless hours towards searching for the perfect product or service to satisfy their expectations. On the flip side, companies around the world are racing to find potential customers who need their specific offerings most.

Thanks to OpineDB, this gap between expectations and experience can finally be closed. Our subjective data system simplifies search in a way that both consumers and companies can benefit from.

Gone are the days when you’d spend copious amounts of time and energy researching something only to be ultimately disappointed. With OpineDB, you can reserve a restaurant that has a romantic view. You can find a hotel with reliable and fast Wi-Fi. You can ensure that your experience exceeds your expectations, every time.

Are you looking for a better way to connect with users? OpineDB offers a way for every industry to align its products and services with potential customers. All they have to do is search for experience.

Are you interested in learning more about OpineDB? Contact us today!

Written by Yuliang Li and Megagon Labs



Yuliang Li, Aaron Feng, Jinfeng Li, Saran Mumick, Alon Halevy, Vivian Li, Wang-Chiew Tan, “Subjective databases,” VLDB, July 2019.

