Understanding public sentiment can unlock unprecedented insights for every business. Consequently, opinion mining has rapidly grown in popularity. But building high-precision, high-recall opinion mining pipelines capable of high-quality information extraction and analysis usually requires an immense amount of labeled training data.
Snippext is a state-of-the-art (SOTA) opinion mining pipeline that extracts aspects, opinions, and sentiments from user-generated content such as online reviews. It cuts the amount of training data usually required by 50% or more through two techniques:
Data augmentation that automatically generates more labeled training data from existing labeled examples. This was inspired by a sentence classifier training method commonly used in natural language processing (NLP).
Semi-supervised learning that leverages massive amounts of unlabeled data.
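To make the data augmentation idea concrete, here is a minimal sketch for a tagging task such as aspect extraction. It is an illustration under stated assumptions, not Snippext's actual implementation: the `SYNONYMS` table, the `augment` function, the `B-AS`/`I-AS`/`O` tag names, and the replacement probability `p` are all hypothetical. The sketch replaces only tokens outside the labeled aspect span, so the original labels can be reused unchanged for the generated example.

```python
import random

# Hypothetical synonym table for illustration only; a real system would
# draw replacements from a much larger lexical resource or a language model.
SYNONYMS = {
    "great": ["excellent", "wonderful"],
    "was": ["seemed"],
    "really": ["truly"],
}


def augment(tokens, labels, rng, p=0.5):
    """Return a new (tokens, labels) pair with some non-target tokens replaced.

    Only tokens tagged "O" (outside any aspect span) are candidates for
    replacement, so the aspect tokens and the tag sequence stay the same.
    """
    new_tokens = []
    for tok, lab in zip(tokens, labels):
        candidates = SYNONYMS.get(tok.lower())
        if lab == "O" and candidates and rng.random() < p:
            new_tokens.append(rng.choice(candidates))
        else:
            new_tokens.append(tok)
    return new_tokens, list(labels)


# One labeled review sentence; "room service" is the aspect span.
tokens = ["The", "room", "service", "was", "really", "great"]
labels = ["O", "B-AS", "I-AS", "O", "O", "O"]

rng = random.Random(0)  # seeded so runs are repeatable
augmented = [augment(tokens, labels, rng) for _ in range(3)]

for new_tokens, new_labels in augmented:
    print(list(zip(new_tokens, new_labels)))
```

Each call yields an additional labeled example at no annotation cost. Restricting replacements to `O` tokens is a conservative design choice for this sketch: it keeps the labels valid by construction, at the price of less varied examples.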
With these optimizations, Snippext performs comparably to, and on several opinion mining tasks even outperforms, previous SOTA results. It also extracts significantly more fine-grained opinions that enable new opportunities for downstream applications.
The Megagon Labs team evaluated the performance of two of Snippext’s modules by applying them to two aspect-based sentiment analysis (ABSA) tasks: aspect extraction (AE) and aspect sentiment classification (ASC). Snippext achieved SOTA performance with only half or even a third of the original dataset. When the entire dataset was used, Snippext outperformed SOTA models by up to 3.55% on these ABSA tasks.
Snippext has been successfully deployed across numerous domains for information extraction and sentiment analysis, including hospitality, food, and e-commerce. This is just the beginning of what’s possible with this system. We are currently exploring optimization opportunities such as multitask learning and active learning to further reduce labeled data requirements for Snippext.