Snippext: An Opinion Mining Pipeline That Uses Less Training Data

Understanding public sentiment can give businesses valuable insights, and opinion mining has rapidly grown in popularity as a result. But building high-precision, high-recall opinion mining pipelines usually requires a large amount of labeled training data.

Snippext is a state-of-the-art (SOTA) opinion mining pipeline that extracts aspects, opinions, and sentiments from user-generated content such as online reviews. It reduces the training data usually required by 50% or more through: 1) Data augmentation, which automatically generates more labeled training data from existing labeled examples. 2) Semi-supervised learning, which leverages massive amounts of unlabeled data.
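
To make the data augmentation idea concrete, here is a minimal sketch in Python. It is not Snippext's actual method: it assumes a simple one-for-one synonym replacement on a token-tagged review sentence, so the aspect/opinion labels stay aligned with the tokens. The synonym table, the `augment` function, and the `B-AS`/`B-OP` tag names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "great": ["excellent", "wonderful"],
    "slow": ["sluggish", "unhurried"],
}


def augment(tokens, labels, p=0.5, rng=None):
    """Create a new labeled example by swapping tokens for synonyms.

    Each token that has an entry in SYNONYMS is replaced with
    probability p. Replacement is one-for-one, so the label sequence
    is copied unchanged and stays aligned with the new tokens.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = rng or random.Random()
    new_tokens = []
    for tok in tokens:
        choices = SYNONYMS.get(tok.lower())
        if choices and rng.random() < p:
            new_tokens.append(rng.choice(choices))
        else:
            new_tokens.append(tok)
    return new_tokens, list(labels)


# One labeled review sentence: AS = aspect term, OP = opinion term.
tokens = ["The", "food", "was", "great", "but", "service", "was", "slow"]
labels = ["O", "B-AS", "O", "B-OP", "O", "B-AS", "O", "B-OP"]

# p=1.0 replaces every token that has a synonym; the seed fixes the choice.
aug_tokens, aug_labels = augment(tokens, labels, p=1.0, rng=random.Random(0))
print(aug_tokens)
print(aug_labels)
```

Each call yields an extra labeled sentence without any new manual annotation, which is the general effect the augmentation step aims for.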

With these optimizations, Snippext performs comparably to, and in some cases outperforms, previous SOTA results on several opinion mining tasks. It also extracts significantly more fine-grained opinions, which opens new opportunities for downstream applications.

Snippext has been successfully deployed across numerous domains, including hospitality, food, and e-commerce.
