HappyDB: a happiness database of 100,000 happy moments

Understanding what makes people happy is essential to augmenting positive experiences. We built HappyDB, a crowd-sourced collection of 100,000 happy moments, and made it publicly available. Our goal is to build NLP technology that understands how people express happiness in text, and to gain insights at scale into the events and scenarios that lead to happiness. We are also interested in developing systems that suggest sustainable actions that lead to an overall improvement in individuals' well-being within these actual moments. HappyDB is an exciting resource for the emerging research field at the intersection of NLP and positive psychology.

What is HappyDB?

HappyDB is a corpus of 100,000 crowd-sourced happy moments. The goal of the corpus is to advance the state of the art of understanding the causes of happiness that can be gleaned from text.

Using the Dataset

If you use HappyDB in your work, please cite the paper as:

@inproceedings{asai2018happydb,
  title = {HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments},
  author = {Asai, Akari and Evensen, Sara and Golshan, Behzad and Halevy, Alon
  and Li, Vivian and Lopatenko, Andrei and Stepanov, Daniela and Suhara, Yoshihiko
  and Tan, Wang-Chiew and Xu, Yinzhan},
  booktitle = {Proceedings of LREC 2018},
  month = {May},
  year = {2018},
  address = {Miyazaki, Japan},
  publisher = {European Language Resources Association (ELRA)}
}

Dataset Description

Simply stated, HappyDB is a collection of happy moments described by individuals experiencing those moments. The following are some examples:

1. When I was on top of a hotel, looking at the city below me. 
2. in the morning I received my college degree, receiving the title turn and behind me all my proud of my family was, for the goal that had just turned.
3. today was a school holiday for my son , woke up and played with him.
5. The kitchen now gleams with new paint. Our annual renovation is over and all the colors we chose are set for at least a year. I love our new colors.

Collecting happy moments

The happy moments are crowd-sourced via Amazon’s Mechanical Turk. We presented each worker with the following task:

What made you happy today? Reflect on the past 24 hours, and recall
three actual events that happened to you that made you happy. Write
down your happy moment in a complete sentence.
(Write three such moments.)

In this task, the “past 24 hours” is what we call the reflection period. HappyDB also contains happy moments with reflection periods “past week” and “past month”.

Along with each happy moment, we have collected the demographic information of the worker who provided the moment.
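
To make the structure concrete, here is a minimal sketch of how a happy moment, its reflection period, and a link to the worker who wrote it might be represented. The class name, field names, and period labels are illustrative assumptions, not the schema of the released files, and the worker and period assignments in the sample are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: these field names are assumptions made for
# illustration and do not come from the released HappyDB files.
@dataclass(frozen=True)
class HappyMoment:
    worker_id: int          # assumed key linking a moment to worker demographics
    reflection_period: str  # assumed labels: "24h", "week", "month"
    text: str


def count_by_reflection_period(moments):
    """Count how many moments were collected under each reflection period."""
    return Counter(m.reflection_period for m in moments)


# Sample texts are taken from the examples on this page; the worker ids and
# reflection periods attached to them are invented for this sketch.
moments = [
    HappyMoment(1, "24h", "When I was on top of a hotel, looking at the city below me."),
    HappyMoment(2, "24h", "today was a school holiday for my son , woke up and played with him."),
    HappyMoment(2, "month", "I love our new colors."),
]

counts = count_by_reflection_period(moments)
print(counts)
```

Keeping the worker identifier on each moment, rather than copying demographic fields into every record, is one possible design: a worker contributes several moments, so the demographics would only need to be stored once.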

Lab in the wild

To further provide resources for researchers interested in the science of happiness, we have partnered with Lab In The Wild to collect more happy moments. We encourage you to take a look at our task on Lab In The Wild.

Cleaning the corpus

The HappyDB corpus, like any other human-generated data, contains errors and requires cleaning. Many workers did not write complete sentences or made spelling errors. To make the corpus more convenient to use, we have created a clean version that addresses these issues. More specifically, we have:

  1. removed any happy moment that consists of a single word,
  2. corrected the misspelled words (if we could infer the correct spelling from the context).
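
The two cleaning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used to produce the clean corpus: the function names and the `CORRECTIONS` table are hypothetical, and a fixed lookup table stands in for context-based spelling inference, which this sketch does not attempt.

```python
import re

# Hypothetical correction table. The actual cleaning inferred the correct
# spelling from context; a fixed lookup is a much cruder stand-in.
CORRECTIONS = {"hapy": "happy", "recieved": "received"}


def is_single_word(moment):
    """Return True if the moment has at most one whitespace-separated token."""
    return len(moment.split()) <= 1


def correct_spelling(moment, corrections=CORRECTIONS):
    """Replace known misspellings word by word, leaving other words untouched.

    Note: the replacement is inserted as stored in the table, so the
    capitalization of a corrected word is not preserved.
    """
    def fix(match):
        word = match.group(0)
        return corrections.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", fix, moment)


def clean_corpus(moments):
    """Step 1: drop single-word moments. Step 2: correct known misspellings."""
    return [correct_spelling(m) for m in moments if not is_single_word(m)]


# Invented sample input for the sketch.
raw = [
    "Pizza",
    "I recieved my college degree.",
    "today was a school holiday for my son",
]
cleaned = clean_corpus(raw)
print(cleaned)
```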
