IEEE
2018
Chen Chen, Behzad Golshan, Alon Y. Halevy, Wang-Chiew Tan, AnHai Doan
We present BIGGORILLA, an open-source resource for data scientists who need data preparation and integration tools, and describe the vision underlying the project. We then describe four packages that we contributed to BIGGORILLA: KOKO (an information extraction tool), FLEXMATCHER (a schema matching tool), and MAGELLAN and DEEPMATCHER (two entity matching tools). We hope that as more software packages are added to BIGGORILLA, it will become a one-stop resource for both researchers and industry practitioners, enabling our community to advance the state of the art at a faster pace.