BigGorilla is an open-source, Python-based ecosystem for data integration and data preparation that helps data scientists integrate and analyze data. BigGorilla consolidates and documents the steps that data scientists typically take to bring data from different sources into a single database for analysis. For each of these steps, we document existing technologies and also point to desired technologies that could be developed.
The components of BigGorilla are freely available for download and use, and data scientists are encouraged to contribute code, datasets, or examples. Through the tools and documentation developed for BigGorilla, we hope to promote education and training for aspiring data scientists.