Data Hut

Diagram of the flow in the DataHut pipeline.

Data Hut’s Data Pipeline is a site we launched this spring. It combines manual curation and automated analytics to provide insight into open source data science and big data projects. We currently cover 128 different projects, from Airflow to XGBoost. The code for DataHut started last year as a single Python script to download Git commits and a Jupyter notebook for data analysis and visualization. This allowed us to gain insight into our…