On November 20th, Dataiku and its partner Datalyo, a data-driven consulting firm, organized a Data Science Challenge in Lyon, France. The objective was to bring together 50 engineering and business school students with strong data analysis skills to develop analytics and predictive use cases around a real-life dataset.
Data Science Challenge: Data-driven teamwork
The data used for the event was sensor data from the Meudon Green Office, a positive energy building west of Paris. The dataset, provided by Bouygues Construction, totaled 50 million measurements - temperature, pressure, energy consumption, and energy production - recorded in 2014 by more than 1,500 sensors in the building, along with descriptions of the sensors.
The students received a flash training on Data Science Studio (DSS), the data analytics software used for the challenge, before digging into the data. We had prepared an individual instance for each group, with the datasets pre-loaded on HP Vertica analytical databases to improve computation performance.
As students soon discovered, data understanding and data cleaning were to be their first challenge, something that will be familiar to data scientists. Given the specific nature of the data - and highly inexplicit sensor names :) - just getting a good understanding of the existing data and imagining a relevant use case took the rest of the morning.
Thanks to the visual data preparation processors in DSS, coders and non-coders were equally capable of performing exploratory data analysis (EDA) and data wrangling operations on more than 50 million lines: parsing dates, joining datasets, analyzing distributions, splitting or grouping data...
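For readers who prefer code to visual processors, here is roughly what those preparation steps (parsing dates, joining measurements with sensor descriptions, grouping) could look like in plain Python. This is only a sketch: the field names (`sensor_id`, `timestamp`, `value`, `kind`), the sample rows, and the helper functions are made up for illustration. They are not the actual schema of the challenge data, nor how DSS implements its processors.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows; the real dataset used different names
# and held roughly 50 million measurements.
measurements = [
    {"sensor_id": "S1", "timestamp": "2014-01-01 08:00:00", "value": "19.5"},
    {"sensor_id": "S1", "timestamp": "2014-01-01 09:00:00", "value": "20.5"},
    {"sensor_id": "S2", "timestamp": "2014-01-01 08:00:00", "value": "3.0"},
]

# Hypothetical sensor descriptions, keyed by sensor id.
sensors = {
    "S1": {"kind": "temperature", "floor": 1},
    "S2": {"kind": "energy_consumption", "floor": 1},
}


def prepare(rows, sensor_info):
    """Parse dates, cast values, and join each row with its sensor description."""
    prepared = []
    for row in rows:
        info = sensor_info.get(row["sensor_id"])
        if info is None:
            continue  # skip measurements from sensors with no description
        prepared.append({
            "sensor_id": row["sensor_id"],
            "timestamp": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
            "value": float(row["value"]),
            **info,
        })
    return prepared


def mean_by_kind(rows):
    """Group prepared rows by sensor kind and average their values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["kind"]].append(row["value"])
    return {kind: mean(values) for kind, values in groups.items()}


clean = prepare(measurements, sensors)
summary = mean_by_kind(clean)
print(summary)  # {'temperature': 20.0, 'energy_consumption': 3.0}
```

At the scale of the real dataset, this kind of work would more plausibly be pushed down to the database than done row by row in memory.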
After a brainstorming lunch, the teams had a clearer idea of the use case or solution they were going to design and started getting into the thick of things.
While some groups went quickly to the chart engine to start building visualizations, students in data science courses often preferred using the integrated R and Python notebooks to tackle specific problems (extrapolating time series for instance). Overall, they proved both creative - in terms of ideas and use cases - and efficient - leveraging collaboration features in DSS to improve their ability to prototype their ideas quickly. And this despite some technical server issues (again, real data scientist life?)... for which we apologize again :)
At the end of the day, every group came to pitch their idea and their prototype in DSS, starting from raw data to concrete business value. The results included:
- A “well-being” dashboard with floor and zone-level comfort indicators
- Eco-Touch: a tool to optimize energy output given weather predictions
- A system that predicts how many people are in the building given current sensor measures
All in all, a great event, where we enjoyed meeting new users and helping out on a real-life use case. Thanks again to all the participants, to our partner Datalyo, and to our sponsors Bouygues Telecom and Sopra Steria.
Best wishes for 2016 and see you at the next Data Challenge!