On November 20, we organized a data science challenge in Lyon with our partner Datalyo, a data-driven consulting firm. The goal was to bring together 50 engineering and business school students with strong data analysis skills to develop analytics and predictive use cases around a real-life dataset. Success!
Data Science Challenge: Data-Driven Teamwork
The data used for the event was sensor data from the Meudon Green Office, a positive energy building west of Paris. The dataset, provided by Bouygues Construction, totaled 50 million measurements (temperature, pressure, energy consumption, and energy production) recorded in 2014 by more than 1,500 sensors in the building, along with descriptions of the sensors.
Experts from Bouygues Construction, Sopra Steria, and Datalyo were also present to help with understanding the data and to assist the groups during the day.
The students received a flash training on Dataiku DSS, the data analytics software used for the challenge, before digging into the data. We had prepared an individual instance for each group, with the datasets pre-loaded on HP Vertica analytical databases to improve computation performance.
As students soon discovered, understanding and cleaning the data were to be their first challenge, something that will be familiar to data scientists. Given the specific nature of the data, and the highly cryptic sensor names, just getting a good understanding of the existing data and imagining a relevant use case took the rest of the morning.
Thanks to the visual data preparation processors in Dataiku DSS, coders and non-coders were equally capable of performing exploratory data analysis (EDA) and data wrangling operations on more than 50 million rows, such as parsing dates, joining datasets, analyzing distributions, splitting or grouping data, and more.
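For readers who prefer code, here is a minimal sketch of the same kind of preparation steps (parsing dates, joining measurements to sensor descriptions, then grouping) in plain Python. It is not what the teams ran in Dataiku DSS: the sensor IDs, field names, and values below are made up for illustration, and the real dataset's schema is not described in this post.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sensor descriptions (assumed schema): sensor id -> metadata.
sensors = {
    "S001": {"kind": "temperature", "floor": 1},
    "S002": {"kind": "temperature", "floor": 2},
}

# Hypothetical raw measurements: (sensor id, timestamp string, value string).
raw_measurements = [
    ("S001", "2014-01-01 00:00:00", "20.5"),
    ("S001", "2014-01-01 00:10:00", "21.5"),
    ("S002", "2014-01-01 00:00:00", "19.0"),
    ("S002", "2014-01-02 00:00:00", "18.0"),
]


def prepare(rows, sensor_info):
    """Parse dates and values, and join each row to its sensor description."""
    prepared = []
    for sensor_id, timestamp, value in rows:
        info = sensor_info.get(sensor_id)
        if info is None:
            # Inner join: skip measurements from undocumented sensors.
            continue
        prepared.append({
            "sensor_id": sensor_id,
            "timestamp": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
            "value": float(value),
            **info,
        })
    return prepared


def daily_mean_by_floor(prepared):
    """Group rows by (floor, calendar day) and average the values."""
    groups = defaultdict(list)
    for row in prepared:
        groups[(row["floor"], row["timestamp"].date())].append(row["value"])
    return {key: mean(values) for key, values in groups.items()}


records = prepare(raw_measurements, sensors)
daily = daily_mean_by_floor(records)
```

At 50 million rows, an approach like this would normally be pushed down to the database rather than looped over in memory, which is one reason the datasets were pre-loaded on an analytical database.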
After a brainstorming lunch, the teams had a clearer idea of the use case or solution they were going to design, and they got into the thick of things.
While some groups went quickly to the chart engine to start building visualizations, students in data science courses often preferred using the integrated R and Python notebooks to tackle specific problems (extrapolating time series, for instance). Overall, they proved both creative in their ideas and use cases and efficient in leveraging the collaboration features of Dataiku DSS to prototype those ideas quickly. And this despite some technical server issues (again, real data scientist life?), for which we apologize once more.
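To give a flavor of the notebook work mentioned above, here is a minimal sketch of one simple way to extrapolate a time series: fit a straight line by ordinary least squares and extend it forward. This is an illustration only, not the method any team actually used, and the readings and the names `fit_line` and `extrapolate` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(ys, steps):
    """Extend an evenly spaced series by `steps` points along the fitted line."""
    xs = list(range(len(ys)))
    slope, intercept = fit_line(xs, ys)
    return [intercept + slope * (len(ys) + k) for k in range(steps)]


# Hypothetical hourly energy readings (kWh), made up for illustration.
hourly_kwh = [10.0, 12.0, 14.0, 16.0]
forecast = extrapolate(hourly_kwh, 2)
```

A linear trend is a deliberately crude choice: real building data has daily and weekly cycles, so a seasonal model would usually be a better fit.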
At the end of the day, every group pitched their idea and their prototype in Dataiku DSS, going from raw data to concrete business value. The results included:
- A “well-being” dashboard with floor and zone-level comfort indicators
- Eco-Touch: a tool to optimize energy output given weather predictions
- A system that predicts how many people are in the building given current sensor measurements
All in all, it was a great event, and we enjoyed meeting new users and helping out on a real-life use case. Thanks again to all the participants, to our partner Datalyo, and to our sponsors Bouygues Construction and Sopra Steria.
Hope to see you at the next data challenge!