A Recap of Our Student Data Science Challenge With Datalyo in Lyon

Use Cases & Projects, Dataiku Company Vincent de Stoecklin

On November 20, we organized a data science challenge in Lyon with our partner Datalyo, a data-driven consulting firm. The goal was to bring together 50 engineering and business school students with strong data analysis skills to develop analytics and predictive use cases around a real-life dataset. Success!

Data Science Challenge: Data-Driven Teamwork

The data used for the event was sensor data from the Meudon Green Office, a positive energy building west of Paris. The dataset, provided by Bouygues Construction, totaled 50 million measurements — temperature, pressure, energy consumption, and energy production — provided by more than 1,500 sensors in the building in 2014, as well as descriptions of the sensors.

Experts from Bouygues Construction, Sopra Steria, and Datalyo were also present to help with understanding the data and to assist the groups during the day.

The students received a flash training on Dataiku DSS, the data analytics software used for the challenge, before digging into the data. We had prepared individual instances for each group, with the datasets pre-loaded on HP Vertica analytical databases to improve computation performance.


As students soon discovered, understanding and cleaning the data were to be their first challenge, something that will be familiar to data scientists. Given the specific nature of the data, and sensor names that were far from self-explanatory, just getting a good understanding of the existing data and imagining a relevant use case took the rest of the morning.

Thanks to the visual data preparation processors in Dataiku DSS, coders and non-coders were equally capable of performing exploratory data analysis (EDA) and data wrangling operations (on more than 50 million rows) such as parsing dates, joining datasets, analyzing distributions, splitting or grouping data, and more.
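For readers who prefer code, here is a minimal sketch of the same kind of wrangling (parsing dates, joining measurements to sensor descriptions, and grouping) in plain Python. It is only an illustration: the field names (`sensor_id`, `timestamp`, `value`, `zone`), the sample rows, and the `daily_mean_by_zone` helper are assumptions made for this example, not the actual schema of the challenge dataset or anything the students wrote.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: the real dataset's schema is not described in this post.
measurements = [
    {"sensor_id": "S1", "timestamp": "2014-01-01 08:00:00", "value": 19.5},
    {"sensor_id": "S1", "timestamp": "2014-01-01 09:00:00", "value": 20.5},
    {"sensor_id": "S2", "timestamp": "2014-01-01 08:00:00", "value": 21.0},
    {"sensor_id": "S2", "timestamp": "2014-01-02 08:00:00", "value": 23.0},
]

# Hypothetical sensor descriptions, keyed by sensor id.
sensors = {
    "S1": {"kind": "temperature", "zone": "floor_1"},
    "S2": {"kind": "temperature", "zone": "floor_2"},
}


def daily_mean_by_zone(measurements, sensors):
    """Return the mean measured value per (zone, day)."""
    grouped = defaultdict(list)
    for row in measurements:
        # Parse the timestamp string into a datetime object.
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        # Join the measurement to its sensor description.
        info = sensors.get(row["sensor_id"])
        if info is None:
            continue  # skip measurements from undocumented sensors
        # Group values by zone and calendar day.
        grouped[(info["zone"], ts.date())].append(row["value"])
    return {key: mean(values) for key, values in grouped.items()}


result = daily_mean_by_zone(measurements, sensors)
for (zone, day), avg in sorted(result.items()):
    print(zone, day, avg)
```

At the scale of the real dataset, this kind of aggregation would more plausibly be pushed down to the database than run row by row in Python.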

After a brainstorming lunch, the teams had a clearer idea of the use case or solution they were going to design and started getting into the thick of things.

While some groups went quickly to the chart engine to start building visualizations, students in data science courses often preferred using the integrated R and Python notebooks to tackle specific problems (extrapolating time series, for instance). Overall, they proved both creative, in terms of ideas and use cases, and efficient, leveraging collaboration features in Dataiku DSS to prototype their ideas quickly. And all this despite some technical server issues (again, real data scientist life?), for which we apologize again.


At the end of the day, every group pitched their idea and their prototype in Dataiku DSS, going from raw data to concrete business value. The results included:

  • A “well-being” dashboard with floor and zone-level comfort indicators
  • Eco-Touch: a tool to optimize energy output given weather predictions
  • A system that predicts how many people are in the building given current sensor measures

All in all, a great event, where we enjoyed meeting new users and helping out on a real-life use case. Thanks again to all the participants, to our partner, Datalyo, and to our sponsors, Bouygues Telecom and Sopra Steria.

Hope to see you at the next data challenge!
