Hello everyone! Below you’ll find an interview with Ardalan Mehrani, a member of the Madatascience team at the last Datasciencegame contest. He met Matthieu Scordia, data scientist at Dataiku, at a presentation of Data Science Studio (DSS) at the University of "Pierre et Marie Curie" (UPMC). After that, he asked Dataiku and Matthieu to sponsor his team for the Datasciencegame challenge.
He’ll tell us more about Dataiku’s sponsorship for the contest, about Matthieu Scordia’s advice, and how DSS helped his team.
Hello Ardalan… Let’s start from the beginning: how about introducing yourself first?
What's up Dataiku! I am Ardalan, a French student in machine learning at the University of "Pierre et Marie Curie" (UPMC). I also graduated from a French engineering school called "Arts et Métiers ParisTech". I am currently interning at Fortuneo Bank, where I develop predictive models to help marketers improve their Mix Media Planning.
How did you first hear about Dataiku?
Dataiku came to UPMC to present their solution. In less than an hour, we were able to submit a model to a Kaggle challenge (bike sharing demand) that scored in the top 20%.
What prompted you to ask for Dataiku’s sponsorship for the contest?
As the contest is a two-day hackathon, I felt that your product could speed up lots of recurring pre-processing tasks and thus save us precious hours. Moreover, I heard about Dataiku through Matthieu Scordia, with whom the vibe was good.
How did Data Science Studio help you reach a higher ranking?
DSS is a tool that makes it easier to implement predictive models without coding them from scratch. It works with scikit-learn. As for the higher ranking, DSS provides pre-processing tools that enabled us to clean and parse data in the blink of an eye. We used the drag-and-drop feature to visualize the correlation between some features, which let us select the right ones. It definitely lightened the pre-processing burden!
What was the best advice that Matthieu Scordia gave you during the contest?
Matthieu is an experienced data scientist and usually performs very well in such challenges. His best advice was to take good care of our cross-validation methods. At the beginning of the challenge, there was a huge gap between our cross-validation (CV) score and the leaderboard (LB) score. I heard Matthieu's voice (like Yoda to Luke) whispering: "Careful folds selection you should take, the more reliable your CV score will be."
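To make the advice about fold selection concrete, here is a minimal sketch of one common way to build folds carefully: stratification, where each fold keeps roughly the same label mix as the full dataset. This is an illustration only, not the method the team actually used; the function names, the toy labels, and the seed are all hypothetical. In practice, scikit-learn's `StratifiedKFold` covers the same idea.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold has a similar label mix.

    Hypothetical helper for illustration; not part of DSS or scikit-learn.
    """
    rng = random.Random(seed)

    # Group sample indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets its share of this label.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_validation_splits(folds):
    """Yield (train, validation) index lists, holding out one fold at a time."""
    for k, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield sorted(train), sorted(validation)


# Toy, made-up data: 80 negative and 20 positive samples.
labels = [0] * 80 + [1] * 20
folds = stratified_folds(labels, n_folds=5, seed=42)

# Count the positives in each fold to check that the label mix is preserved.
fold_positive_counts = [sum(labels[i] for i in fold) for fold in folds]
print(fold_positive_counts)
```

With folds built this way, every validation fold sees the same share of positive samples as the full dataset, which is one of the ways to narrow a gap between the CV score and the LB score. Other sources of mismatch, such as time ordering or grouped samples, call for different splitting strategies.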
Ardalan, DSS Ambassador
Have you used other similar data science tools before? How do they compare to DSS?
I often use the Dato library (formerly GraphLab), which does out-of-core learning and has cool visualization features too. However, it still needs some lines of code to achieve the pre-processing tasks, whereas DSS can handle that without a single line of code.
What do you like most about DSS? What are your favorite features in the tool?
What I like the most about DSS is its friendly approach. The tool has a modern design, and it is still backed by a state-of-the-art machine learning library.
The coolest feature of DSS is probably the workflow display, as you can visualize the entire data pipeline. It enables collaborative work: each member can build their own pipeline, and at the end we can assemble the results really easily thanks to the DSS workflow.
Would you recommend DSS to other people? Why?
It is hard not to recommend DSS because it addresses a really wide range of professionals: from the BI people who could use DSS to analyze terabytes of data without a single line of code, to the experienced data scientists who could use DSS to speed up recurrent tasks and concentrate on recurrent neural networks :) The choice is quickly made!
Many thanks Ardalan for the interview and for being a great DSS ambassador.