Hello everyone!
Below you’ll find an interview with Ardalan Mehrani, a member of team Madatascience at the last Data Science Game contest. He met Matthieu Scordia, a data scientist at Dataiku, during a presentation of Dataiku DSS at the University of "Pierre et Marie Curie" (UPMC). After that, he asked Dataiku and Matthieu to sponsor his team for the Data Science Game contest.
He’ll tell us more about Dataiku’s sponsorship for the contest, Matthieu Scordia’s advice, and how Dataiku DSS helped his team.
Introducing Ardalan
Hello Ardalan… Let’s start from the beginning: How about introducing yourself first?
What's up Dataiku! I am Ardalan, a French student in machine learning at the University of "Pierre et Marie Curie" (UPMC). I also graduated from a French engineering school called "Arts et Métiers ParisTech". I am currently interning at Fortuneo Bank, where I develop predictive models to help marketers improve their mixed media planning.
Dataiku's Sponsorship
How did you first hear about Dataiku?
Dataiku came to UPMC to present its solution. In less than an hour, we were able to submit a model to a Kaggle challenge (bike sharing demand) that scored in the top 20%.
What prompted you to ask for Dataiku’s sponsorship for the contest?
As the contest is a two-day hackathon, I felt that your product could speed up many recurring pre-processing tasks and thus save us precious hours. Moreover, I heard about Dataiku through Matthieu Scordia, with whom I got along well.
How did Dataiku DSS help you reach a higher ranking?
Dataiku DSS is a tool that makes it easier to implement predictive models without coding them from scratch. It works with scikit-learn. As for the higher ranking, Dataiku DSS provides pre-processing tools that enabled us to clean and parse data in the blink of an eye. We used the drag-and-drop function to visualize the correlation between some features, which let us select the right ones. It definitely lightened the pre-processing burden!
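For readers who prefer code to drag and drop, the correlation check described above can be approximated in a few lines of Python. This is only a minimal sketch with pandas, not what Dataiku DSS does internally: the helper `rank_features_by_correlation`, the column names, and the synthetic data are all hypothetical and chosen purely for illustration.

```python
import numpy as np
import pandas as pd


def rank_features_by_correlation(df, target):
    """Rank features by absolute Pearson correlation with the target column."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target.abs().sort_values(ascending=False)


# Synthetic data (an assumption for this sketch): one informative feature
# and one pure-noise feature.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
df = pd.DataFrame({
    "signal": signal,
    "noise": rng.normal(size=n),
    "target": 2.0 * signal + rng.normal(scale=0.1, size=n),
})

ranking = rank_features_by_correlation(df, "target")
print(ranking)
```

Features near the top of the ranking are candidates to keep. Linear correlation misses non-linear relationships, so a ranking like this is a first filter rather than a final selection.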
What was the best advice that Matthieu Scordia gave you during the contest?
Matthieu is an experienced data scientist and usually performs very well in such challenges. His best advice was to take good care of our cross-validation methods. At the beginning of the challenge, there was a huge gap between our cross-validation (CV) score and the leaderboard (LB) score. I heard Matthieu's voice (like Yoda to Luke) whispering: "Careful folds selection you should take, the more reliable your CV score will be."
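To illustrate why fold selection matters, here is a minimal sketch with scikit-learn. The interview does not say which fold strategy the team ended up using, so the scenario is an assumption: rows belong to groups (for example, several rows per user), and a plain shuffled `KFold` lets the same group appear in both the training and validation parts, which tends to inflate the CV score. The helper `count_leaky_folds` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold


def count_leaky_folds(splitter, X, y, groups):
    """Count folds whose training and validation parts share at least one group."""
    leaky = 0
    for train_idx, val_idx in splitter.split(X, y, groups):
        if set(groups[train_idx]) & set(groups[val_idx]):
            leaky += 1
    return leaky


# Synthetic data (an assumption for this sketch): 20 groups of 5 rows each.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(20), 5)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Plain shuffled K-fold ignores the groups, so rows of one group get split up.
leaky_kfold = count_leaky_folds(
    KFold(n_splits=5, shuffle=True, random_state=0), X, y, groups
)
# GroupKFold keeps every group entirely on one side of each split.
leaky_group = count_leaky_folds(GroupKFold(n_splits=5), X, y, groups)

print("leaky folds with KFold:", leaky_kfold)
print("leaky folds with GroupKFold:", leaky_group)
```

When the validation folds mimic how the hidden test set was built (by group, by time, by class balance), the CV score and the leaderboard score are more likely to move together.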
Ardalan, Dataiku DSS Ambassador
Have you used other similar data science tools before? How do they compare to Dataiku DSS?
I often use the Dato library (formerly GraphLab), which does out-of-core learning and has cool visualization features too, but it still needs some lines of code to achieve the pre-processing tasks, whereas Dataiku DSS can handle that without a single line of code.
What do you like most about Dataiku DSS? What are your favorite features in the tool?
What I like the most about Dataiku DSS is its friendly approach. The tool has a modern design, and it is still backed by a current state-of-the-art machine learning library.
The coolest feature of Dataiku DSS is probably the workflow display, as you can visualize the entire data pipeline. It enables collaborative work: each team member can build their own pipeline, and at the end we can assemble the results really easily thanks to the Dataiku DSS workflow.
Would you recommend Dataiku DSS to other people? Why?
It is hard not to recommend Dataiku DSS because it addresses a really wide range of professionals: from the BI people who could use Dataiku DSS to analyze terabytes of data without a single line of code, to the experienced data scientists who could use it to speed up recurrent tasks and concentrate on recurrent neural networks. The choice is quickly made!
Many thanks Ardalan for the interview and for being a great Dataiku DSS ambassador.