Working With Dataiku Data Science Studio in a Data Science Challenge

data science | machine learning | tutorial | Pauline Brown

Are you interested in participating in a data challenge? Find out how Dataiku Data Science Studio can help you, with this example from AXA and datascience.net's latest challenge.


The Challenge

To start off 2015 with a fun data challenge, AXA and datascience.net launched "Building a cross-selling affinity score for an insurance product during a telemarketing campaign." The contest, which had two distinct phases, challenged participants to score customers' affinity for an insurance product offered as a cross-sale. Prizes ranged from €5,000 (first place) to €500 (sixth place).

  • At the end of the first phase, a quantitative metric (lift computed on the top 10% of scores) determined the six best contributors.
  • In the second phase (March 9, 2015 to March 27, 2015), the six finalists replicated their projects in Dataiku's Data Science Studio. Based on these results, a jury of datascience.net experts and an AXA judge ranked the finalists and awarded the prizes accordingly.

In this blog post, you'll find a few tips and tricks on how to use Dataiku Data Science Studio for the challenge. Even though the challenge is over, you can still play around with the project!

(Sorry, these screenshots show an old version of Dataiku DSS; find out what the new version looks like here.)


Let's get started!

First, download and install the DSS Community Edition.

Second, download the AXA project by clicking here. When you've downloaded the file, please import it as follows:

[Screenshot: importing the AXA project into DSS]

Now, enter the Data Science Studio flow:

[Screenshot: the project's flow in Data Science Studio]

Here is a short recap of the data shown above in "datasets source":

[Screenshot: recap of the source datasets]

The challenge is scored by lift, computed only on the 10% of records to which your model assigns the highest probabilities (see below):

[Screenshot: how the challenge scores submissions by lift]
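To make the metric concrete, here is a minimal Python sketch of lift at 10%, assuming the usual definition: the share of positives among the 10% highest-scored records, divided by the share of positives in the whole dataset. The function name `lift_at_fraction` and the toy labels and scores are hypothetical; this is not the challenge's official scoring code.

```python
def lift_at_fraction(y_true, scores, fraction=0.10):
    """Positive rate among the top-scored `fraction` of records,
    divided by the overall positive rate (1.0 = no better than random)."""
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("y_true and scores must be non-empty and the same length")
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    # Number of records in the top slice (at least one).
    n_top = max(1, round(len(y_true) * fraction))
    # Rank records by descending score and keep the labels of the top slice.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Toy example (hypothetical data): 20 customers, 4 of whom bought the product.
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.10, 0.40, 0.20, 0.05, 0.15, 0.12, 0.85, 0.08, 0.11,
          0.07, 0.09, 0.03, 0.30, 0.02, 0.06, 0.04, 0.13, 0.14, 0.01]

print(f"lift@10% = {lift_at_fraction(labels, scores):.2f}")  # lift@10% = 5.00
```

Here the top 10% is two customers, and both are buyers, so the top slice converts at 100% against a 20% base rate: a lift of 5. A model that ranked customers at random would land at a lift of about 1.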

Project Example in DSS

The DSS project includes an example of how you can build your own model:

[Screenshot: example model in the DSS project]

In the model bench, you can try different algorithms and compare them to each other. In this example, we are testing Logistic Regression (L1 penalty, C=0.15), Logistic Regression (L2 penalty, C=0.15), and a Random Forest:

[Screenshot: comparing models in the model bench]
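Outside the visual interface, a comparable bench can be sketched with scikit-learn. This assumes the DSS settings map onto scikit-learn's `LogisticRegression` (where `C` is the inverse regularization strength) and `RandomForestClassifier`; the synthetic data from `make_classification` is a hypothetical stand-in for the AXA datasets, and `n_estimators=200` is an arbitrary choice. Candidates are ranked by lift at 10%, the challenge's metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def lift_at_fraction(y_true, scores, fraction=0.10):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    y_true = np.asarray(y_true)
    n_top = max(1, round(len(y_true) * fraction))
    # Indices of the highest scores, in descending order.
    top_idx = np.argsort(-np.asarray(scores), kind="stable")[:n_top]
    return y_true[top_idx].mean() / y_true.mean()


# Hypothetical stand-in for the AXA data: roughly 10% positive class.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The three candidates from the model bench described above.
candidates = {
    "Logistic Regression (L1, C=0.15)": LogisticRegression(
        penalty="l1", C=0.15, solver="liblinear"),
    "Logistic Regression (L2, C=0.15)": LogisticRegression(
        penalty="l2", C=0.15, max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used to rank customers.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = lift_at_fraction(y_test, proba)
    print(f"{name}: lift@10% = {results[name]:.2f}")

best = max(results, key=results.get)
print("Best model:", best)
```

The printed lifts depend entirely on the synthetic data and say nothing about how these models performed on the AXA challenge; on real data you would swap in the training features and target.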

Great! Once your model is done and you're sure your project is first-prize material, you can submit it to datascience.net. How? Simply put your model in the flow, or customize it in a Python notebook:

[Screenshot: the model in the flow and in a Python notebook]

Now, go to the Export Center in your Data Science Studio and submit online:

[Screenshot: submitting from the Export Center]

If you have any questions, please get in touch with us! Otherwise, good luck!
