How to use Python, Pandas and Scikit-learn for Kaggle challenges

Data Preparation | Data Science | Technology | Machine Learning | PierreG

There is still a little under a month left to compete in the Caterpillar Tube Pricing Kaggle challenge. In this competition, participants are asked to predict the price of tubes from different suppliers.

Since the data is rather small and in a simple format, this challenge is a perfect way to start using Python and two of its most widely used data science packages: pandas and scikit-learn.

Join us at the next New York Big Data Workshop!


Henri Dwyer (Data Scientist at Dataiku) and I will host the next New York Big Data Workshop on this topic on August 14. Feel free to join here.

Start practicing with an IPython Notebook

We created an example IPython notebook for you that shows how to load files, reshape data, and create a first model that you can submit to Kaggle. You can also download the complete IPython Notebook file here.

    IPython Notebook example
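To give a rough idea of what the load, reshape, model, and submit steps can look like, here is a minimal sketch with pandas and scikit-learn. It is not the notebook itself. The file contents, the column names (`tube_id`, `supplier`, `quantity`, `diameter`, `length`, `cost`, `id`), the random forest model, and the log transform of the target are all assumptions made for illustration, and the tiny in-memory CSV strings stand in for the real competition files.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny stand-ins for the competition CSV files.
# Column names and values are hypothetical, not the real Kaggle data.
TRAIN_CSV = """tube_id,supplier,quantity,cost
T1,S1,1,20.0
T1,S1,10,8.0
T2,S2,1,30.0
T2,S2,10,12.0
T3,S1,1,25.0
T3,S1,10,9.0
"""
TUBE_CSV = """tube_id,diameter,length
T1,6.0,100
T2,12.0,150
T3,9.0,120
"""
TEST_CSV = """id,tube_id,supplier,quantity
1,T1,S1,5
2,T2,S2,5
"""


def build_features(quotes, tubes, columns=None):
    """Join quotes with tube attributes and one-hot encode the supplier."""
    merged = quotes.merge(tubes, on="tube_id", how="left")
    features = pd.get_dummies(
        merged[["supplier", "quantity", "diameter", "length"]],
        columns=["supplier"],
        dtype=float,
    )
    if columns is not None:
        # Align test columns with the training columns.
        features = features.reindex(columns=columns, fill_value=0)
    return features


# 1. Load the files (with real data, pass file paths to pd.read_csv).
train = pd.read_csv(io.StringIO(TRAIN_CSV))
tubes = pd.read_csv(io.StringIO(TUBE_CSV))
test = pd.read_csv(io.StringIO(TEST_CSV))

# 2. Reshape: one row per quote, with tube attributes attached.
X_train = build_features(train, tubes)
# Assumption: train on log(1 + cost), a common choice for price targets.
y_train = np.log1p(train["cost"])

# 3. Fit a first model.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 4. Predict on the test set and build a submission table.
X_test = build_features(test, tubes, columns=X_train.columns)
predictions = np.expm1(model.predict(X_test))
submission = pd.DataFrame({"id": test["id"], "cost": predictions})

# With real data you would write this to a file, e.g. to_csv("submission.csv").
submission_csv = submission.to_csv(index=False)
print(submission_csv)
```

With the real competition files, the same structure should carry over: replace the in-memory strings with file paths, adjust the column names to match the data, and check the competition page for the exact submission format.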

Any questions about this blog post? Just send me an email and we’ll discuss it :)
