Dataiku at the Hadoop Summit

The Hadoop Summit Europe, a two-day event for the Apache Hadoop community, took place in Amsterdam on April 2-3. I gave a talk about “Semi-Supervised learning applied to understanding customer...

Hadoop, Corporate, Events | April 14, 2014 | Florian Douetteau

Coming Back from Big Data Paris

The third edition of the Big Data Paris summit took place on April 1st and 2nd, 2014. Big Data Paris is the major French event about the big data ecosystem. Many important companies and talented people...

Product, Corporate, Events | April 04, 2014 | Jeremy Greze

Winning Kaggle: An introduction to Re-Ranking

Our regular readers are probably familiar with Kaggle and its machine learning contests. If you're new to Kaggle, you can read the article "A Kaggle Data Science Competition Made Easy" to get...

recommendation, data science, machine learning | January 14, 2014 | Paul-Henri Hincelin

Beyond the Hype: the 6 Core Skills of a Good Data Scientist

The term “data scientist” was coined in 2008 by two LinkedIn analysts to describe their work deriving business value from the masses of data being generated by their website. Since then, people...

Opinion, Data Science Basics, business | November 10, 2013 | Florian Douetteau

Machine Learning for Merchandising

The other day, I was talking to my friend who runs a personal e-commerce site selling speciality items. He wanted to understand how machine learning technologies could help him manage his site...

business, machine learning, predictive analytics | October 24, 2013 | Florian Douetteau

Dataiku at Berlin Buzzwords 2013, Part 1

I had the chance to attend and speak at this year's Berlin Buzzwords conference last week, dedicated to the topics of search, scale and storage.

Corporate, Technology, Events | June 13, 2013 | Clément Stenac

Berlin Buzzwords Part 2: Introducing Dataiku Flow and dctc

As I mentioned in my previous post, I also had a chance to talk about what we've been up to over here at Dataiku during my stay in Berlin:

  • Dataiku Flow, the next-generation data pipeline...

Events | June 13, 2013 | Clément Stenac

The New Search: Fuzzy, Instantaneous, and Local

At Dataiku, we make extensive use of search logs and the associated navigation information for user behaviour analytics and relevance optimization. Most of our customers today use Solr or Elasticsearch...

data science, Technology | May 03, 2013 | Florian Douetteau

A Complete Guide to Writing Hive UDFs

Note that this guide is quite old (it was written when Hive was at version 0.10) and might not apply as-is to recent Hive releases. Use at your own risk :)

Dataiku DSS provides deep integration...

Hadoop, data science, Technology | May 01, 2013 | Clément Stenac

Kaggle Contest: Blue Book For Bulldozers

Perhaps you know Kaggle and its slogan “making data science a sport”?

Kaggle is a cool platform for predictive modeling competitions where the best data scientists face each other, all trying to...

data science, machine learning, python | April 26, 2013 | Matt Scordia

Thomas at Strata - Part 2

The previous post on my trip to Strata describes my first day there. You may want to read it here.

The next two days were focused on keynotes and presentations, as well as exhibitors' products...

Corporate, Events, strata | March 21, 2013 | Thomas Cabrol

Thomas at Strata - Part 1

I've been lucky enough to travel to Santa Clara, California, and attend the Strata Conf event. I was there for two days and have plenty of insight and feedback on all the sessions over the course...

Corporate, Events | March 12, 2013 | Thomas Cabrol

Visualizing Your LinkedIn Graph Using Gephi - Part 1

Graph analysis is becoming a key component of data science. Many things can be modeled as graphs, but social networks are one of the most obvious examples.

In this post, I am going to show how...

Technology | December 17, 2012 | Thomas Cabrol

Visualizing your LinkedIn Graph Using Gephi - Part 2

In the previous post, we learnt how to get data out of LinkedIn via its API. This task is quite technical, but it is an integral part of every data science project: accessing and manipulating data from...

Data Visualization, Technology | December 07, 2012 | Thomas Cabrol

Setting Up A Cool Data Science Platform Cheaply

Current technologies let us build a data science stack for very little money, and it will perform as well as, or even better than, tools that cost a lot only a few years ago.

stack, data science, tutorial | October 03, 2012 | Thomas Cabrol

A Simple Recommendation Engine Implemented in Different Languages

Ever wondered how you get recommended to watch Raiders of the Lost Ark after you gave a good rating to Star Wars on your favorite movie rental service? (Yes, back to the '80s...) That's the...

recommendation, data science, Technology | September 10, 2012 | Thomas Cabrol

Visualizing French Income Tax Data

What was supposed to be a simple data visualization side project with some French open data and Tableau Public turned into something quite complex.

Data Visualization, Data Preparation, Data analysis | July 01, 2012 | Thomas Cabrol
