O16n On The House: Housing Predictions Made Easy

data science| machine learning| operationalization | | Megan Fang

A big problem in machine learning (ML) is data scarcity, but operationalizing ML is an even bigger problem for enterprises (and is often overshadowed by the excitement for big data). Dataiku 4.3 makes operationalization easy, and one user has already leveraged it to predict housing prices.


Increasing Importance of Operationalization

Operationalization (o16n) is the time-consuming process of putting a model into use. It involves recoding, testing, and coordination across different groups of the organization, many of whom often don’t have easy access to the model and data.

Screen Shot 2018-06-25 at 1.26.22 PMOperationalization is the key step to extracting business value from ML, but only 23% of respondents felt that their company was successful at operationalizing and making data-driven decisions, though 98% felt that their company was devoted to being data-driven. In this era of big data-- where ML models are constantly being rebuilt to fit new data-- fast operationalization is increasingly important.

Making Operationalization Easier: Dataiku 4.3 New Features

Dataiku recently added new features to make o16n fast and easy:

  • Self-service deployment of models at scale using containers: Data scientists can now deploy a scoring REST API in one click on top of Docker/Kubernetes. (Containers are a way of packaging software that means the software will run in the same fashion across different environments. Docker helps you create and deploy containers, while Kubernetes helps you manage them).


  • One-click elasticity on AWS with EMR clusters: EMR clusters can boost or reduce resource use depending on demand. Dataikuers can now leverage EMR clusters to run temporary intensive analytics or data pipelines that temporarily require a lot of power.

Example Use Case: Predicting House Prices

As mentioned, one user has already used Dataiku 4.3 to predict housing prices using features such as location, number of rooms, and surface area. Lam’s end goal is to make this predictive model available to others (operationalize it), and he can now do this in a few clicks with Dataiku’s new features. If you're in the Amsterdam area, come to the meetup where Lam will discuss how he built and executed this project. Or download the latest version of Dataiku and try a project of your own!


Other Content You May Like