We recently shared how we at Dataiku approach and facilitate data analytics in production, and we have one more angle to take on it: the human angle.
The greatest challenges to deploying data science models into production are often organizational rather than technical. The IT team usually has good reasons to want ownership of these models, starting with accountability: if something fails, they will be blamed, whether or not the flaw originated with them. Successful deployment therefore requires a level of collaboration between the analytics team and the IT team that can be difficult to achieve. Before deploying a project, the analytics team should sit down with the IT team to walk through the model and its goals.
A Best Practice for Deploying to Production
For us at Dataiku, the common best practice is to install a Dataiku instance in your production environment and transfer bundled projects from the design environment to that production environment. In the design environment, everything, including the pipelines and even the datasets, is recreated separately, so that if anything goes wrong there, it does not impact the production environment at all.
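To make the bundle workflow more concrete, here is a minimal sketch of what a design-to-production transfer could look like with Dataiku's public Python API client (dataikuapi). The node URLs, API keys, project key, and bundle ID are placeholder assumptions, and you should double-check the bundle method names against the API documentation for your DSS version.

```python
import dataikuapi

# Placeholder connection details for the design and automation (production) nodes.
DESIGN_URL, DESIGN_KEY = "https://design.example.com", "design-api-key"
PROD_URL, PROD_KEY = "https://automation.example.com", "prod-api-key"
PROJECT_KEY = "CHURN_PREDICTION"   # hypothetical project
BUNDLE_ID = "v12"                  # hypothetical bundle version

# 1. On the design node: create a bundle and download it as an archive.
design = dataikuapi.DSSClient(DESIGN_URL, DESIGN_KEY)
design_project = design.get_project(PROJECT_KEY)
design_project.export_bundle(BUNDLE_ID)
design_project.download_exported_bundle_archive_to_file(BUNDLE_ID, "/tmp/bundle.zip")

# 2. On the automation node: import the archive and activate it.
#    (The very first deployment of a project uses the client's
#    project-from-bundle import instead of an existing project.)
prod = dataikuapi.DSSClient(PROD_URL, PROD_KEY)
prod_project = prod.get_project(PROJECT_KEY)
prod_project.import_bundle_from_archive("/tmp/bundle.zip")
prod_project.activate_bundle(BUNDLE_ID)
```

In practice, a transfer like this is usually wired into a deployment scenario or a CI/CD job rather than run by hand, so every release to production follows the same, repeatable path.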
Developing a Healthy Perspective on Failure
But beyond the technical issues, it is important that your organization develops a healthy perspective on failure. The design environment, in particular, is a place built for frequent (and sometimes spectacular) failure. There, your analysts and data scientists should have no hesitation pushing code that could bring the whole pipeline down, because Dataiku DSS makes it simple to revert to an earlier, functioning version. In production, each organization’s needs will differ, but the ability to monitor, create scenarios, receive notifications, and roll back when necessary is valuable to everyone.
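As an illustration of the "roll back when necessary" part, here is a hedged sketch of reverting a production project to a previously active bundle with the same dataikuapi client. The last-known-good bundle ID is a hypothetical value you would track yourself (for example, from your deployment scenario), and the method names should again be verified against your DSS version.

```python
import dataikuapi

PROD_URL, PROD_KEY = "https://automation.example.com", "prod-api-key"
PROJECT_KEY = "CHURN_PREDICTION"    # hypothetical project
LAST_GOOD_BUNDLE_ID = "v11"         # hypothetical known-good version you track yourself

prod = dataikuapi.DSSClient(PROD_URL, PROD_KEY)
project = prod.get_project(PROJECT_KEY)

# Re-activating a previously imported bundle switches the production project
# back to that version; the newer, broken bundle stays imported but inactive.
project.activate_bundle(LAST_GOOD_BUNDLE_ID)
```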