Large enterprises working to scale their data science and machine learning efforts toward enterprise artificial intelligence (AI) generally hit one major roadblock: they lack the tools and processes to easily expose models to their end customers. We’re proud to announce Dataiku 4.3, which makes it much easier to operationalize model deployment with two major new features:
Heads Up!
This blog post is about an older version of Dataiku. See the release notes for the latest version.
- Self-service deployment of models at scale for data science teams using containers: Data scientists can now deploy a scoring REST API in a single click on top of Docker/Kubernetes.
- One-click elasticity on AWS with dynamic EMR clusters: Data teams can provision new EMR clusters according to their needs. This feature provides on-demand processing power for temporary, intensive analytics tasks or for data pipelines that require significant compute for a limited time.
Self-Service Deployment of Models Using Docker/Kubernetes
Data scientists are very well tooled when it comes to developing models using either visual machine learning or Python/R packages, but most organizations struggle when it comes to actually exposing those models to their final consumers.
Very often, deploying models for their operationalization requires recoding, test implementation, and DevOps skills. This additional work is time-consuming and error-prone, and it holds organizations back from being able to swiftly operationalize data products with each model deployment (and subsequent new version).
Dataiku 4.3 eases model deployment operationalization by letting data scientists complete the entire following process in just a few clicks:
- Create a version of a model, whether it is visual or using Python/R.
- Build an immutable, executable container with the model used for the scoring API.
- Create a REST endpoint that automatically accepts input parameters matching the model signature defined at train time.
- Save the built model Docker container together with information such as who built or deployed it.
- Deploy and start a specified number of model API replicas (on an on-premises or cloud Kubernetes cluster), automatically load balanced.
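Once deployed, the endpoint behaves like any REST scoring service: the client sends feature values matching the model signature and gets a prediction back. As a minimal sketch (the URL, model name, and feature names below are illustrative assumptions, not Dataiku’s actual API), a client call might look like this:

```python
import json

# Illustrative endpoint URL -- a real deployment exposes its own address.
API_URL = "https://scoring.example.com/models/fraud-model/v3/score"

def build_score_request(features):
    """Wrap feature values in the JSON body a scoring endpoint
    matching the model's train-time signature would accept."""
    return json.dumps({"features": features})

# Hypothetical feature values for one record to score.
body = build_score_request({"age": 42, "amount": 129.90, "country": "US"})

# A real client would then POST the body, e.g. with the requests library:
# requests.post(API_URL, data=body, headers={"Content-Type": "application/json"})
```

Because each model version is baked into an immutable container, every replica answering this endpoint serves exactly the same artifact.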
The new model deployment capabilities also allow data scientists to:
- Keep track of or roll back deployed models.
- Bundle a model along with the tests to verify its deployment and scoring.
- Set up user permissions and a staged deployment workflow (e.g., from dev to test to production) in accordance with how your business and team operate.
One-Click Elasticity on AWS with Dynamic EMR Clusters
Today’s modern data organizations must process vast amounts of data across dynamically scalable infrastructures using distributed computation. By pushing computations to Hadoop clusters, Dataiku has already made this scalability easy, fast, and cost-effective.
And with its new capabilities, Dataiku 4.3 facilitates the management of the elasticity of EMR clusters. Now, you can easily:
- Launch an Amazon EMR cluster from the Dataiku interface in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Dataiku together with Amazon EMR takes care of these tasks so you can focus on analysis.
- Request power only when you need it and stop depending on your IT team. You can now provision one, hundreds, or thousands of compute instances to process data at any scale, immediately available to your data team. You can easily increase or decrease the number of instances manually or with Auto Scaling, and you only pay for what you use.
- Connect as many clusters as you want to a single Dataiku instance to separate runtime from development and production.
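Under the hood, provisioning a dynamic cluster boils down to an EMR launch request. As a sketch of what such a feature automates (the cluster name, instance sizes, release label, and IAM role names below are illustrative assumptions), here is what that request looks like using boto3’s `run_job_flow` parameters:

```python
def emr_cluster_spec(name, core_nodes, instance_type="m4.xlarge"):
    """Build a run_job_flow request for a transient, right-sized EMR cluster.
    All concrete values are illustrative, not a recommended configuration."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.12.0",  # illustrative EMR release
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": 1 + core_nodes,  # one master plus the core nodes
            "KeepJobFlowAliveWhenNoSteps": True,  # keep cluster up between jobs
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Size a cluster for a temporary heavy pipeline; tear it down when done.
spec = emr_cluster_spec("burst-analytics", core_nodes=9)

# An actual launch would then be:
# import boto3
# boto3.client("emr").run_job_flow(**spec)
```

Scaling up for a one-off job is then just a matter of changing `core_nodes`, which is exactly the kind of elasticity the one-click feature exposes without any AWS console work.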
One Step Closer: Operationalization of AI in the Enterprise
With these new capabilities, Dataiku achieves the core principles for effective operationalization of AI in the enterprise:
- Data scientists are now able to deploy models without the involvement of DataOps or IT people. And for security, admins can set which data scientists have the rights to use which infrastructure in which stage of deployment.
- With instant visibility into what is deployed and where, deployment status, infrastructure health, and monitoring available at a glance, the organization has a 360° view of its data operations.
- Anyone on the team with the appropriate permissions can quickly deploy or roll back a version of a model in response to changing data or business needs.