Deploy and Manage Dataiku in the AWS Cloud

Dataiku Product, Tech Blog Timothy Law, Xavier Thierry

The world of cloud AI is evolving rapidly, with new offerings and capabilities available seemingly every day. This state of flux creates a challenge for analytics and business leaders as they look for the best cloud and AI capabilities to meet the growing needs of both business users and data science teams. For architects, the challenge is to select the technology that will meet the company's needs while taking advantage of the cloud's ease of deployment, cost, and scalability benefits.

The Dataiku cloud stack accelerator for AWS delivers a complete set of AI capabilities for business, analytics, and data science teams that takes full advantage of the cloud infrastructure and services of AWS. With the new accelerator, cloud architects and administrators can automate the deployment, configuration, and management of Dataiku's Everyday AI platform. The template-driven, clickable interface makes it easy for administrators to control and manage the deployment of elastic cloud AI for new teams and to maintain and upgrade existing groups.

With the Dataiku cloud stack accelerator, your team can be up and running with Dataiku on AWS in three easy steps.

Step 1: Networking, VPC, and Security

The Dataiku cloud stack accelerator builds on your existing AWS virtual private cloud (VPC) topology, relying on the subnets, security groups, and routing controls that isolate your infrastructure.

Before deploying the Dataiku cloud stack templates, cloud architects grant three sets of permissions: permissions to create, delete, and run Dataiku instances; VPC permissions to create security groups; and AWS Identity and Access Management (IAM) permissions to pass the Dataiku role to Dataiku instances. At this step, you can also set up cluster permissions if your data science projects will require elastic compute clusters.
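To make the three permission groups concrete, here is a minimal sketch that assembles an IAM policy document in Python. The EC2 and IAM action names are standard AWS actions, but the exact set that the Dataiku templates require is an assumption, as are the function name, the statement IDs, and the example role ARN. Check the Dataiku documentation for the authoritative list before using anything like this.

```python
import json


def build_deployer_policy(dataiku_role_arn: str) -> dict:
    """Return an illustrative IAM policy document as a plain dict.

    Assumption: the action lists below are a plausible minimum, not the
    official permission set published by Dataiku.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Create, run, and delete the instances that host Dataiku.
                "Sid": "ManageDataikuInstances",
                "Effect": "Allow",
                "Action": [
                    "ec2:RunInstances",
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                    "ec2:TerminateInstances",
                ],
                "Resource": "*",
            },
            {
                # VPC permissions to create security groups.
                "Sid": "ManageSecurityGroups",
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateSecurityGroup",
                    "ec2:DeleteSecurityGroup",
                    "ec2:AuthorizeSecurityGroupIngress",
                ],
                "Resource": "*",
            },
            {
                # Allow passing only the Dataiku role to new instances.
                "Sid": "PassDataikuRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": dataiku_role_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    example_arn = "arn:aws:iam::123456789012:role/example-dataiku-instance-role"
    print(json.dumps(build_deployer_policy(example_arn), indent=2))
```

Scoping `iam:PassRole` to a single role ARN, rather than `*`, keeps the deployer from attaching any other role to the instances.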

Cloud stack templates are launched from the AWS CloudFormation console and managed entirely within your AWS account, so you work inside your own secure cloud environment from the start. To begin, enter the link to the fleet manager template in the CloudFormation console. The link can be retrieved here.

Once you specify the Dataiku template, you can configure your stack options, defining where the cloud resources get created and how to access the output resources securely. Only administrators with appropriately assigned roles and permissions can modify the stack, and all changes are logged and auditable for complete oversight and control.
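The same template and stack options can also be expressed programmatically. The sketch below only assembles the keyword arguments for CloudFormation's `CreateStack` call, so it runs without AWS credentials. The argument names (`StackName`, `TemplateURL`, `Parameters`, `Capabilities`, `EnableTerminationProtection`, `Tags`, `RoleARN`) belong to the CloudFormation API; the helper name, the parameter keys such as `VpcId`, and the need for `CAPABILITY_IAM` are assumptions, because the template's actual parameters are not listed in this article.

```python
from typing import Optional


def build_create_stack_request(
    stack_name: str,
    template_url: str,
    parameters: dict,
    tags: Optional[dict] = None,
    service_role_arn: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a CloudFormation CreateStack call."""
    if not template_url:
        # The fleet manager template link must come from Dataiku.
        raise ValueError("template_url is required")

    request = {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation expects a list of key/value pairs, all as strings.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in sorted(parameters.items())
        ],
        # Assumption: the template creates IAM resources, which the caller
        # must acknowledge explicitly.
        "Capabilities": ["CAPABILITY_IAM"],
        # Guard the stack against accidental deletion.
        "EnableTerminationProtection": True,
        "Tags": [
            {"Key": key, "Value": value}
            for key, value in sorted((tags or {}).items())
        ],
    }
    if service_role_arn:
        # Optional role that CloudFormation assumes to create resources.
        request["RoleARN"] = service_role_arn
    return request


if __name__ == "__main__":
    # Placeholder URL and hypothetical parameter keys, for illustration only.
    request = build_create_stack_request(
        stack_name="dataiku-fleet-manager",
        template_url="https://example.invalid/fleet-manager-template.yml",
        parameters={"VpcId": "vpc-0123", "SubnetId": "subnet-4567"},
        tags={"team": "data-science"},
    )
    print(request)
    # With boto3 installed and credentials configured, the request could be
    # submitted with:
    #   boto3.client("cloudformation").create_stack(**request)
```

Building the request separately from sending it makes the stack options easy to review or store in version control before anything is created.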

Step 2: Deploy Dataiku for AWS

Dataiku has developed four out-of-the-box deployment templates for users to deploy Dataiku with everything required to start using the platform, build and develop AI and analytics projects, and productionize them in their AWS cloud. These templates are customizable to the requirements of your organization or line of business. 

These deployment templates offer various architectural blueprints, from single-node design environments for building data pipelines and models to elastic environments for small and midsize data science teams that require elastic resources like Kubernetes clusters. 

For a full blueprint, use the Deploy Full Fleet option. This template deploys a complete, enterprise-ready elastic AI stack that can provision, manage, and scale elastic AI compute clusters. It includes connections to managed services such as S3 storage, Redshift, and Kubernetes to support tasks such as data preparation, model training, and deployment to production, all running on EC2 inside your secure AWS cloud.

Now that you have deployed Dataiku, your data science teams can connect to and utilize additional AWS managed services such as Athena and Glue or AI services for NLP or vision, such as Amazon Comprehend or Rekognition. 

Step 3: Maintain, Expand, and Upgrade Dataiku for AWS

With Dataiku on AWS, IT operators or administrators can manage day-to-day tasks such as onboarding additional users, adding groups, and deploying new Dataiku instances. Because the templates are easy to use, IT can keep control if desired, or delegate platform administration to a Dataiku administrator to reduce administrative overhead. Administrators can monitor Dataiku instances through a centralized visual interface. Templates also make it easier to upgrade to future versions of Dataiku. Dataiku publishes each release as an image that is available from within your AWS account. To upgrade, select the new version from the drop-down menu.

Finally, you can use the visual interface to set your recovery point objectives and snapshot frequency for disaster recovery. Dataiku uses EBS snapshots for recovery, so you can recover and redeploy without losing data or AI projects.

With the new Dataiku cloud stack accelerator for AWS, analytics leaders, cloud architects, and administrators gain the ability to control and manage AI on the cloud for business teams, data science teams, and IT operations, all while getting the most out of existing investments in AWS services and infrastructure.
