Our tagline at Dataiku since mid-2018 has been “Your Path to Enterprise AI,” which underscores that every company has its own unique journey to take toward leveraging AI in the enterprise. Architecture is no exception to this rule, as Dataiku enables organizations to accelerate AI transformation wherever they are (on-premises, hybrid cloud, or full cloud). Today, we’re focusing on the cloud, breaking down how Dataiku makes cloud integration easy, with supported instances on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
According to a Dataiku survey of 200+ IT executives, 64% of organizations take a hybrid cloud approach to machine learning projects, meaning on-premises data centers and a private cloud are combined with one or more public cloud services. However, the process isn’t without its challenges: full maturity in this domain will come from seamlessly combining multiple cloud and on-premises solutions into a hybrid architecture.
Avoid Lock-In, Preserve Existing Projects
The move to cloud providers and the integration of open source technologies don’t seem likely to slow down anytime soon. Given this trajectory, organizations need a platform such as Dataiku that decouples people, skills, and the projects they’ve built from the underlying technologies and infrastructure. That way, if the underlying technology changes (e.g., from Hadoop to Kubernetes) or the organization moves from one cloud provider to another, people can continue using their current skills to maintain existing projects and build new ones with minimal disruption. With Dataiku, organizations can:
- Move from one underlying technology or cloud provider to another with minimal impact on data and AI projects, which mitigates the risk of technological lock-in and strategic IT/cloud dependency
- Switch data sources and computation engines from one technology to another, which makes it easy to maintain projects as new options become available (see the sketch after this list)
- Run data and AI projects across platforms, enabling organizations to operate over both on-premises installations and cloud infrastructure
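To make the engine-swapping idea in the second bullet concrete, here is a minimal sketch in plain Python with SQLAlchemy. It is not Dataiku’s API; the connection URLs, table, and column names are invented for illustration. The point is simply that when transformation logic is expressed independently of the engine, moving it from an on-premises database to a cloud warehouse amounts to changing a connection string.

```python
# Illustrative sketch only, not Dataiku's API: the same transformation runs
# against different engines by swapping a connection string. All URLs, table,
# and column names below are hypothetical.
from sqlalchemy import create_engine, text

DAILY_REVENUE_SQL = text("""
    SELECT CAST(order_ts AS DATE) AS order_date,
           SUM(amount)            AS revenue
    FROM   orders
    GROUP  BY CAST(order_ts AS DATE)
""")

def daily_revenue(connection_url: str):
    """Run the same SQL against whichever engine the URL points to."""
    engine = create_engine(connection_url)
    with engine.connect() as conn:
        return conn.execute(DAILY_REVENUE_SQL).fetchall()

# The project logic stays identical; only the target backend changes, e.g.:
# daily_revenue("postgresql+psycopg2://user:pwd@onprem-host/warehouse")
# daily_revenue("snowflake://user:pwd@account/warehouse/schema")
```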
Leave Inefficient Resource Consumption in the Past
Traditionally, data systems included the infrastructure for both computation and storage, which meant that both had to be scaled together, even if only one dimension was resource-constrained. The solution is to decouple compute and storage so that each can be scaled independently. Dataiku uses a pushdown architecture that lets organizations take full advantage of distributed, highly scalable computing systems, including SQL engines, Spark, and Kubernetes, as well as optimized data paths for accessing elastic storage systems.
This approach gets organizations to insights faster. Leveraging cloud elasticity lets teams easily scale resources up and down to match project needs. With Dataiku orchestrating those resources, users automatically have access to the right technology at the right time.
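As a rough illustration of what decoupled storage and compute look like in practice, here is a minimal PySpark sketch, assuming a hypothetical S3 bucket and an orders dataset: the data sits in elastic object storage, while the aggregation runs on Spark executors (for example, on a Kubernetes cluster) that can be scaled or torn down independently of the data.

```python
# Minimal sketch of decoupled storage and compute with PySpark.
# Bucket and dataset paths are hypothetical; the Spark cluster doing the work
# (which could run on Kubernetes) scales independently of the object storage.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pushdown-example").getOrCreate()

# Data lives in elastic object storage; reading it does not require
# provisioning storage on the compute nodes themselves.
orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

# The aggregation is pushed down to the distributed engine instead of being
# pulled back to a single machine.
daily_revenue = (
    orders
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)

# Results go back to storage, so the compute cluster can be scaled down
# (or deleted) without losing any work.
daily_revenue.write.mode("overwrite").parquet("s3a://example-bucket/agg/daily_revenue/")
```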
Embrace Usability, Collaboration, and Accelerated Time to Value
Dataiku helps enterprises quickly realize the value of cloud providers (AWS, GCP, and Microsoft Azure) by enabling teams to collaborate and adopt data science practices at scale. Dataiku’s end-to-end platform and cloud agnosticism make it easy for organizations to use cloud providers and their available services while allowing users of all skill levels to quickly go from data exploration and preparation to fully built-out AI applications, without restricting that work to technical experts alone.