Today, a single innovation can turn a market upside down in a couple of days. Most of the past decade's innovative products are data-driven, and we are beginning to see the benefits of creating ‘data products’ in every industry (see the healthcare example on the Dataiku blog).
Key Data Architecture Considerations
Tech companies have taken full advantage of these technologies and mastered them, widening the profitability gap between themselves and everyone else.
Open source communities have demonstrated their ability to pool resources to create and maintain the software and solutions needed to bring this ‘disruptive’ approach to any organization.
Startups today rely heavily on these communities to disrupt every industry.
However, when the time comes to provide the infrastructure that supports these technologies, decision-makers and enterprise architects risk disrupting their existing environment as they integrate new solutions, all while trying to keep costs under control.
Let’s take the example of Hadoop. It makes sense to bet on a technology like this one. Not only does it allow distributed computing on any type of data but, more importantly, it offers flexibility and runs on commodity hardware, unlike traditional Business Intelligence appliances. A company can start quickly with a small new infrastructure, reassign existing resources, or even build its data lake in the cloud. None of these options locks it in to the same vendor forever, as is generally the case with Data Warehousing appliances. Even though the cloud may seem like an expensive option today, cloud providers are increasingly competitive, so it’s a fairly safe bet for now.
This is crucial: you don’t want to discover that your resources can’t scale up on the day your chief data scientist runs a revolutionary algorithm that needs a cluster twice as large, only to see the project postponed for a month while you renegotiate a contract with a vendor. You should always be prepared to rethink your architecture.
Data Science Studio Integration
This is one of the things that comes up most often when the clients I work with talk about Data Science Studio (DSS): the ability to build great predictive services on data from Hadoop or from their existing DBMS, but also from simple log files. Indeed, DSS connects to about 35 different types of data storage technologies, including Hadoop, Spark, and more than 13 SQL and NoSQL databases. This means it can be used by any type of organization on any type of project. It matters to know that, once you create a project, the tools you’re working with will let you scale up as much as you need!
Anyone responsible for infrastructure management or investment also knows this: your analytics department will expect you to support and integrate all the technologies you currently use, as well as the ones you might use one day!
Moreover, infrastructure management itself can benefit from the data. From predictive maintenance to security audits, there are many use cases that can help you improve your existing tools or set up new ones. For instance, you can predict when a web portal will need more nodes because your marketing campaign is more successful than expected, and add them right before your applications slow down (goodbye reactive auto-scaling, hello predictive scaling). You can even predict the exact impact of a failure in one of your systems!
An example of a predictive maintenance workflow with DSS.
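To make the predictive scaling idea above a little more concrete, here is a minimal sketch in plain Python. It is not how DSS works and uses no DSS API. It assumes request-rate samples taken at regular intervals, a fixed (hypothetical) capacity per node, and a simple linear trend fitted by least squares; a real deployment would likely use a model that captures seasonality, such as daily or weekly traffic patterns. The names `forecast_linear` and `nodes_needed` are illustrative only.

```python
import math
from typing import Sequence


def forecast_linear(history: Sequence[float], steps_ahead: int) -> float:
    """Fit y = a + b*t by ordinary least squares and extrapolate steps_ahead samples."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    # Covariance of time and load, and variance of time (t = 0, 1, ..., n-1).
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    # The last observation is at t = n - 1; forecast steps_ahead intervals later.
    return intercept + slope * (n - 1 + steps_ahead)


def nodes_needed(history: Sequence[float], steps_ahead: int,
                 capacity_per_node: float, headroom: float = 0.2,
                 min_nodes: int = 1) -> int:
    """Hypothetical sizing rule: forecast load plus headroom, divided by per-node capacity."""
    predicted = max(forecast_linear(history, steps_ahead), 0.0)
    return max(min_nodes, math.ceil(predicted * (1 + headroom) / capacity_per_node))


# Requests per minute, sampled every 5 minutes while a campaign ramps up.
print(nodes_needed([1000, 1200, 1400, 1600], steps_ahead=2, capacity_per_node=500))
```

With the sample above, the trend projects about 2,000 requests per minute two intervals ahead, so with 20% headroom and 500 requests per node the sketch asks for 5 nodes before the load arrives, rather than reacting once latency has already degraded.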