Imagine strolling through the aisles of a supermarket, where you're greeted with an array of ready-to-eat goodies and ingredients for culinary adventures. Similarly, data products, akin to the products on those shelves, are the result of transforming data into consumable goods. But creating and sharing them isn't a piece of cake. What truly makes a data product exceptional? How do we cook them? And could there be a secret recipe to bring these products to life?
Why Organizations Need Data Products
The main challenge companies face is how to share the value of data without compromising it, while enriching it at the same time. Data products provide the answer: they enable the value of data, whether in intermediate or finished form, to be distributed and circulated easily throughout the enterprise.
Corporate data and AI organizations are racing against the clock here, but they also face a real dilemma. On the one hand, they must build and maintain an increasingly complex and difficult-to-secure data legacy, which requires technical experts to bring together scattered data sources or improve data reliability.
On the other hand, there is a growing need for business users to take ownership of data, even if they don't have the requisite skills to handle it. And as the perceived value of data grows, so does the need for simplified data solutions that comply with predefined control and safety rules and reuse predefined patterns. These two demands do not fit together easily, and that's where data products come in.
The value lies in sharing: Data products enable organizations to extend the primary use of data, that is, to create value for a wider audience than the initial one (technical experts), and ultimately to meet the business's need to create value from data without necessarily starting from scratch.
What Is a Data Product?
Translating the food and market analogy to the data realm, data products emerge as dynamic offerings with immense potential. Much like selecting consumable goods or gathering ingredients for personalized recipes, data products are made to be consumed and to cater to diverse needs in the data landscape. They unveil valuable insights, fuel innovation, and are essential building blocks for transformative solutions.
Data products can be broadly categorized into two main types, each serving distinct purposes:
Information of Today: This category includes data products such as datasets, dashboards, and business intelligence (BI) analytics. These tools provide current and historical information, enabling users to understand and analyze data in real time and make informed decisions.
Insights for Tomorrow: AI data products take the spotlight in this category, encompassing models, applications, and business solutions. These innovative products leverage advanced AI techniques to generate predictive insights, enabling organizations to anticipate future trends and make proactive decisions.
What Are the Main Ingredients of Data Products?
Data products should be healthy, not dangerous, for organizations.
It's hard to imagine buying and eating processed products cooked with ingredients that have passed their sell-by date. The same applies to data products. Nobody would consume data products that are unhealthy. This is all the more true if they do not meet regulatory or internal company requirements. In that case, why should they be available for consumption at all?
Data products must therefore be of good quality. The data they contain must be “fresh”: in other words, current, accurate, complete, and consistent. It must also be refreshable.
For example, a data product such as a predictive model of consumer behavior would produce erroneous in-store purchase intentions if it were deployed in a COVID-19 situation where stores were closed. It is therefore important that data products always benefit from recent data and, when that is not the case, that teams are alerted to a lack of freshness in the data sources feeding the models (more on data drift here).
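As a rough illustration of the freshness alert described above, here is a minimal Python sketch. It assumes each data source records the time of its last refresh; the source names, the timestamps, and the seven-day threshold are hypothetical and chosen only for the example, not taken from any particular platform.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_refreshed, max_age, now=None):
    """Return True when the source was refreshed longer ago than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_refreshed > max_age


def stale_sources(sources, max_age, now=None):
    """Return the sorted names of the sources that need a freshness alert."""
    return sorted(
        name for name, refreshed in sources.items()
        if is_stale(refreshed, max_age, now)
    )


# Hypothetical sources feeding a model, with their last refresh times.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
sources = {
    "store_visits": datetime(2024, 1, 9, tzinfo=timezone.utc),
    "loyalty_cards": datetime(2023, 11, 1, tzinfo=timezone.utc),
}

# Assumed policy for this example: anything older than seven days is stale.
alerts = stale_sources(sources, max_age=timedelta(days=7), now=now)
print(alerts)
```

In practice the threshold would depend on how quickly the underlying behavior changes, and a check like this only covers the age of the data, not drift in its content.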
In the same way that food scores are becoming widespread under pressure from consumers and regulatory bodies, every data product user would like to have legible information on the quality and respective scores of data products according to objective criteria.
Important! In the same way that greater confidence is placed in companies that can respect and guarantee the cold chain of food products, quality lies not so much in the data product itself as in the ability to maintain that quality at every stage of the product lifecycle, from creation to delivery.
Data products should be easily discoverable and readily shareable, not siloed in isolated departments.
To be consumed, data products need to be identified, visible, and accessible. These notions are important, as data products are intended to be consumed by departments looking for data, analytics, and AI solutions. In this respect, sharing data products is a key point that needs to be integrated into the democratization strategy around data products.
Data products should be transparent, not opaque and difficult to understand.
Transparency breeds trust, and trust leads to utilization. Just as consumers want to know the origins and ingredients of the products they consume, data products must be transparent by showcasing their traceability, processing stages, and a comprehensive list of their components.
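One way to picture this transparency is as an "ingredient label" attached to each data product. The sketch below is a hypothetical Python structure, not a Dataiku feature or API: the class name, its fields, and the sample values are assumptions made for illustration, mirroring the three elements named above (traceability, processing stages, and components).

```python
from dataclasses import dataclass, field


@dataclass
class DataProductLabel:
    """Hypothetical 'ingredient label' describing a data product."""
    name: str
    sources: list = field(default_factory=list)           # traceability: where the data comes from
    processing_steps: list = field(default_factory=list)  # ordered processing stages
    components: list = field(default_factory=list)        # what the product is made of

    def render(self):
        """Return a human-readable label, one item per line."""
        lines = [self.name, "Sources:"]
        lines += [f"  - {source}" for source in self.sources]
        lines.append("Processing steps:")
        lines += [f"  {i}. {step}" for i, step in enumerate(self.processing_steps, 1)]
        lines.append("Components:")
        lines += [f"  - {component}" for component in self.components]
        return "\n".join(lines)


# Illustrative values only.
label = DataProductLabel(
    name="purchase_intent_scores",
    sources=["store_visits", "loyalty_cards"],
    processing_steps=["join sources", "clean missing values", "score with model"],
    components=["scored dataset", "predictive model"],
)
print(label.render())
```

Publishing a label like this alongside the product gives consumers the origin and processing history in one place, much like the packaging on a supermarket shelf.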
How to Build and Deliver Great Data Products the Right Way?
Operationalizing Is Key:
Creating, developing, and promoting a data product that adheres to quality standards and compliance requires a significant resource investment. To ensure cost-effective scalability and agility, it is crucial to manage data products within a dedicated operationalization environment.
This enables multiple iterations of the same data product, or several data products of a similar type, to be reproduced with little effort. By implementing a reusable pattern and using operationalization capabilities, you can monitor the performance and health of your data product while applying and calibrating successive iterations.
Not Just for the Chef:
Furthermore, it is necessary to cultivate your products in an end-to-end environment where everyone can cook! Each restaurant should be able to rely on chefs and assistants who have specific tasks while working together to achieve high-quality dishes. The same principle applies to your data and analytics teams. They should have access to a collaborative environment to create data products independently, regardless of their initial level of expertise.
The Modern Data Kitchen: This data kitchen must also rely on your existing assets (you don't want to throw away your traditional and useful kitchen utensils). Still, it should also embrace modern stovetops, aka modern cloud infrastructure, to deliver on-time data products to your company's internal customers.
Of course, all of this must be done in a secure environment to avoid any incidents or failed recipes, as well as ensure the timely delivery of high-quality products under all circumstances.
Invest in Your Dining Room or Your Marketplace
In the same way that the dining room is as important as the kitchen in a restaurant, you must give equal importance to the place where your clients will consume your data products.
Whether they take the form of ingredients, such as plugins and datasets, or finished products, specific spaces can be created to make data products accessible and consumable by as many people as possible. A company may therefore choose to make data products available in its marketplaces for internal use and consider external spaces for the most finished data products intended for customers.
At Dataiku, in our newly released Data Collections, you can find curated groups of datasets, view information about those datasets, and reuse them in your projects. You can click on any dataset in a collection to view its details, status, and schema. From there, you can also explore, publish, export, watch, mark the dataset as a favorite, or preview its content.
For dashboards and ready-to-be-consumed apps, Workspaces is a streamlined access point through which stakeholders can easily find and access AI-driven insights and tools across all projects on a Dataiku instance. Making it easier for analytics teams to distribute their work to external audiences is key to integrating data and AI throughout an organization.
Data products should be healthy, discoverable, shareable, and trustworthy to delight your organization.
Being able to “cook” data products effortlessly from data is becoming pivotal for any data and analytics team to deploy the value created in all company departments. For this, having a single universal platform, the Data Kitchen, to cook data, AI, and BI data products is a prerequisite for delighting your internal or external customers. As customers become increasingly impatient and demanding, investing in reusable patterns and an operationalization environment becomes essential.
Finally, it's crucial to emphasize the importance of “maintaining the cold chain.” This means having a unified data and AI environment with no air gaps and a complete value chain from creation to distribution, all without compromising the security and quality of the consumed data. By embracing this approach, organizations can ensure the reliability and integrity of their data products, enabling them to provide valuable insights and drive success for their stakeholders.