Data Quality: The Secret Sauce of Data Chefs

Data Basics Jean-Guillaume Appert

In this blog post, David Talaga (Product Marketing at Dataiku) explains that shopping in a supermarket is much like searching for the best data product for your use case. When it comes to selecting ingredients, what makes you choose one over another? Why do you think organic vegetables taste different from others? For a data analyst, the same questions apply to the data products you are looking for: Can I trust it? Where does it come from?


Why Data Products Need Data Quality

Whether you are a Michelin Star chef or a home cook, you ask the same basic questions about the quality of your ingredients. Let’s take an example:

[Image: Kraft macaroni and cheese package label]

  • Information: What is it? It seems to be macaroni & cheese (a simple description).
  • Trust: Do I trust the provider? Kraft is a well-known brand with no notable food scandals.
  • Freshness: Can I eat it now, or does it need to wait? It looks ready to cook.
  • Content: What is it made of? Is it certified? There is a full list of ingredients and nutrition facts about it.
  • Origin: Where does it come from? It does not seem imported, so I would say it has been processed in the U.S.

For data products, the basics of quality are the same:

  • Purpose: What is the goal of this data product? 
  • Trust: Who is the steward of the product? Has it been certified as GDPR-compliant?
  • Freshness: Has it been built and run recently? The date of its last update is a first indicator of its quality.
  • Content: What are the key components (tables, dashboards, models) used to build it?
  • Origin: Where does the data used to train the model come from? Where does the data used for analytics come from?
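To make the analogy concrete, the five questions above can be captured as a small metadata record, a kind of nutrition label attached to each data product. The following is a minimal Python sketch under stated assumptions: the `DataProductCard` class, its field names, the example product, and the seven-day freshness window are all hypothetical and do not correspond to any Dataiku API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DataProductCard:
    """Hypothetical 'label' for a data product, one field per basic quality question."""

    name: str
    purpose: str            # Purpose: what is the goal of this data product?
    steward: str            # Trust: who is accountable for it?
    gdpr_certified: bool    # Trust: has it been certified as GDPR-compliant?
    last_updated: datetime  # Freshness: when was it last built and run (timezone-aware)?
    components: list[str] = field(default_factory=list)  # Content: tables, dashboards, models
    sources: list[str] = field(default_factory=list)     # Origin: where the data comes from

    def is_fresh(self, max_age: timedelta, now: datetime | None = None) -> bool:
        """True if the product was last updated no longer than max_age ago."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now - self.last_updated <= max_age

    def missing_fields(self) -> list[str]:
        """Names of the basic label fields that are still empty."""
        checks = {
            "purpose": self.purpose,
            "steward": self.steward,
            "components": self.components,
            "sources": self.sources,
        }
        return [name for name, value in checks.items() if not value]


# Illustrative usage with a made-up product whose origin is not documented yet.
card = DataProductCard(
    name="customer_churn_scores",
    purpose="Weekly churn risk per customer for the retention team",
    steward="retention-analytics",
    gdpr_certified=True,
    last_updated=datetime(2024, 1, 8, tzinfo=timezone.utc),
    components=["customers table", "churn model", "retention dashboard"],
    sources=[],
)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(card.is_fresh(timedelta(days=7), now=now))
print(card.missing_fields())
```

Passing `now` explicitly keeps the freshness check deterministic and easy to test; in practice it would default to the current time.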

These are the basic pieces of information you generally have about a data product, but you can take its quality to the next level, for example by:

  • Providing a tutorial on how to reuse it
  • Describing the use cases that leverage this data product
  • Getting it certified by a third party

As a result, the success of a data product depends on its quality, which relies on both basic information and enriched metadata.

What’s Needed to Produce a Michelin Star Data Product? 

Building a Michelin Star data product, one worthy of being served in a Michelin Star restaurant, should be simple: focus on fulfilling the requirements of that one use case. You never know when the Michelin Guide inspector will show up, so your best chance of success is to make your high-quality experience repeatable.

Delivering a data product, however, requires balancing that high-quality niche use case against all the other use cases in your company. Here are the keys to building a successful data product:

  • Define, describe, and monitor its quality: both the steward and the consumer must know the expected level of quality and the results of the latest tests.
  • Document the data product:
    • Why it was built
    • What it contains and where its data comes from
    • The use cases where it has already been used
    • Any other information that helps a consumer leverage it
  • Publish and share it: discoverability is a major challenge for data products. By default, too many datasets, dashboards, and reports are available with an unknown status. Publishing a data product requires defining its ownership and the structure or means that make it discoverable.
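One way to make "the expected level of quality and the results of the latest tests" explicit is to pair each rule with a minimum pass rate and report both numbers side by side, so steward and consumer read the same thing. This is a minimal, generic Python sketch and not how Dataiku implements data quality rules; `QualityRule`, `run_rules`, the column names, and the thresholds are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict  # one record of the dataset, mapping column name -> value


@dataclass
class QualityRule:
    """Hypothetical quality rule: a per-row check plus the expected pass rate."""

    name: str
    check: Callable[[Row], bool]
    min_pass_rate: float  # expected level of quality agreed by steward and consumer


@dataclass
class RuleResult:
    name: str
    pass_rate: float  # result of the latest test
    expected: float   # level of quality that was promised

    @property
    def ok(self) -> bool:
        return self.pass_rate >= self.expected


def run_rules(rows: list[Row], rules: list[QualityRule]) -> list[RuleResult]:
    """Evaluate every rule on the rows; an empty dataset fails every rule."""
    results = []
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row))
        rate = passed / len(rows) if rows else 0.0
        results.append(RuleResult(rule.name, rate, rule.min_pass_rate))
    return results


# Made-up sample data and rules for illustration.
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": None, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]
rules = [
    QualityRule("customer_id is not null", lambda r: r.get("customer_id") is not None, 1.0),
    QualityRule("email is filled", lambda r: bool(r.get("email")), 0.7),
]
for res in run_rules(rows, rules):
    status = "OK" if res.ok else "FAIL"
    print(f"{res.name}: {res.pass_rate:.0%} (expected >= {res.expected:.0%}) {status}")
```

Here both rules pass on three of the four rows, but only the email rule meets its agreed level: a strict 100% expectation on identifiers fails, while a 70% tolerance on emails passes.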

Finally, all of this must be repeatable, and actually repeated. That is the difference between a project and a product: a product has a lifecycle and receives updates. Even wine now comes in alcohol-free versions, so a product must be built to evolve.

How Can Dataiku Help? 

Dataiku is the platform for Everyday AI. It allows users to deliver data products with different outcomes: a dataset, a dashboard, a model, insights, and so on.

As a platform, it applies the same principles to all of these outputs:

  • Data quality rules integrated into the flow of your data product
  • Operationalization of these data quality rules
  • Monitoring of data quality to ensure the data product keeps delivering value over time
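To illustrate in general terms what operationalizing and monitoring a quality rule can mean, the sketch below records a quality metric on every run of a flow, blocks publication when the metric falls below a hard threshold, and warns when it drops noticeably below its recent average. It is a hypothetical, platform-agnostic Python example: `QualityMonitor`, the 0.9 threshold, the 0.05 drop tolerance, the five-run window, and the simulated scores are assumptions, not Dataiku settings or APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QualityMonitor:
    """Hypothetical monitor for one quality metric across runs of a flow."""

    metric: str
    threshold: float        # hard floor: below it, the run must not be published
    max_drop: float = 0.05  # warn if the value falls this far below the recent average
    window: int = 5         # number of previous runs used for the recent average
    history: list[float] = field(default_factory=list)

    def record(self, value: float) -> tuple[bool, list[str]]:
        """Record the metric for a new run and return (publishable, alerts)."""
        alerts = []
        recent = self.history[-self.window:]
        if recent and mean(recent) - value > self.max_drop:
            alerts.append(
                f"{self.metric} dropped from a recent average of {mean(recent):.2f} to {value:.2f}"
            )
        publishable = value >= self.threshold
        if not publishable:
            alerts.append(f"{self.metric}={value:.2f} is below the threshold of {self.threshold:.2f}")
        self.history.append(value)
        return publishable, alerts


# Simulated completeness scores for five successive runs of the flow.
monitor = QualityMonitor("email completeness", threshold=0.9)
for run, value in enumerate([0.98, 0.97, 0.99, 0.91, 0.85], start=1):
    publishable, alerts = monitor.record(value)
    print(f"run {run}: {'publish' if publishable else 'blocked'}", *alerts, sep="\n  ")
```

Separating the hard threshold (which stops a bad build from being published) from the drift warning (which only alerts the steward) keeps the flow running through minor fluctuations while still surfacing gradual degradation, as in run 4 above, before it turns into a blocked run 5.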

Putting All the Ingredients Together

In conclusion, cooking a Michelin Star meal and building a successful data product have a lot in common. Both require:

  • A chef/data steward who knows the requirements for the data product and verifies the result once it is ready. 
  • A squad of data analysts and data scientists who collaborate on the same platform to build the data product.
  • A kitchen/platform where all the services needed to build data products and operationalize them are available.

Since its inception, Dataiku has provided a platform where data analysts and data scientists work on the same project. The constantly growing set of available transformations and models has helped them build complex data products. Ultimately, data quality, operationalization, and reusability are the key success factors for building valuable data products.