Building robust and transparent predictive models is critical to trust and adoption from the business — but it’s notoriously time-consuming to ensure models meet best-practice standards, work on both "easy" and edge cases, and fulfill many other quality criteria. There are no shortcuts, as quality breaches in AI are what news headlines are made of. Dataiku 9 makes designing robust models faster, thanks to improvements and new capabilities throughout the model development lifecycle.
Step 1: Design Robust Models
Novice and advanced data practitioners alike can make modeling mistakes of various shapes and sizes that lead to poor-quality models or painful rework. Mistakes range from using too small an input dataset or an imbalanced target distribution to selecting erroneous model hyperparameters. Left unchecked, all of these can lead to too-good-to-be-true metrics and costly training misconfigurations. Dataiku 9 provides several quality-enhancing guardrails and sanity checks to produce robust models and reduce rework.
Model Diagnostics: Dataiku 9 provides built-in checks, in the form of visual ML diagnostics, to help develop quality models and troubleshoot them along the way. Think of it as a second set of eyes checking your model and offering warnings and hints to improve it throughout development. Best of all, when issues arise, Dataiku displays messages that explain the problem, so novice data scientists can learn from their mistakes and take corrective action.
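To make the idea concrete, here is a minimal sketch of the kind of sanity check such diagnostics perform: warning when the training data is too small or the target too imbalanced. The function name and thresholds are illustrative, not Dataiku's actual API.

```python
from collections import Counter

def dataset_diagnostics(labels, min_rows=500, max_imbalance=0.9):
    """Return warning strings for common training-data issues.

    min_rows and max_imbalance are illustrative default thresholds.
    """
    warnings = []
    n = len(labels)
    if n < min_rows:
        warnings.append(f"dataset has only {n} rows (< {min_rows}); metrics may be unstable")
    majority_share = max(Counter(labels).values()) / n
    if majority_share > max_imbalance:
        warnings.append(f"imbalanced target: majority class covers {majority_share:.0%} of rows")
    return warnings

# 100 rows, 95% of them class 0 -> both warnings fire
for w in dataset_diagnostics([0] * 95 + [1] * 5):
    print("WARNING:", w)
```

Checks like these are cheap to run on every training job, which is why surfacing them automatically saves so much rework.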
Model Assertions: Most of the time, data scientists are not business domain experts. A model developed without expert input can fail basic sanity tests, leading to a failed project and a loss of trust among key stakeholders. Dataiku 9 includes model assertions, which check that a model correctly handles known cases and enhance communication between domain experts and data scientists. Even better, model assertions continue to run during retraining in production, giving the production team another check on model health under current production data and conditions.
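Conceptually, a model assertion pairs a condition supplied by a domain expert ("transactions over 10,000 should be flagged") with the prediction the model is expected to make on matching rows. The plain-Python sketch below illustrates that idea; it is not Dataiku's actual assertion API, and all names are hypothetical.

```python
def check_assertions(model, rows, assertions):
    """Evaluate expert-supplied assertions against model predictions.

    Each assertion is (name, condition_fn, expected_prediction).
    Returns {name: pass_rate} over the rows the condition selects,
    or None when no rows match the condition.
    """
    results = {}
    for name, condition, expected in assertions:
        matched = [r for r in rows if condition(r)]
        if not matched:
            results[name] = None
            continue
        passed = sum(1 for r in matched if model(r) == expected)
        results[name] = passed / len(matched)
    return results

# Toy model: flag any transaction over 10,000 as fraud
model = lambda row: "fraud" if row["amount"] > 10_000 else "ok"

rows = [{"amount": 50}, {"amount": 20_000}, {"amount": 15_000}]
assertions = [
    ("large transfers are flagged", lambda r: r["amount"] > 10_000, "fraud"),
]
print(check_assertions(model, rows, assertions))
# {'large transfers are flagged': 1.0}
```

Because the assertions are declarative, they can be re-evaluated on fresh data every time the model retrains, which is what makes them useful as an ongoing production health check.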
What-If Analysis: Data scientists and domain experts alike can benefit from testing predictive outputs under varying input conditions. Dataiku 9 includes what-if analysis designed for both audiences. Data scientists can use it to better understand modeling results at the row level, and they can publish what-if analyses to Dataiku dashboards, where business users can test models and build trust in results using actual or hypothetical data.
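At its core, what-if analysis means holding a reference row fixed, varying one input, and re-scoring. A minimal sketch of that loop, with a hypothetical toy credit model (Dataiku provides this interactively; the code here only illustrates the mechanic):

```python
def what_if(model, base_row, feature, values):
    """Score hypothetical rows that differ from base_row in one feature."""
    return {v: model({**base_row, feature: v}) for v in values}

# Toy rule: approve when income is at least 3x the requested loan
model = lambda r: "approve" if r["income"] >= 3 * r["loan"] else "deny"

base = {"income": 60_000, "loan": 25_000}
print(what_if(model, base, "loan", [10_000, 20_000, 30_000]))
# {10000: 'approve', 20000: 'approve', 30000: 'deny'}
```

Sweeping a feature this way shows a business user exactly where the model's decision flips, which is often more persuasive than any aggregate metric.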
Model Fairness Report: Dataiku already includes Responsible AI features, including subpopulation analysis and partial dependence plots. Dataiku 9 adds a new model fairness report with the most common metrics, including demographic parity, equality of opportunity, equalized odds, and more. Having access to additional metrics can help novice and advanced data scientists determine if their model is biased against a particular group or fails to make accurate predictions for a group.
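As an illustration of the simplest of these metrics, demographic parity compares the rate of positive predictions across groups; a large gap between groups is a warning sign. The sketch below computes that gap in plain Python (what counts as an acceptable gap is a policy choice, not a fixed constant, and this is not Dataiku's report implementation).

```python
def demographic_parity_gap(preds, groups, positive="approve"):
    """Max difference in positive-prediction rate between groups.

    A gap near 0 is consistent with demographic parity.
    """
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = sum(preds[i] == positive for i in idx) / len(idx)
    return max(rates.values()) - min(rates.values())

preds  = ["approve", "deny", "approve", "approve", "deny", "deny"]
groups = ["A", "A", "A", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # ~0.33: group A approved 2x as often
```

Equality of opportunity and equalized odds follow the same pattern but compare error rates (true/false positive rates) rather than raw approval rates, which requires ground-truth labels as well.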
Step 2: Tune Faster Without Sacrificing Quality
The goal is always to build a high-quality model, but rounds of manual experimentation and tuning can drag on and test the patience of even the most experienced data scientist. Additionally, suboptimal choices, such as picking the wrong tree depth, are not always obvious. Hyperparameter search automates finding the optimal parameters for a model but can be resource-intensive. In Dataiku 9, hyperparameter search gets a boost from distributing the search workload across Kubernetes, for faster model tuning.
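The speedup works because each hyperparameter candidate can be trained and scored independently. The miniature sketch below uses Python's `concurrent.futures` thread pool as a stand-in for a Kubernetes cluster, with a toy scoring function in place of real model training; everything here is illustrative, not Dataiku's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Stand-in for training and cross-validating one candidate.

    Toy score that peaks at depth=5, lr=0.1; a real search would
    fit a model and return a validation metric.
    """
    depth, lr = params
    return -abs(depth - 5) - 10 * abs(lr - 0.1)

def grid_search(grid, workers=4):
    """Score every hyperparameter combination in parallel workers."""
    candidates = list(product(grid["depth"], grid["lr"]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, candidates))
    best_score, best_params = max(zip(scores, candidates))
    return best_params, best_score

grid = {"depth": [3, 5, 8, 12], "lr": [0.01, 0.1, 0.3]}
params, score = grid_search(grid)
print("best:", params)  # best: (5, 0.1)
```

With independent candidates, wall-clock time shrinks roughly in proportion to the number of workers, which is exactly what farming the search out to Kubernetes pods buys in practice.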
Dataiku 9 supports data teams and data scientists from novice to expert with new capabilities that help build better models, explain models, and build trust with business stakeholders, faster than ever before.