Building a Feature Store for Quicker and More Accurate ML Models

Dataiku Product, Tech Blog | By Joy Looney

The feature store is an emerging topic in data science. Many data scientists and other machine learning (ML) practitioners want to better understand the idea of building a repository of reliable, impactful features to streamline model building. In a recent Dataiku Product Days session, Marlan Croiser (Senior Data Scientist at Premera Blue Cross, a leading health plan provider in the Pacific Northwest) walked us through Premera’s journey of building a feature store with Dataiku, clarifying the purpose and value of a feature store. This post recaps the most helpful insights from the session.


Introducing Dataiku at Premera 

Premera serves approximately two million people, from individual members to Fortune 500 companies, providing comprehensive health benefits and tailored services. Almost four years ago, Premera introduced Dataiku into the ML projects that support these offerings. Premera now has 25 active users who use Dataiku to develop and deploy ML projects, and Dataiku is increasingly used across the organization to automate analytics for data preparation and processing.

“Dataiku has been a boon for productivity, collaboration, and push to production.”

- Marlan Croiser, Senior Data Scientist, Premera Blue Cross 

Covering the Basics — What Is a Feature Store?

A feature store, in simple terms, is a pre-built set of features designed to be shared by multiple ML models. Quick note: features in this context are the input variables used to train models.

What Do Feature Stores Offer?

Broadly speaking, feature stores bring reusability and speed to ML processes. Once features are developed, they can be shared across multiple models, so new models can be built in minutes rather than months. In a regulated industry such as insurance, a feature store is particularly valuable because data scientists can use pre-approved features in new models without going through additional approval, which makes rapid development of new ML models possible. The reclaimed time leaves more room to work on domain-specific features, and as a result both the number and the quality of models typically increase. Feature stores also reduce the risk of problems that degrade models, such as leakage and train/serve skew.

[Figure: Model development with a feature store]

Building the Feature Store

Premera’s features are defined using SQL scripts and stored in SQL tables, and all features are specific to members (people who buy health insurance individually or who are covered through a company that purchases insurance). Premera groups the features by source into designated subject areas, so each feature lands in a corresponding, easily navigated table that is updated daily.
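To make the "one SQL script per subject area, one row per member" idea concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Premera's SQL database. The source table `claims`, the feature table `feat_claims`, and all column names are hypothetical; the post does not describe Premera's actual schemas, and this is not how a Dataiku SQL recipe is configured.

```python
import sqlite3

# Hypothetical raw source table; the post does not describe Premera's real schemas.
SOURCE_DDL = """
CREATE TABLE claims (
    member_id   INTEGER,
    claim_date  TEXT,
    paid_amount REAL
);
"""

# One SQL script per subject area: aggregate the source to one row per member.
CLAIMS_FEATURES_SQL = """
CREATE TABLE feat_claims AS
SELECT
    member_id,
    COUNT(*)         AS claim_count,
    SUM(paid_amount) AS total_paid,
    MAX(claim_date)  AS last_claim_date
FROM claims
GROUP BY member_id;
"""


def rebuild_claims_features(conn: sqlite3.Connection) -> None:
    """Drop and rebuild the claims subject-area table, as a daily job might."""
    conn.execute("DROP TABLE IF EXISTS feat_claims")
    conn.executescript(CLAIMS_FEATURES_SQL)


# Demo with toy data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SOURCE_DDL)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(1, "2022-01-05", 120.0), (1, "2022-03-10", 80.0), (2, "2022-02-01", 45.5)],
)
rebuild_claims_features(conn)
rows = conn.execute("SELECT * FROM feat_claims ORDER BY member_id").fetchall()
print(rows)
```

Because every subject-area table is keyed on `member_id`, any model that needs claims features can join to `feat_claims` instead of re-deriving them from raw claims.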

At Premera, the feature store is organized around four main requirements: defining features, automating feature updates, ensuring accuracy, and ensuring availability. Each requirement is met with specific Dataiku functionality. SQL recipes define the core features, and scenarios automate updates. Metrics and checks ensure accuracy. Production processes run on an automation instance, while development work is done on a design instance.
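The accuracy requirement pairs metrics (numbers computed on a table) with checks (rules applied to those numbers). The sketch below illustrates that pattern in plain Python with sqlite3. It is not Dataiku's metrics-and-checks API, and the specific metrics and thresholds (row count, NULL keys, one row per member) are assumptions chosen for illustration.

```python
import sqlite3
from typing import Dict, List


def compute_metrics(conn: sqlite3.Connection, table: str, key: str = "member_id") -> Dict[str, int]:
    """Compute simple metrics on a feature table.

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    def scalar(sql: str) -> int:
        return conn.execute(sql).fetchone()[0]

    return {
        "row_count": scalar(f"SELECT COUNT(*) FROM {table}"),
        "distinct_keys": scalar(f"SELECT COUNT(DISTINCT {key}) FROM {table}"),
        "null_keys": scalar(f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"),
    }


def run_checks(metrics: Dict[str, int], min_rows: int = 1) -> List[str]:
    """Return a list of failed checks; an empty list means the table passed."""
    failures = []
    if metrics["row_count"] < min_rows:
        failures.append(f"too few rows: {metrics['row_count']} < {min_rows}")
    if metrics["null_keys"] > 0:
        failures.append(f"{metrics['null_keys']} rows with a NULL key")
    # COUNT(DISTINCT) ignores NULLs, so compare against the non-NULL rows only.
    if metrics["distinct_keys"] != metrics["row_count"] - metrics["null_keys"]:
        failures.append("duplicate keys: expected one row per member")
    return failures


# Demo: member 2 appears twice, which should fail the one-row-per-member check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feat_claims (member_id INTEGER, claim_count INTEGER)")
conn.executemany("INSERT INTO feat_claims VALUES (?, ?)", [(1, 2), (2, 1), (2, 1)])
metrics = compute_metrics(conn, "feat_claims")
failures = run_checks(metrics)
print(metrics, failures)
```

In a scheduled update, a non-empty `failures` list would be the signal to stop before a broken table reaches production models.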

[Figure: Dataiku Flow]

Figuring Out What’s in the Feature Store

Using Dataiku’s webapp functionality, Premera has built the Discovery Feature Explorer, which lets users browse the contents of the feature store, apply filters, and quickly look up feature definitions.
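Behind an explorer like this sits a catalog of feature metadata that users can filter. The following is a minimal, hypothetical sketch of that filtering logic in Python; the `FeatureDef` fields and sample features are invented for illustration, and the post does not say how the Discovery Feature Explorer stores or queries its metadata.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FeatureDef:
    name: str
    subject_area: str
    definition: str  # human-readable description (or the defining SQL)


def search(
    catalog: Iterable[FeatureDef],
    subject_area: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[FeatureDef]:
    """Return features matching every filter that was supplied."""
    results = []
    for feature in catalog:
        if subject_area is not None and feature.subject_area != subject_area:
            continue
        if keyword is not None:
            kw = keyword.lower()
            # Match the keyword against both the name and the definition.
            if kw not in feature.name.lower() and kw not in feature.definition.lower():
                continue
        results.append(feature)
    return results


# Hypothetical catalog entries.
CATALOG = [
    FeatureDef("claim_count", "claims", "Number of claims per member"),
    FeatureDef("total_paid", "claims", "Sum of paid amounts across claims"),
    FeatureDef("months_enrolled", "enrollment", "Months of continuous enrollment"),
]

claims_only = [f.name for f in search(CATALOG, subject_area="claims")]
paid = [f.name for f in search(CATALOG, keyword="paid")]
print(claims_only, paid)
```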

[Figure: Dataiku webapp feature]

Using the Feature Store

Using Dataiku’s plugin functionality, Premera has built a macro that generates SQL scripts for designated SQL recipes; subject matter experts can then edit these scripts as needed. This point-and-click option works particularly well for coders and reduces the volume and complexity of the join specifications and script edits they need to write.
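The core job of such a macro is turning "these features from these subject areas" into a join script. The function below is a hypothetical illustration of that idea, not Premera's macro or a Dataiku plugin API. It assumes a cohort table and subject-area tables that all share a `member_id` key, and it uses sqlite3 only to confirm that the generated SQL runs.

```python
import sqlite3
from typing import Dict, List


def generate_feature_sql(cohort_table: str, selections: Dict[str, List[str]], key: str = "member_id") -> str:
    """Generate a SELECT that left-joins selected feature columns onto a cohort.

    `selections` maps each subject-area table to the feature columns wanted from it.
    LEFT JOIN keeps every cohort member even if a subject area has no row for them.
    """
    select_cols = [f"c.{key}"]
    joins = []
    for i, (table, cols) in enumerate(selections.items()):
        alias = f"t{i}"
        select_cols.extend(f"{alias}.{col}" for col in cols)
        joins.append(f"LEFT JOIN {table} AS {alias} ON {alias}.{key} = c.{key}")
    return (
        "SELECT " + ",\n       ".join(select_cols)
        + f"\nFROM {cohort_table} AS c\n"
        + "\n".join(joins)
        + f"\nORDER BY c.{key}"
    )


sql = generate_feature_sql(
    "cohort",
    {"feat_claims": ["claim_count", "total_paid"], "feat_enrollment": ["months_enrolled"]},
)
print(sql)

# Run the generated script against toy tables to confirm it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort (member_id INTEGER);
INSERT INTO cohort VALUES (1), (2), (3);
CREATE TABLE feat_claims (member_id INTEGER, claim_count INTEGER, total_paid REAL);
INSERT INTO feat_claims VALUES (1, 2, 200.0), (2, 1, 45.5);
CREATE TABLE feat_enrollment (member_id INTEGER, months_enrolled INTEGER);
INSERT INTO feat_enrollment VALUES (1, 12), (3, 6);
""")
rows = conn.execute(sql).fetchall()
print(rows)
```

The output is ordinary SQL text, so a subject matter expert can still open it in a SQL recipe and adjust filters or add columns by hand.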

Overview of Premera’s Feature Store Experience

Since introducing the Dataiku platform and building its feature store, which is now used in virtually every ML project at the organization, Premera has seen a dramatic decrease in model deployment time along with a notable improvement in the reliability and quality of model performance. The feature store has given Premera a greater return on its data science investment and made the ML projects that sustain its competitive health plan offerings significantly more effective. Looking to the future, a native feature store will let all Dataiku customers realize similar productivity gains, keep feature engineering processes consistent, and avoid duplicated work.
