In the following video series, Kurt Muehmel (Business Engineer at Dataiku) explains how to use Data Science Studio to develop indicators when working with healthcare spending data.
Healthcare spending is an area of intense focus for organizations worldwide, and Data Science Studio (DSS) can help them make sense of the mountains of data it generates.
The series covers everything from connecting to the raw data, through preparing it in DSS's visual interface, to writing a few lines of Python. So go ahead, drop whatever series you're watching, and binge-watch the following 10 videos instead!
Before you start watching, we recommend that you download Data Science Studio's free Community Edition (Mac, Linux, Docker, AWS) and follow along. Ready? Let's get to it!
Part 1: Introduction
Healthcare spending data provides a good opportunity to explore what DSS offers for developing indicators from messy data drawn from a variety of sources. This video series walks through the steps needed to get from the raw data to useful indicators.
Part 2: Connecting to Data
Data Science Studio allows you to connect to your data, wherever it is stored. In this video, discover the basics of connecting to data in DSS with an example to illustrate the process.
Part 3: Exploring Data
Data Science Studio provides an intuitive interface for exploring your data, regardless of the underlying storage technology. We like to say that DSS is data agnostic. This video shows a few quick steps to help you better understand your data.
Part 4: Meaning and Type
Data Science Studio provides powerful capabilities to detect data types (text, integer, etc.) and to infer the meaning of the data (gender, date, etc.). This video shows how to understand and manage these designations.
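To get a feel for what type detection involves, here is a minimal sketch in plain Python. It is not how DSS does it internally; the function name `infer_type` and the three type labels are assumptions made up for this illustration.

```python
def infer_type(values):
    """Guess a storage type for a column of raw string values.

    Illustrative only: a hypothetical helper, not DSS's detection logic.
    """
    def parses(cast, value):
        # True if the cast accepts the string without raising.
        try:
            cast(value)
            return True
        except ValueError:
            return False

    # Ignore empty cells so that missing values do not force "text".
    non_empty = [v for v in values if v.strip()]
    if non_empty and all(parses(int, v) for v in non_empty):
        return "integer"
    if non_empty and all(parses(float, v) for v in non_empty):
        return "decimal"
    return "text"


ages = ["34", "51", "", "28"]
costs = ["120", "80.5", "45.5"]
regions = ["North", "South", "12"]

print(infer_type(ages), infer_type(costs), infer_type(regions))
```

A single non-numeric cell is enough to push a column to "text", which is one reason it pays to review the detected types before building indicators on top of them.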
Part 5: Joining Data Sources
Data Science Studio gives you the ability to rapidly join different sources of data, an essential step in working towards meaningful indicators. This video shows how you can accomplish such joins using DSS's intuitive visual interface.
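If you are curious what a join does under the hood, the sketch below shows a left join in plain Python. The datasets, the column names (`patient_id`, `region`, `amount`) and the `left_join` helper are all hypothetical; in DSS you would configure the equivalent in the visual interface.

```python
# Hypothetical sample data; the column names are assumptions for this sketch.
patients = [
    {"patient_id": 1, "region": "North"},
    {"patient_id": 2, "region": "South"},
    {"patient_id": 3, "region": "North"},
]
claims = [
    {"patient_id": 1, "amount": 120.0},
    {"patient_id": 1, "amount": 80.0},
    {"patient_id": 3, "amount": 45.5},
]


def left_join(left, right, key):
    """Keep every left row and attach the matching right rows, if any."""
    # Index the right-hand rows by key so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Columns that come only from the right side, used to fill gaps.
    right_columns = {col for row in right for col in row} - {key}

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})
        else:
            # No match: keep the left row and leave right columns empty.
            joined.append({**row, **{col: None for col in right_columns}})
    return joined


joined = left_join(patients, claims, "patient_id")
for row in joined:
    print(row)
```

Note that patient 2 has no claims but still appears in the result with an empty amount. Whether you want that behavior depends on the indicator you are building.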
Part 6: Custom Formulas
Data Science Studio includes more than 60 built-in data processing functions. Included in this list is the ability to construct custom formulas and then apply those formulas to your data, regardless of data quantity or storage technology. This video shows how to construct a custom formula to calculate total healthcare spending.
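As a rough idea of what such a formula computes, here is a plain Python sketch that adds up several cost columns per row. The column names and the `total_spending` helper are assumptions for illustration; the video uses its own columns and DSS's formula editor rather than Python.

```python
# Hypothetical cost columns; the real dataset in the video may differ.
COST_COLUMNS = ["hospital_cost", "drug_cost", "outpatient_cost"]


def total_spending(row, cost_columns):
    """Sum the cost columns of one row, treating missing values as zero."""
    return sum(row.get(col) or 0.0 for col in cost_columns)


rows = [
    {"patient_id": 1, "hospital_cost": 100.0, "drug_cost": None, "outpatient_cost": 25.5},
    {"patient_id": 2, "hospital_cost": 0.0, "drug_cost": 12.0, "outpatient_cost": 8.0},
]

for row in rows:
    row["total_spending"] = total_spending(row, COST_COLUMNS)
    print(row["patient_id"], row["total_spending"])
```

Treating missing values as zero is a design choice made for this sketch. Depending on your data, you may prefer to flag such rows instead of silently filling them in.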
Part 7: Grouping Aggregation
Data Science Studio includes a powerful "Group By" recipe that allows you to easily aggregate your data around certain variables without having to write a line of code (though you can do that too, if you prefer). This video shows how to use the "Group By" recipe to create a condensed view of the data.
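For those who do prefer code, the sketch below shows the idea behind a group-by aggregation in plain Python. The sample rows, the `region` and `amount` columns, and the `group_by_sum` helper are hypothetical and only meant to illustrate the concept.

```python
from collections import defaultdict

# Hypothetical sample data; the column names are assumptions.
claims = [
    {"region": "North", "amount": 120.0},
    {"region": "North", "amount": 80.0},
    {"region": "South", "amount": 45.5},
]


def group_by_sum(rows, key, value):
    """Count rows and sum one column for each distinct value of key."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        group = groups[row[key]]
        group["count"] += 1
        group["total"] += row[value]
    return dict(groups)


summary = group_by_sum(claims, "region", "amount")
for region, stats in summary.items():
    print(region, stats["count"], stats["total"])
```

The result has one entry per region instead of one row per claim, which is the condensed view the recipe produces.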
Part 8: Indicator Variable
Data Science Studio provides several ways to create indicator variables or "bins" for your data. This video shows how nested 'if' functions can be used to create indicator variables.
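The same nested-if logic can be sketched in Python with chained conditional expressions. The thresholds (100 and 1000), the band labels and the `spending_band` function are invented for this example and are not taken from the video.

```python
def spending_band(total):
    """Assign a spending total to a bin using nested conditionals.

    Thresholds are hypothetical and chosen only for illustration.
    """
    return "low" if total < 100 else ("medium" if total < 1000 else "high")


for total in [50, 100, 999.99, 1000]:
    print(total, spending_band(total))
```

Each inner condition is only evaluated when the outer one fails, so the order of the thresholds matters.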
Part 9: Splits and Filters
Data Science Studio allows you to easily split and filter your data on a variety of criteria, allowing you to run quality checks and divide your data into more meaningful segments. This video shows how to use these features to check for data quality.
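Here is a minimal Python sketch of a split used as a quality check: rows that pass a rule go one way, the rest go another. The `amount` column, the validity rule and both helper functions are assumptions made for this illustration.

```python
def split_rows(rows, predicate):
    """Split rows into (passed, failed) according to predicate."""
    passed, failed = [], []
    for row in rows:
        (passed if predicate(row) else failed).append(row)
    return passed, failed


def is_valid(row):
    """Hypothetical quality rule: amount must be present and non-negative."""
    amount = row.get("amount")
    return amount is not None and amount >= 0


claims = [
    {"claim_id": "a", "amount": 120.0},
    {"claim_id": "b", "amount": -5.0},
    {"claim_id": "c", "amount": None},
    {"claim_id": "d", "amount": 0.0},
]

valid, suspicious = split_rows(claims, is_valid)
print(len(valid), "valid rows,", len(suspicious), "rows to review")
```

Keeping the failed rows in their own list, rather than discarding them, makes it easy to inspect what went wrong at the source.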
Part 10: Code & Python
Data Science Studio's intuitive visual interface is one of its greatest features. It is complemented by the ability to integrate code (SQL, Python, R, Hive, Pig, etc.) directly into your workflow. This video shows how to use the notebook feature to draft your code and then integrate it into your project.
We hope you enjoyed these videos and that you are now ready to apply what you've learned to your own data and use cases. Until next time!