One of the most fundamental concepts to master when getting up to speed with machine learning basics is the difference between supervised and unsupervised learning. This blog post provides a brief rundown, visuals, and a few examples of unsupervised machine learning to take your ML knowledge to the next level.
What Is Unsupervised Machine Learning?
Supervised learning refers to using a set of input variables to predict the value of a labeled output variable. It requires labeled data (think of the labels as an answer key that the model can use to evaluate its performance). Conversely, unsupervised learning refers to inferring underlying patterns from an unlabeled dataset, with no labeled outcomes to check the results against.
There are several methods of unsupervised learning, and clustering is among the most commonly used. Clustering refers to the process of automatically grouping together data points with similar characteristics and assigning them to “clusters.”
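To make the idea of grouping similar points concrete, here is a minimal sketch of k-means, one common clustering algorithm (chosen here for illustration; this post does not prescribe a specific one). The sample points, the `kmeans` function name, and the choice to use the first `k` points as starting centroids are all assumptions made for the example. Note that no labels are supplied: the groups come only from how close the points are to each other.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    # Assumption for this sketch: start from the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical, unlabeled data: two loose groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers themselves carry no meaning; what matters is which points end up sharing a number. In practice, a library implementation with better initialization would usually be preferable to a hand-rolled loop like this one.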
To see a practical example of clustering in action, check out Clustering: How it Works (In Plain English!).
Unsupervised Machine Learning Use Cases
Some use cases for unsupervised learning (more specifically, clustering) include:
- Customer segmentation, or understanding different customer groups around which to build marketing or other business strategies.
- Genetics, for example clustering DNA patterns to analyze evolutionary biology.
- Recommender systems, which involve grouping together users with similar viewing patterns in order to recommend similar content.
- Anomaly detection, including fraud detection or detecting defective mechanical parts (e.g., for predictive maintenance).
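As a rough illustration of the anomaly detection use case, the sketch below flags unusual values in a list of unlabeled readings. It uses a simple median absolute deviation (MAD) rule rather than clustering, and the `flag_anomalies` name, the sample readings, and the 3.5 threshold are assumptions chosen for the example, not values taken from any particular tool.

```python
import statistics

def flag_anomalies(readings, threshold=3.5):
    """Return readings that sit unusually far from the median."""
    median = statistics.median(readings)
    # Median absolute deviation: the typical distance from the median.
    mad = statistics.median([abs(r - median) for r in readings])
    if mad == 0:
        # All readings are (nearly) identical, so nothing stands out.
        return []
    # Flag readings whose distance from the median, measured in MADs,
    # exceeds the (assumed) threshold.
    return [r for r in readings if abs(r - median) / mad > threshold]

# Hypothetical sensor readings from a mechanical part; no labels are given.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 15.0]
print(flag_anomalies(readings))  # [15.0]
```

As with clustering, nothing tells the code in advance which reading is faulty; the outlier is inferred from the structure of the data alone.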
Unsupervised Learning and Clustering in Dataiku
Dataiku makes it easy to leverage machine learning technologies and get instant visual and statistical feedback on model performance. Learn more about clustering (unsupervised learning) in Dataiku.