If there’s one thing that unites all of us during the festive holiday season, it’s music. Holiday tunes resonate with everyone, and whether you’re a fan of the old classics or the new hits, who can resist humming (or dancing) along?
So this year, we knew we wanted to take a look at the data behind the December ditties (ahem, or November. Or October.) we all know and love. What did we learn? Well, not much about Christmas, actually - as expected, the positivity of the season is reflected through jolly beats and the use of words like love, light, Santa, sing, make, and say.
But we did learn a lot about data science. For example:
- Starting such a broad exploration piece taught me to define goals better. In data science, and in life for that matter, clear goals always help.
- We experimented and played around with data, which is cool, but reusability is even better! Now that we have done this analysis once, we can reuse it for Easter, Halloween, or anything else by just changing the playlist IDs.
- Retrieving random stuff to use for analysis from APIs is way, way easier when leveraging Dataiku plugins. You can find the code for the Spotify plugin used for this project here.
Lots more insights ahead, though - here are the details of the holiday music analysis.
What better place to start for music analysis than Spotify? With no clear goal in mind but the vaguest of ideas (yes, I know this goes against most data science principles, but sometimes it pays off just to mess around a little), we collated songs from the four most popular Christmas Spotify playlists and manually added the top five most streamed songs as of December 14th.
But overall, I’m not terribly displeased with that, as I’m sure most of us have had to make sense of more complex issues with worse data. So now that we have our dataset, I decided I was interested in analyzing the following:
- Word frequencies
- Lyric qualities obtained through a common Python library (VADER Sentiment)
- Spotify’s audio features as provided by the API
Top used words in our dataset included (unsurprisingly): Christmas, Santa, merry, etc.
However, there are some interesting outliers: la, want, run, and da are all frequent at the dataset level, but they appear in only a small number of distinct songs. “Fa la la la la” rings a bell as an explanation for some of them (highly repetitive verses in very specific songs); however, “want” and “run” seem a bit off. It might show that some artists use words in holiday songs that reflect rushing or desire, the opposite of the calmness and sharing of the season.
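The contrast between dataset-level frequency and the number of distinct songs a word appears in can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the analysis: the function names, the thresholds, and the toy lyrics are all made up for the example, and the tokenizer is a simple regular expression with no stop-word removal.

```python
import re
from collections import Counter


def word_stats(lyrics_by_song):
    """Return (total_counts, song_counts) for a {title: lyrics} mapping."""
    total_counts = Counter()  # occurrences across the whole dataset
    song_counts = Counter()   # number of distinct songs containing the word
    for lyrics in lyrics_by_song.values():
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        total_counts.update(tokens)
        song_counts.update(set(tokens))  # each song counts a word once
    return total_counts, song_counts


def concentrated_words(total_counts, song_counts, min_total=5, max_songs=1):
    """Words that are frequent overall but appear in few distinct songs."""
    return sorted(
        word
        for word, count in total_counts.items()
        if count >= min_total and song_counts[word] <= max_songs
    )


# Toy lyrics, invented for illustration only
songs = {
    "song_a": "fa la la la la la la la la",
    "song_b": "merry christmas merry christmas to you",
    "song_c": "santa brings a merry christmas",
}
totals, per_song = word_stats(songs)
print(concentrated_words(totals, per_song))
```

Words like “la” surface here because one song repeats them many times, while “merry” is spread across several songs and is left out.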
We can also spot outlier songs that use an extremely diverse vocabulary. Adding sentiment scores per song to our word token analysis, we can reinforce the point we were trying to make earlier: Christmas positivity is reflected through words like love, light, Santa, sing, make, and say.
Focusing more on our song features as well as the sentiment score offered by VADER, I looked at the top and bottom 10 songs (based on VADER compound score).
The majority have a medium-to-low “liveness” score, suggesting studio rather than live recordings. The tone is rather upbeat in most (based on the size in our scatterplot), with lower valence in our bottom 10. Energy and danceability also appear to be lower in the bottom 10 compared to the top 10, which tend to score higher in at least one of the two.
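For the curious, picking the top and bottom songs by compound score could look roughly like the sketch below. It is a hedged illustration rather than the actual project code: `rank_by_compound` and `toy_scorer` are hypothetical names, and `toy_scorer` is a crude stand-in that only mimics the shape of a sentiment result (a dictionary with a `"compound"` key) so the example runs without extra dependencies.

```python
def rank_by_compound(lyrics_by_song, scorer, n=10):
    """Return (top_n, bottom_n) lists of (title, compound) pairs.

    `scorer` is any callable that maps text to a dict with a "compound" key.
    """
    scored = {
        title: scorer(lyrics)["compound"]
        for title, lyrics in lyrics_by_song.items()
    }
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]             # highest compound first
    bottom = ranked[-n:][::-1]   # lowest compound first
    return top, bottom


def toy_scorer(text):
    """Stand-in scorer (an assumption for this example, NOT VADER)."""
    words = text.lower().split()
    pos = sum(w in {"love", "merry", "joy"} for w in words)
    neg = sum(w in {"lonely", "cold", "blue"} for w in words)
    total = pos + neg
    return {"compound": 0.0 if total == 0 else (pos - neg) / total}


# Toy lyrics, invented for illustration only
songs = {
    "song_a": "merry merry joy",
    "song_b": "blue and lonely christmas",
    "song_c": "love in the cold",
}
top, bottom = rank_by_compound(songs, toy_scorer, n=1)
print(top, bottom)
```

With the vaderSentiment package installed, `SentimentIntensityAnalyzer().polarity_scores` returns a dictionary that includes a `"compound"` value, so it should slot in as the `scorer` argument. Note that with fewer than 2 × n songs, the top and bottom lists will overlap.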
I also conducted anomaly detection via an Isolation Forest to see if I could spot any outlying songs (and if so, how these outliers match up to the word frequencies we previously discovered).
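As a rough sketch of what this step can look like with scikit-learn, the snippet below fits an `IsolationForest` on a small feature matrix and returns the rows with the lowest scores. Everything here is an assumption made for illustration: the `flag_anomalies` helper, the number of trees, and the synthetic data (random values in the 0 to 1 range, standing in for audio features such as danceability, energy, valence, and liveness). It is not the configuration used in the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def flag_anomalies(features, n_anomalies=5, seed=0):
    """Return indices of the rows the forest scores as most anomalous."""
    X = np.asarray(features, dtype=float)
    forest = IsolationForest(n_estimators=200, random_state=seed)
    forest.fit(X)
    scores = forest.score_samples(X)  # lower score = more anomalous
    return np.argsort(scores)[:n_anomalies]


# Synthetic stand-in for per-song audio features (NOT real Spotify data)
rng = np.random.default_rng(0)
typical = rng.uniform(0.4, 0.6, size=(60, 4))  # 60 similar "songs"
odd_one = np.array([[0.0, 1.0, 0.0, 1.0]])     # one planted outlier
X = np.vstack([typical, odd_one])

flagged = flag_anomalies(X, n_anomalies=5)
print(flagged)
```

Ranking by `score_samples` always returns exactly the requested number of songs, which is convenient on a small dataset, but it also means some songs get flagged even when nothing is truly unusual.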
Unfortunately, the results weren’t great: the dataset is too small for the method to work properly, and we would have to add a lot more Christmas songs to get something that works well.
I took a look anyway at the five songs that were tagged as anomalies to see whether any new patterns show up in their lyrics. The only noticeable fact is that there are a lot of onomatopoeias exclusive to these songs, which is not necessarily a bad thing; it just makes them that extra bit special.
So that’s it! I did my best to understand the little we had to work with on this occasion, understood why some of our bits failed, and figured out what I can improve in the new year.
If you’re always looking to improve your data science skills in the new year, check out tips from expert data scientists in How to Be a(n Even Better!) Data Scientist in 2018. Or if you’re just getting started, you might be interested in this Machine Learning Basics guide for beginners.