The music industry is, and has always been, a playground for change. Once upon a time, each listening format (records, cassettes, CDs) stayed with us for more than a decade before a new technology took over.
Paradigms shift much faster now that everything’s gone digital. Overnight, we’ve gone from buying bundled collections of songs (i.e., records) and listening to a radio disc jockey’s playlist, to buying and streaming whatever tracks we want whenever and wherever we want.
I’ve analyzed 60 years of Billboard music chart data to see how popular music has evolved as technology has continually changed the listening experience. And what I found might surprise you. No spoilers… you’ll have to wait (or, if you’re very curious, you can scroll down, skip the data process, and go straight to the conclusions).
But First, an Anecdote
It’s a Tuesday night in August on New York City’s east side. I’m settled in at my go-to cafe with the evening’s dark roast. In between the ebb and flow of after-hours coffee chatter, the most intriguing run of old-timey, tin-can reggae sounds ever curated is crackling down from the speakers. In neat, algorithmic succession, these once-impossible-to-discover Jamaican gems just keep coming. I note to self for the Nth time tonight, “Ask barista for playlist!”
To a 90s kid, this is pure digital magic. Back in the day, maybe we didn’t have to walk uphill to and from school in the snow like our parents supposedly did, but my friends and I really did spend countless hours of youth sifting through dusty vinyl record collections 30 years our senior trying to discover tunes a fraction as unique as the ones being served up overhead now.
By now, it’s old news that digitalization smashed the music album to pieces and empowered us to discover, buy, share, and listen to whatever songs we want, whenever we want, wherever we want, with special thanks to the folks at the iTunes and Spotifys of the world. But what patterns are we seeing over time in the music we love, even as new technologies reinvent our listening experience?
As befits an inquiry of this nature, I went straight into the studio (Dataiku Data Science Studio, that is) and plugged into the Billboard charts. Over the past 60+ years, Billboard has become the de facto authority on U.S. music popularity. Every week it publishes the Hot 100 chart, a genre-agnostic ranking of the most popular songs in the United States across all mainstream genres, including country, rock, hip-hop, and R&B.
With a few dozen lines of Python code, I built a dataset consisting of every Hot 100 chart published by Billboard between 1958 and 2016. The dataset drew from over 3,000 charts and contained 304,000 data points with information on the chart date, the artist name, the song title, and the song’s ranking for a given week.
Armed with this amazing collection of data that spans all of the key social, political, economic, and technological developments in post-war America, I got right to work, eager to see how the popular music scene has evolved through it all. I had some guesses as to what I might find in this study but was ready, hopeful even, for surprises.
Specifically, I focused on:
- Trends in popular music content over time.
- Trends in the popularity of artists and songs over time.
For a closer look at it all, I set out to drill down into the timing of artist and song appearances on the Billboard Hot 100 charts. I also planned a content analysis of the titles and lyrics of songs that have made the charts since the 1950s, using a natural language processing tool developed by my data scientist friend and colleague Alex Wolf.
From here, I laid down tracks for all the data cleansing, enrichment, and feature engineering steps in this project:
- To prepare the data for analysis, I loaded the Billboard dataset into PostgreSQL so that data operations would run in-database instead of in-memory.
Tip: Dataiku’s Visual Sync tool offers a simple way to do this conversion, which can be a nice strategy for managing computational resources when large files and/or complex procedures are part of a project flow. Download Dataiku for free to give it a try, or check out the documentation.
- I enriched the Hot 100 data by adding each song’s ranking for the previous week, its top ranking ever, and its total number of weeks on the charts. And finally, before changing gears, I addressed the fact that sometimes artists and songs show up in the data as different text strings (e.g., “Jay-Z” and “Jay Z”).
Tip: With a few simple clicks, I was able to do this visually using Dataiku’s analytics clustering processor.
- I also engineered a few new data features to test whether songs tend to follow certain trajectories on the charts. For example, I created custom aggregations of a song’s chart positions over windows of time, such as the average chart position over the few weeks leading up to a chart date.
Tip: For this, I used Dataiku’s Visual Windows tool, which makes working with dates ridiculously easy.
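For readers who would rather see these steps as code than as visual recipes, here is a minimal sketch in plain Python. It is not the Dataiku flow described above: the row layout, the `normalize_name` and `enrich` helpers, and the sample values are all assumptions made for illustration, and the simple string normalization is only a stand-in for a real clustering processor.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample rows: (chart_date, artist, song, rank). Song names and ranks are made up.
rows = [
    (date(2016, 1, 2), "Jay-Z", "Song X", 5),
    (date(2016, 1, 9), "Jay Z", "Song X", 3),
    (date(2016, 1, 16), "Jay-Z", "Song X", 4),
    (date(2016, 1, 9), "Artist B", "Song Y", 10),
]

def normalize_name(name):
    """Lowercase and collapse punctuation so 'Jay-Z' and 'Jay Z' share one key."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def enrich(rows, window=2):
    """Add previous-week rank, peak rank, weeks on chart, and a trailing average rank."""
    by_song = defaultdict(list)
    for chart_date, artist, song, rank in rows:
        key = (normalize_name(artist), normalize_name(song))
        by_song[key].append((chart_date, rank))

    enriched = []
    for (artist, song), entries in by_song.items():
        entries.sort()  # chronological order within each song
        peak_rank = min(rank for _, rank in entries)
        for i, (chart_date, rank) in enumerate(entries):
            # Previous-week rank only counts if the song charted exactly 7 days earlier.
            prev_rank = None
            if i > 0 and entries[i - 1][0] == chart_date - timedelta(days=7):
                prev_rank = entries[i - 1][1]
            # Average rank over up to `window` earlier appearances (current week excluded).
            recent = [r for _, r in entries[max(0, i - window):i]]
            enriched.append({
                "chart_date": chart_date,
                "artist": artist,
                "song": song,
                "rank": rank,
                "prev_rank": prev_rank,
                "peak_rank": peak_rank,
                "weeks_on_chart": len(entries),
                "trailing_avg_rank": sum(recent) / len(recent) if recent else None,
            })
    return enriched

enriched = enrich(rows)
```

With the full dataset sitting in PostgreSQL, the same features would more naturally be expressed as SQL window functions; the loop above just makes the logic explicit.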
Once the data was prepared, I began the analytical work; I built, cut, dissected and dug through all the Billboard data and pieced together some interesting artefacts from music history:
Conclusion 1: The King Still Reigns
Before kung fu and rhinestone-studded jumpsuits slipped into his repertoire, Elvis conquered the world of music playing a new species of American sounds called rock ’n’ roll. Sixty years after Elvis became Elvis and 40 years after his untimely passing, he’s still The King. The numbers don’t lie; Elvis had enough songs on the Billboard charts to fill up an entire chart by himself, and he still leads the pack for having the most songs on the charts ever (105+) as well as the largest number of chart appearances (975+).
But if we measure music royalty in terms of No. 1 hits on the Billboard Hot 100 charts and not by total appearances on the charts, then the crown moves from Memphis across the Atlantic to four lads who idolized Elvis in their early days before leading a British invasion and becoming the first truly global superstars. As you might’ve guessed, I’m referring to The Beatles.
Between 1964 and 1970, John, Paul, George, and Ringo wrote more No. 1 hits than any other band in history. Considering that fewer than one in 10 artists that actually made the Billboard charts ever had a No. 1 hit, the fact that the Beatles averaged more than three No. 1 hits per year, with nearly 30 percent of their charting songs reaching No. 1, is perhaps among the most impressive records in all of entertainment history.
Interestingly, although the Beatles and Elvis dominated the charts in terms of No. 1 hits and total number of appearances on the charts respectively, their songs tended to drop off the charts more quickly than the songs of other top artists who’ve also accumulated an astronomical number of chart appearances.
For example, while The Beatles’ and Elvis’ songs remained on the charts for nine weeks on average, songs from Madonna and Rihanna (numbers three and six respectively for most Hot 100 chart appearances of all time) have remained on the charts for 15 to 20 weeks at a time on average.
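The artist-level figures in this section (songs charted, total chart appearances, share of songs reaching No. 1, average weeks per song) all come down to grouping chart rows by artist and song. Here is one rough way that could look in plain Python. The `artist_summary` helper, the row layout, and the sample values are hypothetical, not the code or data behind the numbers above.

```python
from collections import defaultdict

# Hypothetical sample rows: (week_index, artist, song, rank). All values are made up.
rows = [
    (1, "Artist A", "Song 1", 1),
    (2, "Artist A", "Song 1", 2),
    (1, "Artist A", "Song 2", 40),
    (1, "Artist B", "Song 3", 7),
    (2, "Artist B", "Song 3", 5),
    (3, "Artist B", "Song 3", 9),
]

def artist_summary(rows):
    """Summarize each artist's chart record from weekly chart rows."""
    ranks = defaultdict(lambda: defaultdict(list))  # artist -> song -> weekly ranks
    for _week, artist, song, rank in rows:
        ranks[artist][song].append(rank)

    summary = {}
    for artist, by_song in ranks.items():
        n_songs = len(by_song)
        appearances = sum(len(r) for r in by_song.values())
        # A song counts as a No. 1 hit if it held rank 1 in at least one week.
        no1_songs = sum(1 for r in by_song.values() if min(r) == 1)
        summary[artist] = {
            "songs": n_songs,
            "appearances": appearances,
            "no1_songs": no1_songs,
            "no1_share": no1_songs / n_songs,
            "avg_weeks_per_song": appearances / n_songs,
        }
    return summary

summary = artist_summary(rows)
```

Sorting `summary` by `appearances` or `no1_songs` would then give the two different leaderboards discussed above.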
From this insight, I followed the analysis a bit further to see if my observations about song tenure were part of any larger, discernible trends in the data. That led me to my next finding.
Conclusion 2: One-Chart Wonders
Yes, there are one-hit wonders, but one-hit wonders are generally not one-chart wonders. In fact, the vast majority of songs that make the Billboard Hot 100 tend not to be one-chart wonders. Typically, they stay on the charts for weeks at a time, as we saw with Elvis, The Beatles, Madonna, and Rihanna.
With a bit of data crunching, I set up an analysis to examine whether song tenure on the charts has changed from one decade to the next. What I found is that ever since the 1960s, songs generally seem to be enjoying longer and longer runs on the charts. Here is the average tenure for songs on the Hot 100 charts throughout the decades:
- 1960s: 8 weeks
- 1980s: 12 weeks
- 2000s (through 2010): 15 weeks
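A comparison like this one can be sketched by counting the weeks each song spent on the chart and averaging within decades. The sketch below assumes each song is assigned to the decade of its first chart appearance, which is one reasonable convention but not necessarily the one used for the figures above. The `average_tenure_by_decade` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (chart_date, artist, song, rank). All values are made up.
rows = [
    (date(1965, 3, 6), "Artist A", "Song 1", 50),
    (date(1965, 3, 13), "Artist A", "Song 1", 42),
    (date(1969, 12, 20), "Artist B", "Song 2", 80),
    (date(1969, 12, 27), "Artist B", "Song 2", 61),
    (date(1970, 1, 3), "Artist B", "Song 2", 55),
    (date(1970, 1, 10), "Artist B", "Song 2", 70),
    (date(1985, 6, 1), "Artist C", "Song 3", 99),
]

def average_tenure_by_decade(rows):
    """Average number of chart weeks per song, grouped by the decade of the song's debut."""
    weeks = defaultdict(set)  # (artist, song) -> set of chart dates
    for chart_date, artist, song, _rank in rows:
        weeks[(artist, song)].add(chart_date)

    tenures = defaultdict(list)  # decade -> list of per-song tenures
    for dates in weeks.values():
        decade = min(dates).year // 10 * 10  # decade of first appearance (an assumption)
        tenures[decade].append(len(dates))

    return {decade: sum(t) / len(t) for decade, t in sorted(tenures.items())}

tenure = average_tenure_by_decade(rows)
```

Note that a song straddling two decades, like the hypothetical "Song 2" here, is counted once in its debut decade with its full run.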
With all the new options for listening to and discovering music, and the proliferation of recorded music, I expected to find that listeners nowadays seek more variety more often, and that song tenure on the charts would therefore be going down over time. But the analytic results simply do not tell this version of the story.
If songs truly are enjoying longer runs on the charts, as the data indicate, I wondered whether artists these days are somehow writing content that simply resonates with us for longer. I became curious to see what changes the data show in our written content over time, and here’s what I found...
Conclusion 3: All You Need is Love
I ran all of the song titles appearing on the Billboard Hot 100 charts through a natural language processing tool built by my friend and colleague Alex Wolf (see his blog for more details) and analyzed the most frequently occurring key words.
In spite of all the changes we’ve seen in society, technology, and the music industry through the decades, no matter how much we think we’ve evolved beyond our parents’ and grandparents’ times, there’s been incredibly little variation in the lyrical themes that apparently have resonated with music fans.
But there’s reason for hope. By a long shot, the most common theme on the Billboard charts across the decades was… drum roll, please… L-O-V-E. That’s right: for every single decade from the 1950s to the present day, love has been the winning theme for artists and fans alike. In good company with “Love” were “Girl,” “Baby,” and “Time,” which have also shown up over and over, decade after decade, as top 10 key words in our favorite songs.
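The NLP tool used for this analysis isn't shown here, so the following is only a simple stand-in: a word-frequency count over song titles, grouped by decade, using Python's standard library. The `top_title_words` helper, the tiny stop-word list, and the sample titles are all hypothetical, and each song is assumed to appear once rather than once per chart week.

```python
import re
from collections import Counter, defaultdict

# A deliberately small, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "you", "me", "my", "i", "is", "it"}

# Hypothetical (year, title) pairs, one per song. Titles are made up for illustration.
titles = [
    (1962, "Love in the Morning"),
    (1964, "My Baby Love"),
    (1967, "Time for Love"),
    (1983, "Girl on the Radio"),
    (1985, "Radio Girl"),
]

def top_title_words(titles, n=3):
    """Return the n most frequent non-stop-words in song titles for each decade."""
    counts = defaultdict(Counter)
    for year, title in titles:
        decade = year // 10 * 10
        words = re.findall(r"[a-z']+", title.lower())
        counts[decade].update(w for w in words if w not in STOP_WORDS)
    return {decade: [w for w, _ in c.most_common(n)] for decade, c in sorted(counts.items())}

top_words = top_title_words(titles)
```

Plain frequency counts like this miss variants such as "lovin'" or "loved"; stemming or lemmatization would be the natural next step.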
With a bit more time, I’d have a field day digging into this data and pulling out lots of interesting pieces of trivia. Like how the Billboard charts in June of 1963 were dominated by a Japanese singer named Kyu Sakamoto, whose song Sukiyaki reached #1 and stayed there for weeks in spite of the fact that the song was entirely in Japanese. Or how Paul Mauriat’s Love is Blue brought the French to the top of the charts for the first time in 1968.
Beyond unearthing interesting facts from the data, good work could be done on this project to extend the analysis into the realm of machine learning, particularly by doing clustering analytics and topic modeling on the song lyrics. So please keep your eye on our blog for future installments in this journey!
Also, if you're a data explorer like me and just stepping into the world of big data and predictive analytics, check out this guidebook, which you might find useful along the way. Or get the illustrated guide to machine learning basics to prepare for the next installment.