Summer is Coming - Game of Thrones Analytics

Data Visualization | Data Science | Technology | Machine Learning | Maxime

Spoiler alert: This post contains the full list of dead Game of Thrones characters... including those that will most likely die in season 6.

What would you say if I told you that I can use machine learning to predict character deaths in the Game of Thrones season to come (season 6)? You'd probably say: NOOOOO, don't spoil this for us!

You know nothing
Predict the Future Death of GOT Characters?

And you're right! There is absolutely no way I'm going to spoil season 6 of Game of Thrones - for my sake and for yours. So instead of trying to predict who is going to die in season 6, I decided to look at the very last words of the characters who have died across all the seasons and see what would come out of it. Were they angry, disgusted, happy (that would be weird...), confused...?

What Data are we Talking About?

I used the website Genius to retrieve the scripts of the Game of Thrones episodes for the first and the last season. I also used Wikiquote's Game of Thrones page to retrieve the exhaustive list of characters who have died since the series started. Then I pulled all of this data into one dataset:

Dataset Containing the Scripts of the Episodes

Dataiku DSS is pretty awesome for working with text data, including movie scripts. I used several processors in the preparation script to clean the data. The most important ones were:

  • Regular expressions to remove all the text between brackets or parentheses (which was actually information about the scene);
  • 'Simplify text' to normalize the text, stem words, and remove stop words.
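
For readers who want to try the same cleaning outside Dataiku DSS, here is a minimal Python sketch of the two steps above. It is not what the DSS processors do internally: the stop-word list is a tiny hypothetical one, stemming is left out, and the sample line is made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one is much longer.
STOP_WORDS = {"a", "an", "the", "i", "you", "me", "my", "is", "to", "of", "and"}

# Matches text enclosed in [...] or (...), i.e. notes about the scene.
SCENE_NOTE = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_scene_notes(line: str) -> str:
    """Remove bracketed or parenthesized spans and collapse leftover spaces."""
    without_notes = SCENE_NOTE.sub(" ", line)
    return " ".join(without_notes.split())


def simplify(line: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    tokens = re.findall(r"[a-z']+", line.lower())
    return [t for t in tokens if t not in STOP_WORDS]


# Made-up script line, not taken from the real dataset.
raw = "[He draws his sword] Winter is coming. (thunder)"
clean = strip_scene_notes(raw)
print(clean)
print(simplify(clean))
```

The regular expression only handles non-nested brackets, which is usually enough for script annotations.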

Prepare Recipe for Text Mining

The last step was to extract the last sentence spoken by each character before their death.
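
As a rough idea of what this extraction looks like, here is a small Python sketch. It assumes the script is available as (speaker, line) pairs in broadcast order and that the list of dead characters is a set of names; the function name, the data layout, and the toy data are all hypothetical, not the actual DSS recipe.

```python
import re


def last_words(script_lines, dead_characters):
    """Return {character: last sentence spoken} for each dead character.

    script_lines: iterable of (speaker, line) pairs in broadcast order.
    dead_characters: set of speaker names to keep.
    """
    last_line = {}
    for speaker, line in script_lines:
        if speaker in dead_characters:
            last_line[speaker] = line  # later lines overwrite earlier ones

    result = {}
    for speaker, line in last_line.items():
        # Split on whitespace that follows ., ! or ? and keep the final sentence.
        sentences = re.split(r"(?<=[.!?])\s+", line.strip())
        result[speaker] = sentences[-1]
    return result


# Made-up toy data, not taken from the real scripts.
script = [
    ("ALICE", "Hello."),
    ("BOB", "Who goes there?"),
    ("ALICE", "I am tired. It was me all along."),
    ("CAROL", "Run!"),
]
print(last_words(script, {"ALICE", "CAROL"}))
```

This relies on the lines being in chronological order; a character who keeps speaking in flashbacks after their death would need extra handling.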

Now if you know Game of Thrones inside and out (like I do), you should be able to answer this simple question: to which character do these last words belong?


You will not hear me scream!


Can't find the answer? Take a look at the web app at the end of this post and find out.

The big thing about Game of Thrones is that deaths are always unexpected (think about the Red Wedding... nobody expected that to happen). But I was hoping that, given the characters' last words, we could estimate what state of mind they were in at the time of their deaths:

  • angry
  • sad
  • surprised
  • disgusted
  • scared

In order to do so, I decided to perform what is called "sentiment analysis."

Sentiment What?

Sentiment analysis is a technique used to extract the general feeling expressed in a text. Historically, it was used so that machines could automatically determine whether a sentence was positive, negative, or neutral. If you look up "sentiment analysis" on the web, you'll find lots of examples.

To do so with my dataset, I used the list of words available on this git repo and built a model using the sentences I pulled from the scripts. The idea was to train a model on the sentences I downloaded (of course removing those corresponding to the last words of dead characters) and then apply the model to the last words dataset!

Training a model using text features is super easy (and quick) when using Dataiku DSS. You can also choose how the text features are handled from the following options:

  • TF/IDF vectorization
  • Tokenize and hash
  • Tokenize, hash, and apply SVD
  • Counts vectorization
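
To give a feel for what "tokenize and hash" means, here is a minimal sketch of the general hashing trick in Python. The function name, the bucket count, and the choice of CRC32 as hash function are assumptions made for illustration; the post does not say how Dataiku DSS implements this option.

```python
import zlib


def hash_vectorize(tokens, n_buckets=8):
    """Map a list of tokens to a fixed-length count vector via hashing.

    Each token is hashed to one of n_buckets positions, so no vocabulary
    has to be stored. Different tokens may collide in the same bucket.
    """
    vector = [0] * n_buckets
    for token in tokens:
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[index] += 1
    return vector


# Made-up tokens for illustration.
print(hash_vectorize(["winter", "coming", "winter"]))
```

The vector length stays fixed however many distinct words appear, at the cost of occasional collisions. Counts vectorization, by contrast, keeps one column per known word.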

Text Feature Handling in Dataiku DSS

I decided to go with a classic TF/IDF vectorization. This way Dataiku DSS automatically creates sparse features, one per word or n-gram, weighted by how frequent the term is in a sentence and how rare it is across the whole dataset.
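
For the curious, here is a minimal pure-Python sketch of the textbook TF/IDF weighting on single words. The exact formula and n-gram settings used by Dataiku DSS are not stated in this post, so treat the function below (its name and the unsmoothed formula) as an assumption, and the token lists as made-up data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute textbook TF/IDF weights.

    docs: list of token lists. Returns one sparse {term: weight} dict per
    document, where weight = (count / doc length) * log(n_docs / doc freq).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tokenized sentences.
docs = [["winter", "coming"], ["winter", "here"], ["summer", "coming"]]
for vector in tfidf(docs):
    print(vector)
```

Each dict only stores the terms that actually occur in the sentence, which is what makes the representation sparse. A term that appears in every sentence gets a weight of zero, since it carries no information for telling sentences apart.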

Once my random forest was trained (based on the features generated by the TF/IDF vectorization), I applied it to my last sentences dataset to determine the dominant sentiment of these last words! If you want to see how the algorithm classified the dead characters, just have a look at the following web app.

Of course, the classification is not meaningful in all cases: when a character's last words are "Olly...", it is difficult to predict the right label (if one exists).

Stop Talking: We Want to Play with the Web App!

Let's Visualize all of This!

For your (and of course my) pleasure, I decided to create a web app within Dataiku DSS to visualize my results and refresh our memories! I adapted one of Mike Bostock's scripts so that we can see how characters fell to their deaths, what their last words were, and how the algorithm classified them (each node corresponds to a sentiment).

Instructions: when you click on a character, check the top of the web app to see the character's name and last words, and click on "death scene" to watch it on YouTube! When you click on a node, you'll find out which sentiment it represents. Enjoy :)



If you have any questions about this post, don't hesitate to reach out to me.

Don't forget to download and try out Dataiku DSS's free edition... and build a web app like mine ;-)!
