The Super Bowl is upon us again, and that means excitement about, in order of importance: advertisements, beer, Justin Timberlake, snacks as a meal, and football. Since ads have already been leaked, that leaves beer in the top spot, and you better believe we’ve got some relevant data science + adult beverage action for you.
New England Patriots vs. Philadelphia Eagles? Pffffft. There’s already plenty of data and models working hard on predicting that outcome, so we won’t go there. Instead, there’s a more pressing question that, unless you’re an east-coaster, is far more likely to impact your overall happiness and satisfaction this Sunday.
And that is: what awesome brew should I buy to get us through the game?
Well, you’re in luck, my friends. Pierre Gutierrez has built a beer recommendation engine, and it’s the perfect project for your Super Bowl Sunday.
How Does It Work?
Well, first of all, you can get the nitty-gritty tech deets here if you’re interested in a deep dive. But in a nutshell:
- We scraped a beer rating website and ended up with several datasets with a total of 4,563,152 ratings; 91,645 distinct users; and 78,518 beers.
- The most important dataset contains the reviews (scores given by the users) and is composed of:
  - “beer_id”: a unique id for a beer
  - “user_id”: a unique id for a user
  - “score”: the review score, between 1 and 5
  - “date”: the timestamp of when the user posted the review
  - “review”: a review of the beer (this is optional and left empty 75% of the time)
- We also scraped information about each beer (including the brewery it is associated with) as well as user data. This lets us extend the analysis with some “beer geek knowledge.”
- From there, Pierre builds an explicit recommendation engine, which basically boils down to a regression problem: predicting the rating each user would give each beer. The engine then recommends to each user a beer that they are likely to rate highly upon drinking.
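To make the "regression problem" idea concrete, here is a minimal sketch of an explicit recommender. It is not Pierre's actual model. It assumes a simple baseline where each score is the overall mean plus a per-user bias and a per-beer bias, fit by stochastic gradient descent. The toy reviews, the function names (`fit_baseline`, `predict`, `recommend`), and the hyperparameters are all made up for illustration; only the (user_id, beer_id, score) layout comes from the dataset described above.

```python
from collections import defaultdict

# Toy reviews in the (user_id, beer_id, score) layout described above.
# All values are invented for illustration; they are not from the scraped data.
REVIEWS = [
    ("u1", "ipa", 5.0), ("u1", "stout", 2.0),
    ("u2", "ipa", 4.5), ("u2", "lager", 3.0),
    ("u3", "stout", 4.0), ("u3", "lager", 2.5), ("u3", "ipa", 4.0),
]


def fit_baseline(reviews, epochs=200, lr=0.05, reg=0.02):
    """Fit score ~ mean + user_bias + beer_bias by SGD on squared error."""
    mean = sum(score for _, _, score in reviews) / len(reviews)
    user_bias = defaultdict(float)
    beer_bias = defaultdict(float)
    for _ in range(epochs):
        for user, beer, score in reviews:
            # Prediction error for this single review.
            err = score - (mean + user_bias[user] + beer_bias[beer])
            # Gradient step with a small L2 penalty on each bias.
            user_bias[user] += lr * (err - reg * user_bias[user])
            beer_bias[beer] += lr * (err - reg * beer_bias[beer])
    return mean, dict(user_bias), dict(beer_bias)


def predict(model, user, beer):
    """Predicted score, clipped to the 1-5 rating scale."""
    mean, user_bias, beer_bias = model
    raw = mean + user_bias.get(user, 0.0) + beer_bias.get(beer, 0.0)
    return min(5.0, max(1.0, raw))


def recommend(model, reviews, user, top_n=1):
    """Return the unrated beers with the highest predicted score."""
    rated = {b for u, b, _ in reviews if u == user}
    candidates = {b for _, b, _ in reviews} - rated
    ranked = sorted(candidates, key=lambda b: predict(model, user, b), reverse=True)
    return ranked[:top_n]


model = fit_baseline(REVIEWS)
print(recommend(model, REVIEWS, "u1"))  # ['lager'] (the only beer u1 has not rated)
```

A bias-only baseline like this ignores the beer and brewery metadata entirely. A richer model would feed that "beer geek knowledge" into the regression as extra features.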
Data + Beer = MORE BEER.
Once again, feel free to check out the project (parts one and two have been released, with more on the way, including using even more metadata to explore possible model architectures). That, in combination with our recommendation engine guidebook, should be plenty to get started on one of your own before the big game.
Even More Beer!
If you still want more after checking out data from the trenches (or if you want a higher-level overview of building a beer recommendation engine in video format), you’re in luck. We love beer so much here at Dataiku that Data Scientist Guilherme de Oliveira also did a similar project for his trip to Portland, and you can watch his talk on how he determined the best brews to check out in the City of Roses here: