Recommendation systems are among the most extensively documented topics in machine learning; nowadays, you can find information ranging from the very basic to the cutting edge (we’ve written our fair share about the topic too - including not one, not two, but three articles on beer recommendation engines alone).
When one of our customers is considering building a recommendation engine, my first thought is not whether they’ll face a cold-start problem or which particular algorithm or technique they intend to use. Instead, I think about how to get the project to see the light of day.
During the prototyping stage, the focus is on data cleansing and exploration, feature engineering and enrichment, training well-performing models, and so on. But taking all of this to production for an evolving dataset, with tuning and retraining and, most importantly, actual users - well, that is hard.
Hard, But Not Impossible
We decided to challenge ourselves to build a recommendation system and put it in production in one business day. Challenge accepted!
We decided to use an existing dataset of cooking recipes, which includes detailed descriptions, ingredients, user reviews, and ratings. The idea was to build a recommendation system that would suggest recipes one could cook with the ingredients in the fridge.
The day started with a planning session. I find it important to have (at least) a skeleton plan and initial goals. If time is tight, as it was in this case, having people overlapping is not the best use of resources. Armed with coffee, cookies, and a Spotify playlist, we started drafting on the board what we were going to build:
The idea was to use the description, comments, reviews, and ingredients for each recipe in the dataset to build the recommendation engine. We would gather some keywords from the user (through a web app) and retrieve the most relevant recipes containing those ingredients. In the back-end, we combined and normalized the text columns of interest, built a vocabulary of n-grams using a couple of the scikit-learn vectorizers, then added TF-IDF-based features to our data.
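As a rough sketch of that step (the column names here are illustrative, not the actual dataset schema):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data standing in for the recipe dataset; real columns will differ.
recipes = pd.DataFrame({
    "description": ["Creamy garlic pasta with parmesan", "Spicy chicken tacos"],
    "ingredients": ["pasta garlic cream parmesan", "chicken tortilla chili lime"],
    "reviews": ["Rich and comforting", "Great weeknight dinner"],
})

# Combine and normalize the text columns of interest.
recipes["text"] = (
    recipes[["description", "ingredients", "reviews"]]
    .agg(" ".join, axis=1)
    .str.lower()
)

# Build an n-gram vocabulary and TF-IDF features over the combined text.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform(recipes["text"])
print(tfidf.shape)  # one row per recipe, one column per n-gram
```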
This was done as a combination of visual and code-based recipes in Dataiku, and in the end, the flow looked fairly concise (we were short on time, after all!).
This, however, was only tackling the problem from the data to the user. Now we needed to build something for the user to interact with, select the relevant ingredients, submit them, and get feedback. To do this, we built a web app in Dataiku. A simple one would do just fine, but it had to look the part (nothing too big for Bootstrap!).
There are several types of web apps in Dataiku (Bokeh, Shiny, pure JS, or full Python/JS/HTML/CSS), but we went for the Python backend option, which is a good middle ground between carrying out data operations in the web app and in the project itself.
Note: When developing web applications, one normally offloads most of the computation outside the application so that the user gets fast responses. This usually means carrying out many aggregations and pre-processing jobs in the database or asynchronously.
The flow of the app is fairly simple:
- A user enters ingredients of interest (we added autocomplete for a nicer experience).
- The backend parses the ingredients into the relevant n-grams and submits a query to retrieve the most relevant recipes.
- The app organizes the results in a carousel in the result section along with options to retrieve a full list of ingredients and steps.
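The retrieval in step two can be sketched as a cosine-similarity lookup against the precomputed TF-IDF matrix (a minimal sketch; the corpus and function names are illustrative, not our actual code):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the combined recipe text.
corpus = [
    "pasta garlic cream parmesan",
    "chicken tortilla chili lime",
    "garlic butter shrimp pasta",
]

# Fit once at startup so each request only pays for a transform and a dot product.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(corpus)

def top_recipes(ingredients, k=2):
    """Rank recipes by TF-IDF similarity to the user's ingredient query."""
    query = vectorizer.transform([" ".join(ingredients)])
    scores = cosine_similarity(query, tfidf).ravel()
    return np.argsort(scores)[::-1][:k]

print(top_recipes(["garlic", "pasta"]))  # indices of the best-matching recipes
```

Fitting the vectorizer once and only transforming the query per request follows the offloading principle noted earlier: the expensive work happens ahead of time, not while the user waits.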
To make this user journey possible, we had to use one more Dataiku component: the API Designer, which exposes one or more endpoints from a project given pre-defined logic. For more information on the API Designer, see here.
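Consuming such an endpoint from the web app boils down to an HTTP call. A hedged sketch, assuming a JSON-over-POST endpoint - the URL and payload shape here are hypothetical and depend entirely on how the service is deployed:

```python
import json
import urllib.request

# Hypothetical endpoint URL; the actual path depends on your API node setup.
URL = "https://api-node.example.com/recipes/recommend/run"

payload = json.dumps({"ingredients": ["garlic", "pasta"]}).encode()
req = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment to actually call the deployed endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```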
To complete the project, we added automation steps to keep the data fresh and relevant. It’s 5:00 p.m., and our hackathon day just finished. I’m tired, but happy we made it. I’ll spend a couple of hours over the next few days polishing the CSS in the app, but aside from that - challenge completed! We built a basic recommendation engine in just one day.