The success of Amazon and Netflix has made recommendation systems not only common but also extremely popular. For many people, a recommendation system seems to be one of the easiest applications to understand, and most of us use them daily.
Haven't you ever marveled at the ingenuity of a website offering the HDMI cable that goes with a television? Never been tempted by the latest trendy book about vampires? Been irritated by suggestions for diapers or baby powder though your child has been potty-trained for three months? Been annoyed to see flat-screen TVs pop up in your browser every year as summer approaches? My answer, at least, is: "Yes, I have."
But before cursing, every user should be aware of how difficult it is to build an effective recommendation system! Below is an overview of how these systems are built (and some ideas for building your own).
To capture the mysterious mechanisms of human psychology, several strategies are possible.
- The association strategy. "Those who looked at... also looked at". This means looking at purchase sequences, or groups of items purchased together, and showing related products.
We can, in fact, either work directly on the content matrix (distance-based similarity) or look at purchase sequences and apply association rule mining algorithms such as Apriori and FP-Growth. This strategy is useful for recommending naturally complementary products, and for recommendations tied to a particular moment in the life of the user (e.g., just after a purchase).
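As a rough illustration of the "also looked at" idea, here is a minimal sketch that simply counts how often two items appear in the same basket. It is not Apriori or FP-Growth, only pairwise co-occurrence counting, and the basket data and function names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """For each item, count how often every other item shares a basket with it."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate so an item repeated in one basket is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, item, top_n=3):
    """Return the items most often seen together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(top_n)]


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["tv", "hdmi_cable"],
    ["tv", "hdmi_cable", "wall_mount"],
    ["tv", "hdmi_cable"],
    ["tv", "wall_mount"],
    ["book", "bookmark"],
]

co_counts = build_cooccurrence(baskets)
recommendations = also_bought(co_counts, "tv", top_n=2)
print(recommendations)
```

On a real catalog, raw counts favor best-sellers, so a normalized measure (for example, the share of baskets containing the first item that also contain the second) is usually a better ranking signal.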
- The latent factor modeling strategy. This involves inferring an individual's inherent interest in a product, on the implicit assumption that his or her previous choices reflect a combination of underlying tastes or hobbies. The logic is to use the history of user interactions to "learn" these tastes. In terms of algorithms, this learning uses matrix factorization techniques, such as singular value decomposition.
The famous Netflix Prize challenge is an instance of a latent factor modeling problem. In that particular case, the goal was to learn from the ratings that users give to films. Sometimes a rating is not available, so a sub-task is to calculate a synthetic interest score for a product by combining direct and indirect signals (pages viewed, time spent, purchases, etc.).
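One common way to learn latent factors from ratings is to factorize the user-item rating matrix with stochastic gradient descent. The sketch below assumes a tiny, invented rating set (two "tastes", four users, four films) and hypothetical hyperparameters. It is only meant to show the mechanics, not how any particular production system works.

```python
import math
import random


def predict(user_vec, item_vec):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(user_vec, item_vec))


def rmse(ratings, user_factors, item_factors):
    """Root mean squared error over the known ratings."""
    squared = [(r - predict(user_factors[u], item_factors[i])) ** 2
               for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


def init_factors(count, n_factors, rng):
    """Small random starting vectors, one per user or item."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_factors)]
            for _ in range(count)]


def train(ratings, user_factors, item_factors,
          epochs=500, learning_rate=0.02, regularization=0.02):
    """Fit the factors in place with plain SGD on the squared error."""
    for _ in range(epochs):
        for u, i, r in ratings:
            error = r - predict(user_factors[u], item_factors[i])
            for k in range(len(user_factors[u])):
                p, q = user_factors[u][k], item_factors[i][k]
                user_factors[u][k] += learning_rate * (error * q - regularization * p)
                item_factors[i][k] += learning_rate * (error * p - regularization * q)


# Invented (user, film, rating) triples: users 0-1 prefer films 0-1,
# users 2-3 prefer films 2-3. Some user/film pairs are deliberately missing.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
user_factors = init_factors(4, 2, rng)
item_factors = init_factors(4, 2, rng)

rmse_before = rmse(ratings, user_factors, item_factors)
train(ratings, user_factors, item_factors)
rmse_after = rmse(ratings, user_factors, item_factors)

print("RMSE before:", round(rmse_before, 3), "after:", round(rmse_after, 3))
# Score for a pair that was never rated (user 0, film 3).
print("Predicted rating:", round(predict(user_factors[0], item_factors[3]), 2))
```

Once the factors are learned, a recommendation is simply the unrated items with the highest predicted score for a given user.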
- The topic modeling strategy. This is a variant of the previous strategy, in which we consider each user interaction with the products at the most granular level: reading a text, viewing an image, etc. This modeling is particularly interesting because it allows us to take users' search terms into account. The approach uses specific algorithms, in particular to infer topics from this corpus. It is especially well suited to sites where the content carries rich but unstructured text information (e.g., news articles).
- The user similarity strategy. This involves comparing user purchase histories, either at the item level or in terms of the characteristics of the products purchased (brands, prices). The most common approach is to use clustering algorithms, with an ad hoc distance, to create user groups. The recommendations provided are then the most popular items among the members of the user's group. This strategy is useful on sites with a large but volatile audience, to quickly provide recommendations to a user about whom little information is available.
- The content similarity strategy. Like the previous strategy, this one relies on similarity, but between items rather than users: it recommends items that are close to each other according to their metadata. This approach makes sense for catalogs with a lot of rich metadata, and where traffic is low compared to the number of products in the catalog.
- The popular content promotion strategy. This involves highlighting products based on intrinsic features that may make them interesting: price, features, popularity, etc. This strategy can also take into account the freshness or age of the content, and thus favor the most "trendy" items in recommendations. This is an interesting approach for catalogs where new items are the majority.
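To make the last strategy concrete, here is a minimal sketch of a "trendiness" score that discounts raw popularity by age. The exponential half-life formula, the field names, and the catalog entries are all assumptions made for the example; other decay functions would work just as well.

```python
def trending_score(view_count, age_days, half_life_days=7.0):
    """Popularity discounted by age: the score halves every `half_life_days`."""
    return view_count * 0.5 ** (age_days / half_life_days)


# Hypothetical catalog entries, invented for illustration only.
catalog = [
    {"id": "classic_novel", "views": 1000, "age_days": 70},
    {"id": "new_vampire_book", "views": 300, "age_days": 1},
    {"id": "last_month_hit", "views": 800, "age_days": 28},
]

ranked = sorted(
    catalog,
    key=lambda item: trending_score(item["views"], item["age_days"]),
    reverse=True,
)
top_ids = [item["id"] for item in ranked]
print(top_ids)
```

With a one-week half-life, the recent book outranks older items that have far more total views. Lengthening the half-life shifts the balance back toward long-standing best-sellers.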
To win, you have to adapt!
Obviously, no single strategy is always better than the others! It is often a question of combining techniques depending on the location on the site and the context. Thus, one strategy can be used on the main page, and another on the user's cart page.
Another approach is to maintain multiple algorithms in parallel and then combine them. The combination itself (which algorithm is preferred over another) may itself be learned, using techniques such as multi-armed bandits.
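As a sketch of how a bandit can arbitrate between strategies, the example below uses an epsilon-greedy policy, one of the simplest multi-armed bandit techniques. Each "arm" is a recommendation strategy and the reward is a click. The click rates are invented and deliberately exaggerated so the simulation converges quickly; the class and variable names are hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Mostly pick the best-performing arm so far, sometimes explore at random."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.mean_reward[arm])

    def update(self, arm, reward):
        # Incremental update of the running average reward for this arm.
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


# Simulated click rates per strategy: assumptions for the demo, not measurements.
true_click_rate = {
    "content_similarity": 0.1,
    "popular_content": 0.3,
    "latent_factors": 0.6,
}

bandit = EpsilonGreedyBandit(true_click_rate, epsilon=0.1, seed=42)
world = random.Random(7)  # separate generator simulating user behavior

for _ in range(5000):
    arm = bandit.select()
    clicked = 1.0 if world.random() < true_click_rate[arm] else 0.0
    bandit.update(arm, clicked)

most_used = max(bandit.pulls, key=bandit.pulls.get)
print(bandit.pulls)
print("Most used strategy:", most_used)
```

The design choice here is the exploration rate: a higher epsilon reacts faster when user behavior changes, at the cost of showing weaker recommendations more often.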
Ultimately, by combining these techniques, we can build a sophisticated algorithm which, for example, learns to follow a content similarity strategy for new users of the site and to use latent factors for the others, while leaving a share of voice to popular content promotion.
To build a powerful recommendation engine, you need to implement an appropriate strategy. The limiting factor will often be the availability of suitable data (events, catalog, etc.), whereas most of the necessary algorithms are well known and mastered. If you want to implement a recommendation engine adapted to your website or mobile application, or would like more information, please contact us!