Automatically generating user recommendations using a hybrid recommendation system.
In our current digital era, we are constantly bombarded by a myriad of choices. At which restaurant will we eat tonight? What type of mattress will give us the best night’s sleep? What movie are we going to watch? Sometimes these choices can be a bit overwhelming. That’s why many websites and other online services use recommender systems to make these choices easier for their users. Some of the most famous examples include Amazon’s product recommendations, the Discover Weekly playlist that Spotify generates automatically, and the movies and series Netflix recommends to its users.
These recommender systems usually use information about the users and the recommended items to score items by their relevance. The biggest decision during the design of such a system is what kind of information to use to generate the recommendations. Some systems use information about the behavior of similar users, while others look at the similarity between the items to be recommended. It is also possible to mix these two methods and create a hybrid recommender.
At Goldmund, Wyldebeast and Wunderliebe, we recently released holland-now.nl, an event website that automatically gathers events from all over the Netherlands. For this website, we created a recommender system that recommends relevant events to users. We implemented a hybrid recommender system using LightFM, a Python package that implements several popular recommendation algorithms. While this system works well if we have enough information about a user, we also needed a way to recommend items to brand new users or one-off site visitors. That’s why we also implemented our own system that recommends items mostly based on their overall popularity.
The main question that needs to be answered during the design phase of a recommender system is: “How do we make sure that we recommend to our users the items that are most relevant to them?”
To answer this question, a couple of other issues have to be tackled.
First of all, recommender systems very often run into what is commonly known as the “cold start problem”. They might work well when plenty of user data is available to train on, but they severely underperform when user data is scarce. Since the recommender systems of holland-now.nl had to be in place at the launch of the website, when hardly any user data existed yet, we needed a way to overcome this problem.
Another important design choice, mentioned above, is what information to use to generate the recommendation scores. It is possible to base the recommendations entirely on information about the items you are recommending. While this helps a little with the cold start problem, it completely ignores the behavior of other users. Usually the quality of these so-called content-based recommenders is lower than that of methods that also take user behavior into account.
Other recommenders use information about the behavior of users while ignoring the contents of the recommended items. These collaborative recommenders try to find similar users and base recommendation scores on those users’ interactions with the items to be recommended. In practice, this often means that a collaborative recommender will recommend items that similar users have liked, bought or clicked on. The issues with these kinds of systems mirror those of content-based recommenders: they neglect a lot of useful information, and their quality can suffer because of it. Furthermore, they suffer heavily from the cold start problem. Basing recommendations on the behavior of other users only really works if there is a lot of user data, and initially, this is often missing.
The most common solution to these issues is to use a hybrid recommender system. Hybrid systems offer a “best of both worlds” solution, since they use information about user behavior as well as item metadata. Obviously, both of these need to be available. If a website or system stores no information about its users, a content-based recommender is the only good option; if it stores no information about its items, a collaborative recommender is.
There are different approaches to the design of a hybrid recommender. One possibility is to generate content-based and collaborative predictions separately and then combine them. You can also add content-based capabilities to a collaborative system (and vice versa), or you can even combine both approaches into a single unified model.
For holland-now.nl we used LightFM to implement a hybrid recommendation system in Python. According to its documentation, “LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.” It makes it very easy to create and test different kinds of recommender systems, with easily tweakable parameters. It also represents users and items as the sum of the latent representations of their features. Because similarity is computed between these learned representations rather than between raw user or item IDs, the system can generalize to new items and to new users, as long as their features are known.
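For reference, the LightFM paper (Kula, 2015) writes this model roughly as follows. Here f_u and f_i are the sets of features of user u and item i, e^U_j and e^I_j are learned feature embeddings, and b^U_j and b^I_j are learned feature biases. The notation follows that paper rather than anything specific to holland-now.nl; with logistic loss, f is the sigmoid function.

```latex
\begin{aligned}
\mathbf{q}_u &= \sum_{j \in f_u} \mathbf{e}^U_j, &\qquad \mathbf{p}_i &= \sum_{j \in f_i} \mathbf{e}^I_j,\\
b_u &= \sum_{j \in f_u} b^U_j, &\qquad b_i &= \sum_{j \in f_i} b^I_j,\\
\hat{r}_{ui} &= f\left(\mathbf{q}_u \cdot \mathbf{p}_i + b_u + b_i\right).
\end{aligned}
```

A brand new event without any interactions still gets a representation p_i through the embeddings of its features (for example its keywords), which is what lets the model score it. Note that LightFM’s `predict` method returns the raw value inside f, not f itself.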
While LightFM generalizes well to new users, it cannot create recommendations for users we have no information about at all. That’s why, for brand new users and unidentified site visitors, we recommend events based on their popularity and on how soon they start. Three types of user interaction are used for this process: favorites (users can mark an event as a favorite), visits (a user has opened the detail page of an event) and click-throughs (a user has followed the ticket link of an event). Each of these is weighted by how strongly we think it indicates that a user likes the event: 0.5 for visits, 0.75 for click-throughs and 1 for favorites.
Users can also indicate their preferred event categories, such as music, movies or sports. Events in these categories get a small boost to their preference score. We also give events a small penalty based on the number of days until they start, so events that happen sooner get a higher recommendation score than events that happen later. Summarized in a formula, the recommendation score is calculated as follows:
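Without claiming to reproduce that formula, the Python sketch below shows one way to combine the ingredients described above. It assumes an additive form; only the interaction weights come from the text, while `CATEGORY_BOOST`, `DAY_PENALTY`, the `Event` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import AbstractSet

# Interaction weights as described in the text.
VISIT_WEIGHT = 0.5
CLICK_THROUGH_WEIGHT = 0.75
FAVORITE_WEIGHT = 1.0

# Hypothetical tuning values: the real coefficients are not given.
CATEGORY_BOOST = 0.5  # added once if the event is in a preferred category
DAY_PENALTY = 0.1     # subtracted per day until the event starts


@dataclass
class Event:
    """Interaction counts for one event, aggregated over all users."""
    name: str
    category: str
    start: date
    visits: int = 0
    click_throughs: int = 0
    favorites: int = 0


def popularity_score(event: Event, today: date, preferred: AbstractSet[str]) -> float:
    """Score an event for a visitor we know little or nothing about."""
    score = (VISIT_WEIGHT * event.visits
             + CLICK_THROUGH_WEIGHT * event.click_throughs
             + FAVORITE_WEIGHT * event.favorites)
    if event.category in preferred:
        score += CATEGORY_BOOST
    # Events that start sooner get a smaller penalty (assumption: no penalty
    # for events that have already started).
    days_until_start = max((event.start - today).days, 0)
    score -= DAY_PENALTY * days_until_start
    return score


def recommend(events: list[Event], today: date,
              preferred: AbstractSet[str] = frozenset()) -> list[Event]:
    """Return events sorted from most to least recommended."""
    return sorted(events, key=lambda e: popularity_score(e, today, preferred),
                  reverse=True)
```

For a visitor with no stated preferences, `preferred` is simply empty and the ranking reduces to weighted popularity minus the start-date penalty.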
Recommendations for users we have more information about are generated with our LightFM model. We still use the same favorites, visits and click-throughs, but now we also look at the actual content of the events. Two matrices are generated. The first maps users against events: every user/event combination starts with a score of 0. If a user has favorited an event, we add 2 to that score; for click-throughs and visits we add 1.5 and 1, respectively. Each user/event pair therefore ends up with a score between 0 and 4.5.
The other matrix maps events against keywords. If an event contains a certain keyword, the entry at that event’s row and that keyword’s column is 1; otherwise, it is 0.
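A minimal pure-Python sketch of this first matrix, assuming each interaction type counts at most once per user/event pair, so the maximum score is 2 + 1.5 + 1 = 4.5. The function `build_user_event_matrix` and its `(user, event, kind)` input format are hypothetical.

```python
# Score added per interaction type, as described in the text.
INTERACTION_SCORES = {"favorite": 2.0, "click_through": 1.5, "visit": 1.0}


def build_user_event_matrix(users, events, interactions):
    """Return a dense users x events matrix (list of lists) of interaction scores.

    `interactions` is an iterable of (user, event, kind) tuples, where kind is
    "favorite", "click_through" or "visit".
    """
    user_index = {u: i for i, u in enumerate(users)}
    event_index = {e: j for j, e in enumerate(events)}
    matrix = [[0.0] * len(events) for _ in users]
    # Deduplicate so each interaction type is counted at most once per pair.
    for user, event, kind in set(interactions):
        matrix[user_index[user]][event_index[event]] += INTERACTION_SCORES[kind]
    return matrix
```

LightFM expects a sparse matrix, so in practice you would convert the result with `scipy.sparse.coo_matrix(matrix)` or build it sparse from the start; `lightfm.data.Dataset.build_interactions` also accepts `(user_id, item_id, weight)` tuples directly.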
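The event/keyword matrix can be sketched the same way. The function name and the `{event: keywords}` input format are hypothetical; events and keywords are sorted so that row and column order are deterministic.

```python
def build_event_keyword_matrix(event_keywords):
    """Return (events, keywords, matrix) where matrix[i][j] is 1 if event i has keyword j.

    `event_keywords` maps each event ID to an iterable of its keywords.
    """
    events = sorted(event_keywords)
    keywords = sorted({kw for kws in event_keywords.values() for kw in kws})
    keyword_index = {kw: j for j, kw in enumerate(keywords)}
    matrix = [[0] * len(keywords) for _ in events]
    for i, event in enumerate(events):
        for kw in event_keywords[event]:
            matrix[i][keyword_index[kw]] = 1
    return events, keywords, matrix
```

The row order must match the event (column) order of the user/event matrix. With LightFM, `lightfm.data.Dataset.build_item_features` can produce this matrix from `(item_id, [features])` pairs once the keywords are registered through `Dataset.fit(users, items, item_features=...)`; by default it also adds a per-item identity feature.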
We train a LightFM model using the user/event matrix as training data and the event/keyword matrix as item features. We use logistic loss, which makes the model outputs easy to scale to a range from zero to five. This is necessary because we display stars to indicate how relevant an event is for a given user.
We use the trained model to predict a score for each event for a given user. The scores of events in the user’s preferred categories are multiplied by a small preference boost. We then scale the recommendation scores to a range between zero and five and rank the events by relevance.
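A hedged sketch of this step in plain Python. It assumes the raw scores have already been obtained from the trained model (LightFM’s `predict` returns one raw score per user/item pair), applies the sigmoid that matches the logistic loss before boosting, and then min-max scales the result to 0 to 5 stars. `PREFERENCE_BOOST`, `rank_events` and the choice of min-max scaling are assumptions, not the site’s actual implementation.

```python
import math

PREFERENCE_BOOST = 1.2  # hypothetical multiplier for preferred categories


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_events(raw_scores, event_categories, preferred):
    """Turn raw model scores into (event, stars) pairs, best first.

    raw_scores: dict mapping event -> raw model score
    event_categories: dict mapping event -> category
    preferred: set of the user's preferred categories
    """
    if not raw_scores:
        return []
    # Map raw scores into (0, 1) first so the multiplicative boost
    # always raises a score.
    probs = {e: sigmoid(s) for e, s in raw_scores.items()}
    boosted = {
        e: p * PREFERENCE_BOOST if event_categories[e] in preferred else p
        for e, p in probs.items()
    }
    # Min-max scale to the 0-5 star range (all equal -> 5 stars each).
    lo, hi = min(boosted.values()), max(boosted.values())
    span = hi - lo
    stars = {e: 5.0 * (b - lo) / span if span > 0 else 5.0 for e, b in boosted.items()}
    return sorted(stars.items(), key=lambda item: item[1], reverse=True)
```

Applying the boost after the sigmoid matters: raw LightFM scores can be negative, and multiplying a negative score by a boost larger than 1 would push the event down instead of up.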
Even though it is possible to train the model with relatively little user data, validation requires a lot of it to give meaningful output. By comparing actual user behavior to recommendation scores, you can compute error metrics such as the root mean squared error or the mean absolute error. In essence, you are trying to find out whether users actually interacted positively with the events that the algorithm recommended. These metrics become more reliable as more actual user data becomes available to compare the recommendations to.
Another way to validate how well a recommender system is performing is to get actual user feedback. You could let users rate their recommendations on relevance and on how much they actually liked the recommended items. Unfortunately, this also requires a lot of users to collect enough feedback.
Because holland-now.nl has only just been released, we currently lack the user data needed for meaningful validation metrics. Based on manual inspection, however, the results look very promising.
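Both metrics are straightforward to compute. The sketch below assumes observed behavior has already been converted to the same 0 to 5 scale as the predicted stars (for example by rescaling the user/event interaction scores); the function names are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))
```

For predicted stars `[4.0, 2.0, 5.0]` against observed scores `[5.0, 2.0, 3.0]`, `mae` gives 1.0 and `rmse` gives about 1.29; RMSE weighs the single large miss more heavily than MAE does.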