Author: Harrie Oosterhuis.
Recommendation has become an important part of our digital lives. Whether we are watching a video, shopping online, listening to music, or even reading a news article, we are constantly being recommended things to look at next. Similarly, we expect easy-to-use search features everywhere so that we can make our way through online content without effort.
Behind every search or recommendation feature there is a ranking system that decides what items to present and in which order. The quality of such a system has a large impact on the user experience: it is wonderful when you get that perfect music recommendation, or the document you are looking for shows up instantly. For companies it can also mean increased revenue: users stay longer on your website, or find products they want to buy more easily.
Thus it is no surprise that optimizing ranking systems is a very important and active area of research. We have recently introduced a novel approach that learns user preferences from their interactions and updates ranking systems accordingly.
Learning from User Interactions
In order to provide users with rankings that they prefer, we first need to learn what their preferences are. Unfortunately, users rarely tell us directly what they want. Luckily, they do behave in ways that indicate their preferences, allowing us to learn those preferences from their interactions. Interpreting human behavior is hard, though, for people and for machines alike, not least because interactions are noisy: users click by accident, or skip items they would actually have liked.
Besides interaction noise, there are also biases that affect user behavior. For instance, selection bias is unavoidable because users can only interact with the items that are presented, never with items that are not displayed. Position bias is very prevalent and occurs because people pay more attention to the top of rankings. The figure below shows the effect of position bias on a result page: all users look at the top item, but less than half look at the eighth position. This means that an item’s display position has an enormous effect on the interactions we observe. Therefore, we have to account for these biases to accurately learn preferences from user interactions.
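To make the effect of position bias concrete, here is a minimal simulation sketch. The 1/rank examination curve and the noise model are illustrative assumptions, not the numbers from the figure or from our experiments:

```python
import random

# Illustrative position-bias model: the chance that a user even looks at
# a result decays with its rank (rank 1 is always examined here).
def examination_prob(rank, eta=1.0):
    """Probability that a user examines the item shown at `rank` (1-based)."""
    return (1.0 / rank) ** eta

def biased_click(rank, relevance, rng):
    """A click requires both examining the position and liking the item,
    so the display position strongly shapes which clicks we observe."""
    return rng.random() < examination_prob(rank) and rng.random() < relevance
```

Under this toy model, a relevant item at rank 8 receives only an eighth of the clicks it would receive at rank 1, even though the user's preference is unchanged.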
Interactively Optimizing Ranking Systems
We have recently introduced a novel algorithm that follows the online learning to rank approach. This means that the algorithm learns by directly interacting with users. The algorithm repeatedly chooses a ranking to display to the user, observes the user’s interactions, and learns from the observed interactions. It tries to simultaneously provide the best user experience and improve its ranking system at every step. This results in a continuous loop where the system adapts to user preferences automatically and indefinitely.
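The interaction loop described above can be sketched as follows. The `ToyRanker`, the click simulator, and the update rule are all illustrative stand-ins, not the actual PDGD components:

```python
import random

class ToyRanker:
    """A toy linear ranker: scores documents and sorts them by score."""
    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def score(self, doc):
        return sum(w * f for w, f in zip(self.weights, doc))

    def rank(self, docs):
        # Choose a ranking to display: greedily by current score.
        return sorted(range(len(docs)), key=lambda i: self.score(docs[i]),
                      reverse=True)

def online_loop(ranker, queries, simulate_clicks, learn, n_steps):
    """The continuous online learning-to-rank loop: display, observe, update."""
    for _ in range(n_steps):
        docs = random.choice(queries)          # a user issues a query
        ranking = ranker.rank(docs)            # display a ranking
        clicks = simulate_clicks(ranking)      # observe user interactions
        learn(ranker, docs, ranking, clicks)   # learn from them immediately
```

The key property is that displaying and learning are interleaved at every step, so the system keeps adapting for as long as users keep interacting.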
The novel algorithm that we have introduced is called Pairwise Differentiable Gradient Descent (PDGD). Unlike previous online learning to rank algorithms, PDGD uses a probabilistic ranking model, meaning that it maintains an explicit confidence score over different rankings. For every ranking it can say how probable it thinks that this is the best possible ranking, much akin to a policy in reinforcement learning.
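A standard way to define such a probability over rankings is a Plackett-Luce model, which builds a ranking by repeatedly sampling from a softmax over the scores of the remaining documents. The sketch below computes the probability of one specific ranking under that model; it illustrates the idea of a probabilistic ranking model, not PDGD's exact implementation:

```python
import math

def plackett_luce_prob(scores, ranking):
    """Probability a Plackett-Luce model assigns to `ranking`
    (a sequence of document indices), given per-document scores."""
    prob = 1.0
    remaining = list(range(len(scores)))
    for idx in ranking:
        # Probability of picking `idx` next, among the remaining documents.
        denom = sum(math.exp(scores[i]) for i in remaining)
        prob *= math.exp(scores[idx]) / denom
        remaining.remove(idx)
    return prob
```

For example, with three equally scored documents every one of the six possible rankings gets probability 1/6, and the probabilities over all rankings always sum to one.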
Learning from interactions is done by inferring pairwise document preferences, a long-established concept. In a nutshell, the underlying assumption is that if a user observes several documents but only clicks on a few, then they prefer the clicked documents over the unclicked ones. This inference is not always correct, but averaged over many interactions it provides reliable preferences. PDGD uses this pairwise approach but adapts it to account for position bias: for each pair of documents, PDGD looks at how likely they were to be displayed in swapped positions.
Thus, a document displayed at a higher position may have had an advantage, but PDGD also looks at how likely that same advantage would have been given to the other document. By correcting for these (dis)advantages, an unbiased update can be made to the ranking system, and the true user preferences can be learned.
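This correction can be sketched by weighting each inferred preference with the ratio between the probability of the swapped display and the combined probability of both displays. The sketch below uses a Plackett-Luce ranking probability for this; it illustrates the bias-correction idea described above, under that modeling assumption, rather than reproducing PDGD's full update:

```python
import math

def pl_prob(scores, ranking):
    """Plackett-Luce probability of a ranking (list of document indices)."""
    prob, remaining = 1.0, list(range(len(scores)))
    for idx in ranking:
        denom = sum(math.exp(scores[i]) for i in remaining)
        prob *= math.exp(scores[idx]) / denom
        remaining.remove(idx)
    return prob

def debias_weight(scores, ranking, i, j):
    """Weight for a preference between the documents at positions i and j:
    how likely the swapped display was, relative to the displayed one.
    A preference inferred from a display advantage gets a small weight;
    one inferred despite a disadvantage gets a large weight."""
    swapped = list(ranking)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    p_orig = pl_prob(scores, ranking)
    p_swap = pl_prob(scores, swapped)
    return p_swap / (p_orig + p_swap)
```

When both documents have equal scores, the swapped display was exactly as likely as the observed one and the weight is 0.5, i.e. no correction is needed.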
Improvements by the Novel Approach
To evaluate the performance of PDGD, we tested it in various simulations with varying degrees of interaction noise and position bias. This allows us to see how much the algorithm is affected by these factors. The simulations were based on three commercial datasets from the Bing, Yahoo and Istella search engines. Across all datasets and all levels of noise and bias, we see large improvements of PDGD over previous methods. The graph below displays results under ideal circumstances simulated on the MSLR-Web10k dataset. The dashed line shows the performance of LambdaMART, which was given the true user preferences. Of course this is a hypothetical upper bound, since in a realistic scenario the true preferences are not known. Dueling Bandit Gradient Descent (DBGD) is the most prominent previously existing method. However, it is unable to reach the upper-bound performance even under these ideal circumstances.
In contrast, PDGD performs much better at every time-step, by a very large margin. Importantly, when PDGD optimizes a neural ranking model it improves even further and actually reaches the upper bound. This is the first time that we have seen an online learning to rank method solve a ranking problem perfectly! In other results we verify that PDGD maintains its improved performance when noise and bias are present. Moreover, in an upcoming publication we generalize these results to conditions with extreme levels of noise and bias. Across all experimental conditions, PDGD outperforms previous methods by a very large margin.
Why does all of this matter?
With the introduction of the PDGD algorithm, ranking systems can now be optimized from user interactions far more effectively than previously possible. Additionally, PDGD can optimize neural models to greater effect, something previous methods couldn’t do. We expect that the development of ranking systems will benefit from this contribution in the long term: not only because of the improved performance, but also because the ability to optimize more complex models opens the door to many new applications.
Want to know more?
The PDGD algorithm has been introduced in Differentiable Unbiased Online Learning to Rank, a paper presented at the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018). Generalizations of the results in that paper are included in Optimizing Ranking Models in an Online Setting, which will be presented at the 41st European Conference on Information Retrieval (ECIR 2019). An implementation of the algorithms and experiments is also available on GitHub.
Harrie Oosterhuis is a PhD student within ILPS. His research is focused on online learning to rank.