Learning To Rank (LTR)
Previously, in the post Loss Functions in Machine Learning and LTR, we discussed how loss functions are used in ML and briefly mentioned LTR. Here I'll discuss LTR in more detail. LTR uses Machine Learning (ML)/Artificial Intelligence (AI) to predict rankings/ordinal data. It's useful for applications such as Google search, drug discovery, and bioinformatics. Here is a list of what separates LTR from traditional ML:
- Solves a ranking problem on a list of items
- Predicts the optimal ordering of the list
- Doesn't care much about the exact score of each item/point
- Only cares about the relative scores/ordering among all the items
For example, suppose we have two ML models to predict students' scores, and our goal is to rank the students. Given the results below, Model 2 is better at ranking than Model 1 even though Model 1 has better prediction accuracy: Model 1 swaps Student1 and Student2, while Model 2 preserves the true ordering. Rank error is pair-wise based and is defined as \(\frac{ \# \textrm{ of discordant pairs} }{ \#\textrm{ of total pairs between + and -} }\). A sketch of this computation follows the table.
| Student | True Score | Model 1 | Model 2 |
|---|---|---|---|
| Student1 | 90% | 88% | 100% |
| Student2 | 85% | 89% | 50% |
| Student3 | 80% | 83% | 10% |
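To make the rank error concrete, here is a minimal Python sketch (the function name `rank_error` and the score encoding are my own; for this three-student example every pair is compared, whereas the bipartite definition above counts only pairs between + and - items):

```python
import itertools

# Scores from the table above, as (true, model1, model2) tuples.
students = {
    "Student1": (0.90, 0.88, 1.00),
    "Student2": (0.85, 0.89, 0.50),
    "Student3": (0.80, 0.83, 0.10),
}

def rank_error(true_scores, pred_scores):
    """Fraction of discordant pairs: pairs that the predictions
    order the opposite way from the true scores."""
    pairs = list(itertools.combinations(range(len(true_scores)), 2))
    discordant = sum(
        1 for i, j in pairs
        if (true_scores[i] - true_scores[j]) * (pred_scores[i] - pred_scores[j]) < 0
    )
    return discordant / len(pairs)

true_s = [v[0] for v in students.values()]
model1 = [v[1] for v in students.values()]
model2 = [v[2] for v in students.values()]

print(rank_error(true_s, model1))  # 0.333...: Model 1 swaps Student1/Student2
print(rank_error(true_s, model2))  # 0.0: Model 2 orders every pair correctly
```

Model 1's scores are closer to the true scores, but its one swapped pair gives it a rank error of 1/3, while Model 2 ranks every pair correctly.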
LTR systems include bipartite ranking, k-partite ranking, and real-value-based ranking. We only discuss bipartite ranking here.
1. Bipartite RankSVM Algorithm
The Bipartite RankSVM algorithm uses the hinge loss. The hinge loss is a loss function used for "maximum-margin" classification, most notably for support vector machines (SVMs). Training is equivalent to minimizing the loss function \(L_{hinge}(f,x_i^+,x_j^-) = [1-(f(x_i^+)-f(x_j^-))]_+\), where \([u]_+ = \max(u,0)\).
With \(f(x) = w \cdot x\) as the ranking score, the optimization problem is loss + penalty: \[ \min_{f \in F_k} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{hinge}(f,x_i^+,x_j^-) + \frac{\lambda}{2}||f||_k^2 \]
Thus, the larger the margin \(f(x_i^+)-f(x_j^-)\), the better. If \(f(x_i^+)-f(x_j^-) < 0\), the pair is mis-ranked, so the objective function is penalized (in fact, any pair with margin below 1 incurs some loss).
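Below is a minimal sketch of this objective for a linear scorer \(f(x) = w \cdot x\), trained by plain subgradient descent on toy data (the data shapes, learning rate, and \(\lambda\) value are assumptions for illustration, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite data (shapes and distributions are assumed for illustration):
# m relevant (+) items and n non-relevant (-) items, with d features each.
m, n, d = 5, 7, 3
X_pos = rng.normal(loc=+0.5, size=(m, d))
X_neg = rng.normal(loc=-0.5, size=(n, d))
lam = 0.1   # regularization strength (assumed)

def objective(w):
    """Average pairwise hinge loss plus (lam/2)||w||^2, with f(x) = w @ x."""
    margins = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]   # f(x_i^+) - f(x_j^-)
    return np.maximum(0.0, 1.0 - margins).mean() + 0.5 * lam * (w @ w)

def subgradient(w):
    margins = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]
    active = margins < 1.0                # pairs that still incur hinge loss
    # Each active pair (i, j) contributes -(x_i^+ - x_j^-) / (m*n) to the gradient.
    g = (active.sum(axis=0) @ X_neg - active.sum(axis=1) @ X_pos) / (m * n)
    return g + lam * w

w = np.zeros(d)
for _ in range(300):                      # plain subgradient descent
    w -= 0.1 * subgradient(w)

print("final objective:", objective(w))
```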
2. Bipartite RankBoost Algorithm
The Bipartite RankBoost algorithm uses the exponential loss.
The optimization problem is: \[\min_{f \in L(F_{base})} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{exp}(f,x_i^+,x_j^-)\]
where \(L_{exp}(f,x_i^+,x_j^-) = \exp(-(f(x_i^+)-f(x_j^-)))\).
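A quick numeric sketch of this average pairwise exponential loss (the score arrays are illustrative values, not from the post):

```python
import numpy as np

# Illustrative ranking scores for the positive and negative items.
f_pos = np.array([2.0, 1.5, 0.3])    # f(x_i^+) for the m positive items
f_neg = np.array([0.5, -1.0])        # f(x_j^-) for the n negative items

# L_exp(f, x_i^+, x_j^-) = exp(-(f(x_i^+) - f(x_j^-))), averaged over all m*n pairs.
margins = f_pos[:, None] - f_neg[None, :]     # shape (m, n)
print(np.exp(-margins).mean())
```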
3. Bipartite RankNet Algorithm
The Bipartite RankNet algorithm uses the logistic loss (also called the binomial log-likelihood loss or cross-entropy loss).
The optimization problem is: \[\min_{f \in F_{neural}} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{logistic}(f,x_i^+,x_j^-)\]
where \(L_{logistic}(f,x_i^+,x_j^-) = \log(1+ \exp(-(f(x_i^+)-f(x_j^-))))\).
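Since all three losses are functions of the same pairwise margin \(u = f(x_i^+) - f(x_j^-)\), a small sketch comparing them on an assumed grid of margin values makes the differences easy to see:

```python
import numpy as np

# All three bipartite losses are decreasing functions of the pairwise
# margin u = f(x_i^+) - f(x_j^-); the grid of u values is illustrative.
u = np.linspace(-2.0, 2.0, 9)

hinge       = np.maximum(0.0, 1.0 - u)   # RankSVM
exponential = np.exp(-u)                 # RankBoost
logistic    = np.log1p(np.exp(-u))       # RankNet: log(1 + e^{-u})

for row in zip(u, hinge, exponential, logistic):
    print("u={:+.1f}  hinge={:.3f}  exp={:.3f}  logistic={:.3f}".format(*row))
```

All three penalize mis-ranked pairs (\(u < 0\)) heavily; the hinge loss is exactly zero once the margin exceeds 1, while the exponential and logistic losses decay smoothly toward zero.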