Learning To Rank (LTR)

Previously, in the post Loss Functions in Machine Learning and LTR we disscussed about how loss functions were used in ML and briefly mentioned LTR. Here I’ll discuss about LTR. LTR uses Machine Learning (ML)/Artifical Intelligence (AI) to predict rankings/ordinal data. It’s useful for google search, drug discovery, bioinformatics. Here is a list that seperates traditional ML from LTR:

  • Solve a ranking on a list of items

  • Predict the optimal ordering of the list

  • Doesn’t care much about the score of each item/point

  • only care the relative score/ordering among all the items

For example, if we have 2 ML models to predict students’ score. and our goal is to rank students. and we have below results from the ML models. In this case, Model 2 is better at ranking compared to Model 1 even though Model 1 has better prediction accuracy. Rank error is pair-wise based and is defined as \(\frac{ \# \textrm{ of discordant pairs} }{ \#\textrm{ of total pairs between + and -} }\).

StudentTrue ScoreModel 1Model 2

LTR system includes bipartite ranking, k-partite ranking, real value based ranking. We only talk about bipartite ranking here.

1. Bipartite RankSVM Algorithm

Bipartite RankSVM Algorithm uses hinge loss. The hinge loss is a loss function used for “maximum-margin” classification, most notably for support vector machine (SVM). It’s equivalent to minimize the loss function \(L_{hinge}(f,x_i^+,x_i^-) = [1-(f(x_i^+)-f(x_i^-))]_+ [u_+ = max(u,0)]\)

With \(f = W * X =\) ranking score, the optimization problem is loss + penalty: \[ \min_{f \in F_k} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{hinge}(f,x_i^+,x_i^-) + \frac{\lambda}{2}||f||_k^2 \]

Thus, the term \(f(x_i^+)-f(x_i^-)\) the larger, the better.If \(f(x_i^+)-f(x_i^-) <0\), it means that it’s making mistakes so the objection function is penalized.

2. Bipartite RankBoost Algorithm

Bipartite RankBoost Algorithm uses the exponential loss.

The population minimizer is: \[\min_{f \in L(F_{base})} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{exp}(f,x_i^+,x_i^-)\]

where \(L_{exp}(f,x_i^+,x_i^-) = exp(-f(x_i^+)-f(x_i^-))\).

3. Bipartite RankNet Algorithm

Bipartite RankNet Algorithm uses the logistic loss (binomial log-likelihood loss or cross entropy loss).

The binomial log-likelihood loss function is: \[\min_{f \in F_{neural}} \frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}L_{logistic}(f,x_i^+,x_i^-)\]

where \(L_{logistic}(f,x_i^+,x_i^-) = log(1+ exp((-f(x_i^+)-f(x_i^-)))\).


Computer Science & Artificial Intelligence Laboratory, MIT

Yuan Du
Yuan Du
Senior Data Scientist

My interests include applied Statistics, Machine Learning, Deep Learning and Healthcare.