Traditional Information Retrieval started with score-based ranking for search
results, using scores such as TF-IDF [1] or BM25 [2]. As search engines
matured, their scoring functions became more sophisticated. Many large search
engines use a heuristic, score-based model for search ranking. Most famously,
as recently as 2011, Google used a heuristic model for ranking in search [3],
in spite of having strong in-house expertise in Machine Learning (ML).
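To make "score-based ranking" concrete, here is a minimal sketch of BM25 scoring in plain Python. The toy documents, the whitespace tokenization, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions (they are common defaults, not anything a particular search engine is known to use), and the IDF is the smoothed variant that never goes negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization.
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Toy corpus, made up for illustration.
docs = [
    "the quick brown fox".split(),
    "learning to rank for search ranking".split(),
    "search engines rank search results".split(),
]
query = "search ranking".split()
scores = bm25_scores(query, docs)
# Document indices ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Every number in the score comes from a term count or a document length, which is what makes this kind of model easy to inspect by hand.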
Ranking search results using machine-learned models has been explored for at
least a couple of decades. It has gained even more prominence with the
popularity of Learning to Rank [4] techniques in the last decade or so. For
example, Bing has been using Learning to Rank techniques to rank its search
results since at least 2009 [5].
This is a choice a lot of new and existing search engines have to make: should
they go for hand-tuned, score-based models, or should they use machine learning
for ranking search results?
Here are some of the factors that matter and should go into your
decision-making. Note that most of these points are generic enough to apply to
any prediction/ranking problem and are not restricted to search.
1. Explainability
For most ML algorithms, especially the ones currently in fashion such as
ensembles or neural nets, ranking is essentially a black box. You can control
the inputs, but it’s really hard to explain what exact effect specific inputs
have on the output. The final model, thus, is not very explainable.
A score-based model, especially one where the score is thoughtfully
constructed, is usually easier to reason about and explain.
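As a rough illustration of why a hand-built score is easy to explain, here is a minimal sketch of a weighted-sum score that reports each feature's contribution. The feature names and weights are hypothetical, picked only for this example.

```python
# Hypothetical feature weights, chosen by hand for illustration only.
WEIGHTS = {"text_match": 0.6, "freshness": 0.3, "popularity": 0.1}


def explain_score(features):
    """Return the total score and each feature's contribution to it."""
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    return sum(contributions.values()), contributions


# Made-up feature values for a single search result.
total, parts = explain_score({"text_match": 0.9, "freshness": 0.2, "popularity": 0.5})

# Print the breakdown, largest contribution first.
for name, value in sorted(parts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{value:.2f}")
print(f"{'total':<12}{total:.2f}")
```

If someone asks why a result ranked where it did, the breakdown answers the question directly. An ensemble or a neural net gives you no such per-input breakdown for free.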
2. Implementation time
It usually takes a non-trivial amount of time to build a new version of an ML
model. You need to run through multiple iterations of the “gather/clean data
-> train -> validate -> test” loop before your model is ready for A/B
testing.
Updating a score-based model can be as simple as tweaking the scores, so it can
be ready for A/B testing in a very short time.
3. Optimization metric
For most search engines, it’s hard to come up with an objective metric to
optimize for. This is the metric that tells you that your results for a
particular search query are good and the search was successful. It changes
based on what product you are building and what counts as “success” for your
search. You might be tempted to start off optimizing for user clicks, but if
you use clicks blindly, you may train a model that favors bad, “click-baity”
results. Big search engines spend a lot of money on building human relevance
systems [6], where trained human raters use well-defined guidelines to generate
an objective “success rating” for each search result. The training data
generated by these systems can then be used to train the ML models to rank
search results. Smaller search engines might not have the same resources as the
big players and might not be able to afford building such systems.
This optimization metric is important for both ML and score-based systems.
However, ML models suffer more if you don’t have a good optimization metric,
since you can end up with a well-trained model that optimizes for completely
the wrong metric. Score-based systems suffer a little less in comparison, given
that the score is constructed using reason and intuition in combination with
the metric you are trying to optimize.
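To show what a rating-based metric can look like, here is a minimal sketch of NDCG (normalized discounted cumulative gain) computed from human-assigned grades. NDCG is an assumption on my part: it is one commonly used ranking metric, not necessarily what any given search engine optimizes. The 0 to 3 grading scale and the two example rankings are made up.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain for ratings listed in ranked order."""
    return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(ratings))


def ndcg(ratings):
    """DCG normalized by the best achievable DCG for the same ratings."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


# Hypothetical rater grades (0 = bad, 3 = perfect) for two rankings of the
# same four results. The second ranking puts the worst-rated result on top,
# as a model trained blindly on clicks might.
relevance_first = [3, 2, 1, 0]
clickbait_first = [0, 1, 2, 3]

print(ndcg(relevance_first))
print(ndcg(clickbait_first))
```

A click-through metric could score the second ranking well if the top result has a tempting title. A rating-based metric like this one penalizes it, which is the point of paying for human relevance judgments.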
4. Result relevance
If you can get your optimization metric right, this is where the ML model can
give you huge dividends. Learning directly from data usually trumps any
intuition you can encode in your score-based model. If relevance matters more
to you than any of the other factors, using an ML model is usually the way to
go.
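As a small illustration of "learning directly from data", here is a minimal sketch of a pairwise approach, loosely in the spirit of Learning to Rank [4]: a linear model trained with a logistic loss so that the preferred result in each pair scores higher. This is a toy under stated assumptions, not how any production ranker is built. The two features and the preference pairs are invented.

```python
import math


def score(weights, features):
    """Linear ranking score for one result."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """Fit linear weights so each preferred item outscores its partner."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = score(w, diff)
            # Pairwise logistic loss: log(1 + exp(-margin)).
            # Its gradient w.r.t. w is -diff / (1 + exp(margin)).
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


# Hypothetical features per result: [text_match, clickbait_title].
# Each pair is (result judged better, result judged worse).
pairs = [
    ([0.9, 0.1], [0.4, 0.8]),
    ([0.7, 0.3], [0.6, 0.9]),
    ([0.8, 0.5], [0.2, 0.4]),
]
weights = train_pairwise(pairs, n_features=2)
print(weights)
```

Nobody told the model that click-bait titles are bad. It picked up a negative weight for that feature from the judgments alone, which is the kind of thing you would otherwise have to encode by hand.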
5. Flexibility
It's hard to make spot-fixes in an ML model. The best way to fix issues is
usually through better or more training data, better feature engineering, or
hyperparameter tuning, all of which are time-consuming.
It's much easier to fix issues quickly in a score-based model. Given a bug, you
can tweak the model easily and have the fix out for users in no time.
6. Engineering ramp-up time
If you are using well-known ML models, it's relatively easy for a good Machine
Learning engineer to ramp up on your system. The model learns from the data,
and while you need some time to understand the overall system, you don't need
to understand all the details of what happens inside the model before you start
making changes to it.
A score-based model is hand-tuned, and you need to understand all the
intuitions and trade-offs baked into the model before you can work with it
effectively. For a fairly complex hand-tuned model, even a good engineer might
take months to build enough context to understand all of the intuitions baked
in over years of working on the model. This problem usually gets worse the
older a model gets.
Hybrid ML/hand-tuned systems
As you can see, ML models can be better for relevance, but have some other
shortcomings. To overcome these shortcomings, most search engines that use an
ML model use a hybrid ML/hand-tuned system. In this case, even though your main
ranking model is an ML-trained one, you still have hand-tuned levers such as
blacklists, constraints, or forced rankings to quickly fix egregious mistakes.
Note that if you go this way, it's important that the hand-tuned components
remain very simple and easy to use, or else you might end up having to maintain
both a fairly complex ML model and a fairly complex hand-tuned system.
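The hybrid setup described above can be sketched as a thin layer of overrides around the model's ranking. Everything here is hypothetical: `hybrid_rank`, the `blacklist` and `pinned` parameters, the example domains, and the lookup table standing in for a trained model.

```python
def hybrid_rank(query, candidates, model_score, blacklist=frozenset(), pinned=None):
    """Rank with an ML score, then apply simple hand-tuned overrides."""
    pinned = pinned or {}
    # Blacklist: drop results that were manually blocked.
    allowed = [doc for doc in candidates if doc not in blacklist]
    # Main ranking comes from the (stand-in) ML model.
    ranked = sorted(allowed, key=lambda doc: model_score(query, doc), reverse=True)
    # Forced ranking: hand-picked results for this query go on top, in order.
    forced = [doc for doc in pinned.get(query, []) if doc in ranked]
    return forced + [doc for doc in ranked if doc not in forced]


# Stand-in for a trained model: a lookup table of made-up scores.
FAKE_SCORES = {
    "a.example": 0.9,
    "spam.example": 0.8,
    "b.example": 0.5,
    "official.example": 0.3,
}


def fake_model(query, doc):
    """Pretend model that ignores the query and returns a stored score."""
    return FAKE_SCORES[doc]


results = hybrid_rank(
    "acme login",
    list(FAKE_SCORES),
    fake_model,
    blacklist={"spam.example"},
    pinned={"acme login": ["official.example"]},
)
print(results)
```

The override layer is deliberately just two lookups. Once it grows its own scoring logic, you are maintaining two ranking systems.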
Advice for new search engines
For new search engines, given the various factors above, a good rule of thumb
would be:
- Start with a hand-tuned model. Hand-tuned models are simpler to build up front and let you hit the ground running.
- Get your initial users. Get them to use your search for a while, so that you generate good training data for your future ML model.
- When you reach a scale where incremental gains in relevance are more important than the rest of the factors, consider moving to an ML model. But do make sure you have a good answer to the “Optimization metric” problem before you start working on an ML model.
Endnotes:
[4] Nikhil Dandekar's answer to "What is the intuitive explanation of Learning to Rank and algorithms like RankNet, LambdaRank and LambdaMART?"
Source: Quora