What is Hybrid Search? Maybe Not What You Think

Hybrid Search Might Not Mean What You Think

Sometimes, even Google can’t keep up. Ask the search engine “What is Hybrid Search?” and the top answer from Google is a Featured Snippet from Microsoft Documentation:

“Hybrid search lets your users search for files and documents across SharePoint Server and Microsoft 365 at the same time.”

Bing doesn’t fare much better with the same query:

“Hybrid search is the use of an on-prem search head to look at data stored in Splunk Cloud.”

These answers don’t even begin to get to the root of what “Hybrid Search” means in the context of search and information retrieval applications.

The Next Logical Evolution of Search

In our opinion, Hybrid Search is what happens when a truce is declared in the battle between traditional keyword search technologies and the new generation of search engines that espouse a radical adoption of AI to develop “cognitive search” or “insight engines.”

The reality is that the best current search experiences are achieved when you combine traditional results from a search index with answers from a knowledge graph and answers extracted from AI technologies that come closer to “understanding” the content in documents.

A Deep Technical Dive

There are some good resources at the end of the blog for an overview of what we think is the correct definition of Hybrid Search, but for a deeper technical dive, we recommend this excellent presentation by Lester Solbakken at Berlin Buzzwords 2022.

In the video, Solbakken, a pioneer at Yahoo!, discusses Okapi BM25, the classical algorithm that traditional keyword-based search engines use to rank query results. He notes its simplicity and effectiveness, but also its vulnerability to vocabulary mismatch: a relevant document that uses different words from the query will not be matched. Keyword matching is also blind to meaning. Think of the simple example where “bite the bullet” has a literal as well as an idiomatic meaning, which a purely lexical engine cannot tell apart.
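To make the idea concrete, here is a minimal sketch of BM25 scoring in Python. It is our own illustration, not code from the presentation: the function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions, and the IDF formula is the smoothed variant that several open source engines use.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Illustrative sketch only; k1 and b are commonly used defaults,
    not values taken from the presentation.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # no lexical overlap, no contribution
            # Smoothed IDF: rarer terms weigh more, never negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus: literal use, idiomatic paraphrase, partial match.
docs = [
    "the soldier had to bite the bullet during surgery".split(),
    "she decided to face a difficult decision bravely".split(),
    "the bullet was found near the target".split(),
]
scores = bm25_scores("bite the bullet".split(), docs)
```

The second document expresses the idiomatic meaning of the query but shares no words with it, so its score is zero. That is vocabulary mismatch in miniature.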

Pre-trained large language models such as Google BERT and GPT-3 use deep learning to develop a more nuanced “understanding” of language and content. But developing and customizing these models can be computationally expensive, and datasets suitable for fine-tuning them can be hard to come by.

Solbakken’s analysis suggests that lexical (keyword) and deep learning models can complement each other, retrieving different sets of relevant results. He also takes a closer look at different strategies to combine results to get the best of both worlds, even in cases where deep models have not been fine-tuned.
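As a rough illustration of what combining results can look like, the sketch below uses reciprocal rank fusion, one widely used way to merge ranked lists. We are not claiming it is the strategy the presentation recommends; the function name, the document IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical results from a keyword engine and a deep learning model.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_d", "doc_b", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Because the method works on ranks rather than raw scores, it does not need the two engines to produce comparable score scales, which is one reason it is a convenient starting point when the deep model has not been fine-tuned.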

If you’re not convinced about the power of language models to enhance traditional search, we leave you with the results when we asked GPT-3 “What is Hybrid Search?”

“Hybrid search is a method of searching that combines multiple search methods, such as keyword search, natural language search, and semantic search, to find more relevant results. This type of search is used in search engines and other applications to provide better and more accurate results.”

Pretty good, huh? Stay tuned for more on all this in our upcoming Blog Series on ChatGPT, GPT-3, and Large Language Models.

If your head is spinning from the depth that the video goes into, wait until you dive into dense vector search and other new AI-related search techniques. The general resources below may be helpful, and, as always, feel free to contact us for more information or a free consultation.

Helpful Hybrid Search Resources
