Under the Hood – The Tech Powering Search Direct

Today we announced Search Direct. As an engineering team, we’re always looking for ways to make search faster, more relevant, and easier to use. Today’s news hits all of those beats, but we also think it does something more. We think it makes the search experience more innovative and fun by bringing rich, related content right into the search box as you type.

[Screenshot: local weather via Search Direct]

How did we do it? The story of Search Direct starts with another technology that Yahoo! pioneered, Search Assist. Search Assist focuses on helping people submit better search queries by suggesting the likeliest queries for what the user is typing. With Search Direct, the idea is that if we understand what you’re looking for, then why not skip past the query itself and go straight to giving you the answers you’re looking for right there in the search box as you type? It seems obvious, but in reality this is the first real fundamental change to search in more than a decade.

This kind of experience requires serious technical infrastructure. Not only are we leveraging contextual cues and historical query logs to algorithmically divine your intent in real time, we’re also assembling the results simultaneously by federating to Yahoo!’s massive content resources, identifying trending topics, and integrating social mining to identify the most popular content relevant to the query. Then, to top it all off, after we’ve extracted, ranked and scored all this information, we present it in a rich experience at blazing fast speeds.

Given the uniqueness of the engineering challenges of Search Direct, we thought a little background would benefit the developer community. One element you’ll notice is that we built the solution from the ground up to be a self-contained service that can be integrated into all of Yahoo!’s many sites and services around the world. So, without further ado, here’s a peek under the hood of Search Direct.

The Back End:

With Search Direct, we primarily interact with two back-end systems. One is called “Gossip,” which generates the likeliest query matches, and the other is called “NRTI,” which federates content from Yahoo!’s rich content repositories.

As a user types in the search box, search suggestions are shown (from Gossip), while we simultaneously generate and display rich content associated with that first query suggestion (from NRTI).
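To make that flow concrete, here is a minimal sketch of the suggest-then-enrich step. It is an illustration only: the in-memory tables, the popularity scores, and the function names (`suggest`, `rich_content_for`, `search_direct`) are assumptions made up for this example, not the actual Gossip or NRTI interfaces.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for Gossip: known queries with made-up popularity scores.
QUERY_LOG: Dict[str, int] = {
    "weather san francisco": 900,
    "weather seattle": 400,
    "web hosting": 650,
    "giants schedule": 300,
}

# Hypothetical stand-in for NRTI: rich content keyed by the full query.
RICH_CONTENT: Dict[str, str] = {
    "weather san francisco": "<div>San Francisco: 61F, partly cloudy</div>",
    "giants schedule": "<div>Next game: Friday 7:15pm</div>",
}


def suggest(prefix: str, limit: int = 3) -> List[str]:
    """Return the likeliest queries for what the user has typed so far."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [q for q in QUERY_LOG if q.startswith(prefix)]
    # Most popular first; ties are broken alphabetically for stable output.
    return sorted(matches, key=lambda q: (-QUERY_LOG[q], q))[:limit]


def rich_content_for(query: str) -> Optional[str]:
    """Look up the rich panel for a full query, if one exists."""
    return RICH_CONTENT.get(query)


def search_direct(prefix: str) -> Tuple[List[str], Optional[str]]:
    """Return the suggestions plus the rich panel for the first suggestion."""
    suggestions = suggest(prefix)
    panel = rich_content_for(suggestions[0]) if suggestions else None
    return suggestions, panel


suggestions, panel = search_direct("we")
print(suggestions)
print(panel)
```

In this sketch the two lookups run one after the other for simplicity. A production system would presumably issue them concurrently to keep per-keystroke latency low.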

The handling of caching in NRTI is particularly interesting because we use different approaches for different types of content. For content that doesn’t change constantly (such as overview information on a sports team), we save the information in NRTI, which in turn uses memcached to maintain the query-to-data association that we use to respond to a user’s search as they type. For content that’s highly dynamic in nature (like financial information or movie showtimes), NRTI interacts with each of the source databases independently and on an ongoing basis, collecting their data and caching it using STCacheServer powered by MDBM.
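The split between the two caching approaches can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a plain dictionary stands in for memcached, a lazy time-to-live refresh stands in for NRTI’s ongoing collection from each source, and the class names, the `fetch` callback, and the 30-second TTL are all hypothetical. None of this reflects the real memcached, STCacheServer, or MDBM APIs.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaticContentCache:
    """Query -> data store for content that rarely changes (dictionary stand-in)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, query: str, data: str) -> None:
        self._data[query] = data

    def get(self, query: str) -> Optional[str]:
        return self._data.get(query)


class DynamicContentCache:
    """Cache for fast-changing content: re-fetch once an entry is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],               # hypothetical call to the source database
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # query -> (stored_at, data)

    def get(self, query: str) -> str:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                        # still fresh: serve from the cache
        data = self._fetch(query)                  # missing or stale: go back to the source
        self._entries[query] = (now, data)
        return data


# Usage: static overview content is stored once; showtimes are refreshed on expiry.
overviews = StaticContentCache()
overviews.put("sf giants", "<div>San Francisco Giants overview</div>")

showtimes = DynamicContentCache(
    fetch=lambda q: f"<div>Latest showtimes for {q}</div>",
    ttl_seconds=30,
)
print(overviews.get("sf giants"))
print(showtimes.get("true grit"))
```

The injectable `clock` parameter is a design choice for this sketch only: it makes expiry behavior easy to test without waiting for real time to pass.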

The Front End:

When you’re dealing with real-time responses to a user typing, performance is paramount. Not only do back-end systems like Gossip and NRTI have to be architected for speed, but the presentation layer and client-side programming have to be optimized for rendering the data at incredibly high speeds. For Search Direct, we used a client-side JavaScript framework that keeps each round trip (the journey from a user typing, to the back-end databases, and back) ultra fast. The framework has a number of interesting characteristics:

1. Plug and Play: The framework for Search Direct is a widget-like implementation that is fully sandboxed and library-agnostic from the host page’s point of view (internally, it’s based on YUI3). This allows the framework to be dropped into any product experience on Yahoo! (and potentially beyond) by adding a search box to the page and making a couple of changes to configuration attributes. The host page (for example, the Yahoo! Home page) could have a different infrastructure, a different platform, even a different version of the YUI library, and it doesn’t matter – the entire experience will look, feel and react to users in the same way.

2. Scaling for Rich Content: Our framework is also designed to scale in terms of rich content. Our approach is to create modules that encapsulate rich content (a module is the HTML for a rich panel, like videos or images, related to a particular query). There are a number of cases where we want different modules for the same query, to account for differences between international markets, languages, and so on. Because a module implementation is fairly simple (given a query, it produces an appropriate HTML fragment), we wanted an architecture that could scale to collect and serve modules from a number of different platforms and sources in the future. We therefore based Search Direct on a new Yahoo! protocol, RMP: the search box works as one instance of a Remote Module Publisher service, which handles the rendering process, and the architecture is flexible enough to support more publisher services in the future. Looking ahead, we will be able to reuse any module published through the RMP architecture, and to use YQL to query across different providers.

3. JavaScript: We have two primary JS components powering Search Direct, the injection engine and the bootstrap engine. The injection engine is a tiny piece of JavaScript owned by the individual sites using the Search Direct framework. It lets each site customize the user experience, for example by changing the visual look and feel of the shell around the search box.

The bootstrap engine is controlled by the Search Direct platform and is shared. It gives us the flexibility to push new versions of the Search Direct Framework, reducing support time, maximizing shared code and keeping our deployment cycles independent.

Finally, the dual injection engine/bootstrap engine architecture also directly improves performance: when a user lands on any page running Search Direct, the entire framework is cached by the browser, and any triggered Search Direct experience will be lightning fast.

4. Dynamic iFrame: Search Direct runs sandboxed inside a dynamic iFrame (similar to something like Meebo). This allows us to isolate all the styles and the code, guaranteeing that the host page’s performance or configuration won’t affect the way that Search Direct looks or performs, and vice versa. Over the long term, as we add new features and capabilities to the Search Direct experience, we can be sure that the pages using the technology won’t have to do anything to accommodate upgrades.
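As an illustration of the module idea from item 2 above (given a query, produce an HTML fragment, with room for several publishers and market-specific variants), here is a minimal sketch. It does not implement the RMP protocol, whose details are not described here. The `ModuleRegistry` class, the `weather_publisher` function, and the market codes are hypothetical names chosen for the example.

```python
import html
from typing import Callable, List, Optional

# A publisher takes (query, market) and returns an HTML fragment,
# or None when it has no module for that query.
Publisher = Callable[[str, str], Optional[str]]


class ModuleRegistry:
    """Asks each registered publisher in turn and returns the first fragment found."""

    def __init__(self) -> None:
        self._publishers: List[Publisher] = []

    def register(self, publisher: Publisher) -> None:
        self._publishers.append(publisher)

    def render(self, query: str, market: str) -> Optional[str]:
        for publisher in self._publishers:
            fragment = publisher(query, market)
            if fragment is not None:
                return fragment
        return None


def weather_publisher(query: str, market: str) -> Optional[str]:
    """Hypothetical publisher: the same query yields a market-specific module."""
    keyword = "weather "
    if not query.startswith(keyword):
        return None
    city = html.escape(query[len(keyword):].title())
    unit = "F" if market == "us" else "C"   # assumed per-market difference
    return f'<div class="weather">{city} forecast in {unit}</div>'


registry = ModuleRegistry()
registry.register(weather_publisher)
print(registry.render("weather san francisco", "us"))
print(registry.render("weather paris", "fr"))
```

Adding a new content source in this sketch means registering one more publisher function. The search box code that calls `render` does not change.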

We hope this gives you an idea of the kind of technical challenges Search Direct posed to our team. Definitely try it out (link). We think it makes the experience of Search a lot more fun and engaging. From an architectural point of view, both on the back end and the front end, we took a true modular approach throughout development. In the coming months, Search Direct will be rolling out to Yahoo!’s hundreds of millions of users across a variety of products. As it does, we expect to get a ton of feedback and data to help us improve. We’ll be sure to check back in here on YDN to share what we learn.

- By Ethan Batraski (Lead Product Manager), Shenhong Zhu, Hang Su, Huming Wu, Caridy Patino, Dolly Do, Sudharsan Vasudevan, Kartik Ramakrishnan, and the entire Search Direct Team