Vertical search engines are emerging every day as people look for information that would otherwise be hidden in the enormous indexes of Yahoo!, Google, Microsoft, and other general-purpose search engines. You can find search engines devoted to food, cars, people, environmentalism, and much more. Many of today's vertical search engines go beyond repackaging existing search results: by focusing on their core audience, they create richer, more relevant results.
The Past, Present, and Future of Building a Search Engine
Search engine construction has evolved from the early days of building from scratch to today's plethora of data APIs that make tomorrow's vertical search engines more powerful and easier to build.
Past:
- Huge expenses to build the index, find the data, and maintain the process.
- The majority of time spent on building relevancy, and less on design and creating a unique experience.
Present:
- Search APIs reduce the complexity of building an index.
- Vertical search engines still spend significant resources on creating unique data.
- More resources are spent on designing the best relevancy and a unique experience.
Future:
- New search engines tap into huge amounts of distributed data.
- More time for developing unique approaches to presenting relevant information and creating a unique experience.
Vertical search engines have a distinct advantage over general search engines: they already know what their users are interested in. A search for "Jaguar" in Yahoo! may return the automobile, the Mac OS, or the animal. A vertical search engine that specializes in sports, autos, or animals would not have that problem. Knowing its users' interests in advance gives a vertical search engine more flexibility in creating new models of relevancy ranking.
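One simple way to picture this advantage is a vertical that boosts the query senses matching its own subject. The sketch below assumes a hypothetical table of senses with made-up generic scores; it is an illustration of a domain prior, not how any particular engine ranks results.

```python
# Hypothetical senses for an ambiguous query, each with an illustrative
# generic relevance score; the numbers are made up for this sketch.
CANDIDATES = {
    "jaguar": [
        ("Jaguar (car maker)", "autos", 0.40),
        ("Mac OS X Jaguar", "software", 0.35),
        ("Jaguar (animal)", "animals", 0.25),
    ]
}


def rank_for_vertical(query, vertical, boost=3.0):
    """Re-rank senses, multiplying the score of in-vertical senses by `boost`."""
    senses = CANDIDATES.get(query.lower(), [])
    scored = [
        (score * (boost if category == vertical else 1.0), title)
        for title, category, score in senses
    ]
    # Highest adjusted score first.
    return [title for _, title in sorted(scored, reverse=True)]
```

On an animal-focused site, `rank_for_vertical("Jaguar", "animals")` puts the animal first even though a general engine scored it lowest. The boost factor is an arbitrary assumption; a real vertical would tune it against its own users' behavior.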
Let's look at some emerging trends among recent Yahoo! BOSS-powered sites and applications.
Yahoo! Fire Eagle is a location standardization and distribution platform that makes it easy for developers to build location-based services. Location-based search is one of the most promising areas for future sites, especially as mobile phones make it easier to determine a user's location and display relevant information.
It's great to know the restaurants, shops, and friends that are around me right now. But it is even more interesting to know what I can find in the next block, mile, or town. This topic was discussed in the WWW2009 conference paper "Mining Interesting Locations and Travel Sequences from GPS Trajectories for Mobile Users" by Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma.
FirePin, an iPhone application that lets you generate a shareable map, combines Fire Eagle and Google Maps to plot your route in real time. It could easily be connected to a search engine that predicts your probable next location and returns local businesses, census data, historical information, and the availability of friends.
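Computing a "probable next location" can be sketched with a first-order Markov model: count which place tends to follow which in past trips, then pick the most frequent successors. This is a deliberately simpler technique than the one in the paper cited above, and the place names are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(trajectories):
    """Count how often each place is immediately followed by each other place."""
    transitions = defaultdict(Counter)
    for trip in trajectories:
        for here, nxt in zip(trip, trip[1:]):
            transitions[here][nxt] += 1
    return transitions


def predict_next(transitions, current, k=2):
    """Return up to k most likely next places after `current`."""
    return [place for place, _ in transitions[current].most_common(k)]


# Hypothetical trips recorded from a user's location history.
trips = [
    ["home", "cafe", "office"],
    ["home", "cafe", "gym"],
    ["home", "park"],
    ["cafe", "office"],
]
model = build_transitions(trips)
```

Here `predict_next(model, "cafe", k=1)` gives `["office"]`; the predicted place could then be passed to a local search for nearby businesses.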
With the vast amount of data available on the net, there is no reason to limit your site to a single search API. Many sites use the user's query to trigger calls to a series of APIs that return related information about the subject. A search for Tiger Woods on a sports-related site could build modules from Wikipedia data, YouTube videos, and the latest tournament results, and even build a map of championship golf courses.
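Fanning one query out to several sources is usually done concurrently so the slowest source does not serialize the page. The sketch below uses stub functions in place of real API calls; the module names and return strings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical module builders. A real site would call remote APIs here.
def wikipedia_module(query):
    return f"Encyclopedia summary for {query}"


def video_module(query):
    return f"Videos about {query}"


def results_module(query):
    return f"Latest tournament results for {query}"


MODULES = {
    "summary": wikipedia_module,
    "videos": video_module,
    "results": results_module,
}


def build_page(query, modules=MODULES):
    """Run every module concurrently and keep the ones that succeed."""
    page = {}
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in modules.items()}
        for name, future in futures.items():
            try:
                page[name] = future.result()
            except Exception:
                # A failing source is skipped rather than breaking the page.
                pass
    return page
```

Skipping failed modules is a design choice: a search page with one missing sidebar is better than an error page.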
Some sites also use secondary sources to improve the relevance of their search results. This topic was discussed in the WWW2009 conference paper "Understanding User's Query Intent with Wikipedia" by Jian Hu, Gang Wang, Fred Lochovsky, and Zheng Chen. DuckDuckGo is a new search engine that combines these ideas: it uses Wikipedia to improve relevance and draws on multiple data sources to provide a more rounded experience.
The amount and variety of available data is surprising. The OpenData movement has made sharing and discovering data much more transparent and efficient. For more information on OpenData, visit DataMob.org, TheInfo.org, and InfoChimps.org.
Internal and External Data Sources
Yahoo! BOSS offers a custom search experience for larger partners, such as TechCrunch. This allows BOSS to index proprietary data that is normally not available to search spiders. This data can be combined with whitelisted sources and feeds to create a unique set of expert sources. The custom approach also allows for structured search options, such as displaying only articles published within a certain time frame, by a particular author, and about a specific topic.
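Structured search options like these amount to filtering an index on fields rather than on free text. The sketch below assumes a hypothetical `Article` record and is not the BOSS custom search interface; each filter applies only when it is supplied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical fields for a publisher's proprietary index.
    title: str
    author: str
    topic: str
    published: date


def structured_search(articles, start=None, end=None, author=None, topic=None):
    """Keep only articles that match every filter that was supplied."""
    results = []
    for a in articles:
        if start is not None and a.published < start:
            continue
        if end is not None and a.published > end:
            continue
        if author is not None and a.author != author:
            continue
        if topic is not None and a.topic != topic:
            continue
        results.append(a)
    return results
```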
Even without the BOSS custom approach, vertical search engines can develop their own unique data sets, whether that means a library's book index, statistics generated by research, or other data unique to a subject.
For example, we could build an art search engine. A user searches for "Mona Lisa", and the Louvre's web site is returned as the first result. This could be combined with internal data to display additional information about the painting, Leonardo da Vinci, the Renaissance, or the Louvre Museum. The site might also add a list of related artists, such as Raphael and Michelangelo, to the result for further exploration.
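Combining external results with internal data can be as simple as a lookup keyed by the normalized query. The `INTERNAL` table below is a hypothetical stand-in for a site's own database, and the web result is a placeholder.

```python
# Hypothetical internal data set, keyed by normalized entity name.
INTERNAL = {
    "mona lisa": {
        "artist": "Leonardo da Vinci",
        "museum": "Louvre Museum",
        "period": "Renaissance",
        "related_artists": ["Raphael", "Michelangelo"],
    }
}


def enrich(query, web_results):
    """Attach any internal facts about the query to the external results."""
    return {
        "query": query,
        "results": web_results,
        "facts": INTERNAL.get(query.strip().lower(), {}),
    }


page = enrich("Mona Lisa", ["https://www.louvre.fr/"])
```

When the internal data has nothing for a query, the page simply falls back to the external results.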
Coloralo is an interesting search engine that uses offline analysis to produce specific results. The site was born of necessity: its engineer wanted to find new images for his children to color. Coloralo is an image search engine that returns only black-and-white line drawings for kids to color in.
When a user searches for "horse", the site requests many images, caches them, and analyzes each one for its number of colors and its distribution of blacks and whites. This analysis produces a smaller list of appropriate images.
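The color analysis can be sketched over raw pixel data. This version assumes each image has already been decoded into a list of `(r, g, b)` tuples; the thresholds and the quantization step are illustrative guesses, not Coloralo's actual parameters.

```python
def quantize(pixel, step=32):
    """Collapse nearby shades so compression noise doesn't inflate the color count."""
    return tuple(c // step for c in pixel)


def is_line_drawing(pixels, max_colors=8, min_bw_share=0.9, tol=40):
    """Guess whether an image is a black-and-white line drawing.

    pixels: list of (r, g, b) tuples with channel values 0-255.
    """
    if not pixels:
        return False
    colors = {quantize(p) for p in pixels}

    def black_or_white(p):
        return all(c <= tol for c in p) or all(c >= 255 - tol for c in p)

    bw_share = sum(black_or_white(p) for p in pixels) / len(pixels)
    # Few distinct colors AND mostly near-black or near-white pixels.
    return len(colors) <= max_colors and bw_share >= min_bw_share
```

Because the test is cheap, it can run offline over cached images so that the search itself only returns the pre-filtered set.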
Truevert is an environmental vertical search engine that goes beyond the basic assumption of a niche user's intentions. It builds a unique natural language dictionary to enhance relevancy. A search for "CFL" on a general search engine could return "Canadian Football League", but Truevert recognizes it as the acronym for "compact fluorescent lighting", a much more relevant term for environmental concerns.
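A domain dictionary can be applied as query expansion before the query is sent to the underlying search API. The dictionary entries below are hypothetical examples, and this sketch is not Truevert's method, only an illustration of the idea.

```python
# Hypothetical domain dictionary for an environmental audience.
GREEN_DICTIONARY = {
    "cfl": "compact fluorescent lighting",
    "led": "light-emitting diode",
    "pv": "photovoltaic",
}


def expand_query(query, dictionary=GREEN_DICTIONARY):
    """Replace known acronyms with their in-domain expansion; keep other words."""
    return " ".join(dictionary.get(w.lower(), w) for w in query.split())
```

`expand_query("CFL bulbs")` becomes `"compact fluorescent lighting bulbs"`, steering the results away from football.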
Beyond Search as a Site or Function
Vik Singh, the architect of Yahoo! BOSS, described BOSS not as a search API but as a data API during the WWW2009 panel "Web Search APIs: The Next Generation". Singh suggests that search is the best way to work with the wealth of data on the internet. That data doesn't have to be used to create a set of results on a search page.
Search as a Function
Chris Heilmann created Keyword Finder when BOSS began exposing the keyterms associated with each result. Keyterms are the words that have been associated with a web site inside the Yahoo! Search index. Keyword Finder looks at the top results for a term and returns the keyterms that are most effective for that term, which helps site owners plan their search engine optimization (SEO) strategy.
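One way to pick "the most effective" keyterms is to count how many of the top results share each term. The sketch below assumes the keyterms have already been fetched as one list per result; it does not call BOSS, and the frequency ranking is an assumption about the approach rather than Keyword Finder's documented logic.

```python
from collections import Counter


def top_keyterms(results, n=3):
    """Rank keyterms by how many of the top results share them.

    results: one list of keyterms per search result.
    """
    counts = Counter()
    for terms in results:
        # Count each term at most once per result.
        counts.update({t.lower() for t in terms})
    return [term for term, _ in counts.most_common(n)]


# Hypothetical keyterms for the top three results of one query.
sample = [
    ["Coloring", "kids", "horse"],
    ["coloring", "pages", "horse"],
    ["coloring", "printable"],
]
```

For `sample`, the two strongest terms are `"coloring"` (all three results) and `"horse"` (two results).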
Bossy is another site that replaces a list of results with a single answer. It analyzes the results and uses their consensus to decide which answer is correct. For example: Q. Where is the Prado? A. Madrid.
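Consensus can be sketched as a majority vote over candidate answers extracted from individual results. The extraction step is assumed to have happened already, and the 50% threshold is an arbitrary choice for this illustration, not Bossy's rule.

```python
from collections import Counter


def consensus_answer(candidates, min_share=0.5):
    """Return the most common normalized answer if enough results agree.

    candidates: answers already extracted from individual search results.
    """
    normalized = [c.strip().lower() for c in candidates if c.strip()]
    if not normalized:
        return None
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer if votes / len(normalized) >= min_share else None
```

Returning `None` when no answer reaches the threshold lets the site fall back to an ordinary result list instead of guessing.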
Beyond the Search Engine Web Site
Future search projects will also go beyond the basic browser. It's time to think about what new applications this data makes possible. Let's look at some examples.
Search on the desktop
Xobni is a great search application for the desktop. It extends Microsoft Outlook with much stronger search functionality and ties into social networks such as LinkedIn.
Search as a tool
Zemanta is a search-based tool that discovers related content for people who write blog posts. Zemanta is a Firefox plugin that analyzes the context of what you are writing and searches for related images, articles, and even products on Amazon.
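Turning a draft into a search query can be approximated by picking its most frequent meaningful words. This is a simple term-frequency sketch with a deliberately tiny, hypothetical stop-word list; Zemanta's actual context analysis is not described here and is likely more sophisticated.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with", "my"}


def draft_keywords(text, n=3):
    """Pick the most frequent non-stop-words in a draft to use as a query."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]


draft = "Solar panels on the roof. Solar panels cut bills, and solar power is clean."
```

Here `draft_keywords(draft, 2)` yields `["solar", "panels"]`, which could then be sent to image, article, or product searches.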
Inquisitor is another tool that takes search into the browser. It replaces the browser's search interface with a much more powerful and faster search-suggestion generator.
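Fast search suggestions are commonly built on a sorted list of known queries plus binary search for the typed prefix. The sketch below shows that general technique with a hypothetical query log; it is not Inquisitor's implementation.

```python
import bisect


class Suggester:
    """Prefix-based search suggestions over a sorted list of past queries."""

    def __init__(self, queries):
        self.queries = sorted({q.lower() for q in queries})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search finds the first query >= prefix in O(log n).
        i = bisect.bisect_left(self.queries, prefix)
        matches = []
        while i < len(self.queries) and len(matches) < limit:
            if not self.queries[i].startswith(prefix):
                break  # sorted order means no later query can match
            matches.append(self.queries[i])
            i += 1
        return matches


# Hypothetical query log.
log = ["Paris hotels", "paris metro", "Prado museum", "paris weather", "madrid"]
s = Suggester(log)
```

Typing "par" returns `["paris hotels", "paris metro", "paris weather"]` without scanning the whole log.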
Search as a module
The OpenSocial standard allows developers to build a single web application and have it appear on multiple social networks at the same time. For example, suppose you build an application that finds daily statistics, gossip, and news about the players in a user's fantasy football league. The same code could run simultaneously on Yahoo!, MySpace, Bebo, and more.
Search outside a computer
Web-based applications are moving beyond the computer. Yahoo! recently announced partnerships with Intel and television manufacturers to allow applications that run alongside normal broadcast programming. Imagine searching the latest Twitter stream for opinions and statistics while watching the Super Bowl.
This new application standard may also be extended to web-enabled household appliances and automotive computers, as well as home entertainment systems.
- Yahoo! BOSS: Developer.Yahoo.Com/BOSS
- YQL: Developer.Yahoo.Com/YQL
- Fire Eagle: Developer.Yahoo.Com/FireEagle
- Google App Engine: AppEngine.Google.Com
- Amazon Web Services: AWS.Amazon.Com
- OAuth: OAuth.Net
- OpenSocial: OpenSocial.Org
- Open Data: TheInfo.Org
- Alt Search Engines: AltSearchEngines.Com
Web Developer, Yahoo! Paris