Yahoo! Kicks Off Fourth Annual Hadoop Summit


1,650 attendees from 400 companies across 12 countries descend on Santa Clara, CA to hear the latest on Big Data issues, trends and technology

Today we kicked off the fourth annual Hadoop Summit at the Santa Clara Convention Center, bringing together some of the most influential thought leaders in the space, including Yahoo!, Facebook and IBM, to collaborate on Big Data issues and share experiences in building, managing and operating relevant real-world applications on Hadoop.

Yesterday, together with Benchmark Capital, we announced the formation of Hortonworks. This new independent company was founded on the Hadoop technology pioneered at Yahoo! and will be led by key architects and core contributors to the open source Apache Hadoop technology.

As our CTO, Raymie Stata, has already discussed, this is an exciting time for Yahoo!, Hortonworks and the entire Hadoop community. The formation of Hortonworks will increase investment in the development of Apache Hadoop and will accelerate adoption by making it more robust and easier to install, manage and use for enterprises and technology vendors. This investment will enable Apache Hadoop to meet the growing market demand and become the big data management and analysis platform of choice for the industry.

Our Senior Vice President of the Cloud Platform Group, Jay Rossiter, opened the Hadoop Summit this morning with a keynote about Yahoo!’s development of the next generation of Hadoop applications, the important role that Hadoop plays in Yahoo!’s integrated technology ecosystem and how industry-wide adoption of Hadoop is benefiting the entire community. He also introduced Eric Baldeschwieler, CEO of Hortonworks and former Vice President of software engineering for the Hadoop team at Yahoo!, who outlined the plan for the new company in a keynote of his own.

“Since Yahoo! began our work with Hadoop back in 2005, it has become the epicenter of big data and cloud computing. It’s key in helping companies get value from their data and better manage their businesses,” said Jay Rossiter, Senior Vice President, Cloud Platform Group at Yahoo!. “With over 1,600 delegates attending this year’s summit, Hadoop is now mainstream, with an expanding ecosystem. Forming Hortonworks with Benchmark Capital is the natural next step in the evolution of Apache Hadoop and will bridge the technology and knowledge gaps that exist within enterprises, systems integrators and technology vendors, including ISVs, OEMs and service providers.”

Later this morning on the main stage, Facebook will discuss its use of Hadoop to power the Facebook Messages infrastructure, and IBM will discuss how it used Hadoop to power Watson, its Jeopardy!-winning supercomputer.

The community is already buzzing about other announcements made at the 2011 Hadoop Summit, including:


HStreaming LLC

HStreaming LLC announced today its launch of the most scalable real-time data processing platform powered by Apache Hadoop. HStreaming will provide real-time complex event processing (CEP), allowing customers to manage their full big-data life cycle on a single platform. HStreaming will also fully integrate with other Hadoop technologies such as Pig, HBase and ZooKeeper, and with other distributions such as Cloudera’s.

“HStreaming is solving the challenge of generating massive data volumes that need to be processed correctly and in timely fashion. HStreaming technology leverages MapReduce to provide a scalable real-time processing platform at low cost,” said Jana Uhlig, CEO of HStreaming.


Karmasphere

Karmasphere, a Big Data Intelligence company, made several announcements today aimed at helping a spectrum of users be more successful with Hadoop. For developers, Karmasphere is now offering an All-in-One Virtual Appliance for building Hadoop applications, bundling Apache Hadoop, Eclipse and Karmasphere Studio Community to shorten the learning curve for Hadoop. For companies initiating their first big data project, Karmasphere and Think Big Analytics are announcing a partnership to help them accelerate their unstructured data strategy, infrastructure development and deployment.

“At the Hadoop Summit users will be hearing a lot about dynamic new entrants offering Apache Hadoop distributions. But there's another very important aspect for success with Big Data – how are enterprises going to interact and uncover the intelligence in Hadoop?” said Gail Ennis, CEO, Karmasphere. “There's a strategic software layer that sits on top of Hadoop, enabling the analysis of data and providing a comprehensive workspace for data professionals. That’s Karmasphere! It’s another critical piece of the Hadoop ecosystem.”

MapR Technologies, Inc.

MapR Technologies, Inc., today announced breakthroughs in Hadoop big data software, unveiling its enterprise-ready software to the industry. The MapR Distribution for Apache Hadoop provides a two- to five-fold performance improvement and brings unprecedented dependability to MapReduce analytics, enabling customers to reduce their required hardware costs by half. The MapR Distribution includes popular open-source community tools and capabilities such as HBase, Hive, Cascading and ZooKeeper, among others, and is available for download.

MapR Technologies, Inc., also unveiled two editions of its software today: MapR M3, which is free for an unlimited number of nodes, and MapR M5, which offers a robust feature set, including high availability, data protection and 24x7 support.

“We’re happy to join a growing number of commercial vendors that are expanding the Hadoop community,” said John Schroeder, CEO and Co-founder of MapR Technologies. “Today we announced the availability of the MapR Editions of our distribution that provide organizations with an easy, dependable, and fast platform for their Hadoop applications.”

Platform Computing

Platform Computing today announced Platform MapReduce, the industry’s first enterprise-class distributed runtime engine for MapReduce applications. Platform MapReduce delivers the next generation architecture for Hadoop MapReduce applications, offering powerful workload manageability, reliability and resource utilization. Platform MapReduce provides unparalleled flexibility in deployment with full compatibility for Hadoop MapReduce applications and supports multiple data sources and file systems, including the Hadoop Distributed File System (HDFS).

Platform Computing also announced that it has signed the Apache Corporate Contributor License Agreement to provide contributions to the Apache Software Foundation for developing Apache-based, open-source software.

“Platform is committed to the growth of a robust enterprise-ready Hadoop ecosystem,” said Rohit Valia, Director, Solutions Marketing, Platform Computing. “The Hadoop community is leading the charge in exploring solutions for the management of big data and Platform is excited to bring almost two decades of real world experience managing large-scale distributed workloads to solving these challenges.”

Ventana Research

Newly conducted benchmark research from Ventana Research shows organizations recognize that big data requires new approaches to data and information management. The research findings indicate that Hadoop is already being used in one-third of big data environments and is being evaluated in nearly another fifth. The research also found that Hadoop is additive to existing technologies, according to almost two-thirds of research participants.

Topping the list of benefits of Hadoop adoption are newly found capabilities: 87% of organizations using Hadoop report being able to do new things with big data, versus 52% of other organizations; 94% perform new types of analytics on large volumes of data; and 88% analyze data at a greater level of detail. These statistics validate the arrival of Hadoop as a key component of organizations’ information management efforts. However, challenges remain, with over half of the organizations indicating some level of dissatisfaction with Hadoop.

"The recent surge in interest and awareness of Hadoop has created confusion in the market," said David Menninger, Vice President and Research Director, Ventana Research. "We undertook this research to help organizations gain a fact-based perspective on the potential benefits of and the obstacles to successful Hadoop deployments. The research shows that organizations can gain significant benefits from Hadoop and we look forward to sharing the findings."

For more information on Yahoo! and Hadoop, visit the Yahoo! Developer Network site.