Today and tomorrow, Yahoo! is hosting the second Open Cirrus Summit, attended by cloud computing thought leaders from around the world. Computer scientists from leading technology corporations, world-class universities, and public sector organizations have gathered in Sunnyvale to discuss the future of computer science research in the cloud. The testbed's research community also grows this week, as the School of Computer Science at Carnegie Mellon University officially joins the Open Cirrus Testbed.
The event will feature technical presentations from developers and researchers at Yahoo!, HP, and Intel, along with updates on research conducted on the Testbed by leading universities. Specifically, the Yahoo! M45 cluster, a part of the Open Cirrus Testbed, is being used by researchers from Carnegie Mellon, the University of California at Berkeley, Cornell, and the University of Massachusetts for a variety of system-level and application-level research projects. Researchers from these universities have published more than 40 papers and technical reports based on studies using the M45 cluster in many areas of computer science, with several studies related to Hadoop.
Yahoo! is the world's largest contributor to Apache Hadoop, and last year we open-sourced a production-ready version of Hadoop, the Yahoo! Distribution of Hadoop. We're especially excited that the Open Cirrus Testbed has expanded the global community of researchers using Hadoop.
Systems-level research projects on the M45 have included:
- pipelining data between the map and reduce stages of Hadoop jobs to improve user interaction
- deploying log-analysis techniques to improve the performance of Hadoop clusters
- applying RAID techniques to improve the Hadoop Distributed File System (HDFS)
Sample application-level research projects include:
- continuously extracting knowledge from Web pages (the Read the Web project)
- experimenting with new natural language processing models
- exploring large-scale graph algorithms
- analyzing Wikipedia group dynamics
- benchmarking statistical machine translation techniques
- performing large-scale document analysis
- prototyping statistical machine learning algorithms and studying computational sustainability
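For readers unfamiliar with the RAID item above: one classic RAID technique is XOR parity, where a single parity block is computed over a "stripe" of data blocks so that any one lost block can be rebuilt from the survivors. The Python below is a minimal, generic sketch of that idea under stated assumptions (equal-length in-memory blocks, hypothetical function names and sample data); it is not the M45 researchers' code and does not touch any HDFS API.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks in a stripe must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(stripe: list[bytes]) -> bytes:
    """Compute one parity block over a stripe of data blocks (RAID-5 style)."""
    return reduce(xor_blocks, stripe)


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost data block: parity XOR every surviving block."""
    return reduce(xor_blocks, surviving, parity)


# Hypothetical stripe of three 4-byte data blocks.
stripe = [b"hado", b"op-d", b"fs!!"]
parity = make_parity(stripe)

# Simulate losing the middle block and rebuilding it from the others.
rebuilt = recover_missing([stripe[0], stripe[2]], parity)
print(rebuilt)  # b'op-d'
```

The appeal for a distributed file system is storage cost: HDFS's default of three replicas stores every byte three times, while one parity block per stripe of three data blocks stores 4/3 of the data and still tolerates the loss of any single block in that stripe.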
The Open Cirrus Cloud Computing Testbed is a unique initiative because it enables research beyond the application level, allowing experimentation with the system software itself. A complete, open-source cloud computing software stack is also emerging from work at Open Cirrus. This stack consists of four layers:
1) Pig, the top layer, is a parallel programming language for expressing large-scale data analysis programs. Pig was designed and developed by researchers at Yahoo! Labs;
2) Hadoop, the layer below Pig, is a distributed file system and parallel execution environment that can run Pig and MapReduce programs;
3) Tashi, an Apache Incubator project and the layer below Hadoop, is a cluster management system for virtual machines;
4) Zoni, the bottom layer, available in the Apache Incubator as part of Tashi, is a service that manages VLAN-isolated compute, storage, and networking resources.
Pig, Hadoop, Tashi, and Zoni are all open-source projects available for worldwide cloud computing research and experimentation.
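To make the "Pig and MapReduce programs" in the Hadoop layer a little more concrete: Hadoop executes jobs written in the MapReduce model, and Pig scripts are compiled into such jobs. The sketch below simulates the model in a single Python process with the classic word count: the map step emits (word, 1) pairs, a shuffle groups the pairs by key, and the reduce step sums each group. It illustrates the programming model only; it does not use Hadoop's Java API, and all function names here are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lower-cased.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key; for word count, simply sum them.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases; keys are visited in sorted order for readable output.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


docs = ["Pig runs on Hadoop", "Hadoop runs MapReduce jobs"]
print(word_count(docs))
# {'hadoop': 2, 'jobs': 1, 'mapreduce': 1, 'on': 1, 'pig': 1, 'runs': 2}
```

In a real cluster, the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; the single-process version keeps only the logical structure of the computation.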
Yahoo! has built one of the world's largest private clouds and uses it to accelerate innovation and improve the experiences of its consumers and advertisers. Almost every part of Yahoo! now touches the cloud. All of us involved with Open Cirrus are excited to be at the forefront of cloud computing research and proud of the unique contribution of our testbed and our software stack. We look forward with great anticipation to future research from the Open Cirrus team.
Vice President, Yahoo! Labs and Research Operations, and Head, Yahoo! Academic Relations