June 29, 2011
1,650 attendees from 400 companies across 12 countries descend on Santa Clara, CA to hear the latest on Big Data issues, trends and technology. Today we kicked off the fourth annual Hadoop Summit at the Santa Clara Convention Center, bringing together some of the most influential thought leaders in the space, including Yahoo!, Facebook and [...]
November 4, 2010
This initiative lets universities conduct research using Yahoo!’s supercomputing resources — approximately 4,000 processors.
October 28, 2010
FCC hopes to further innovation in accessibility technologies among private- and public-sector Web developers.
October 8, 2010
We are pleased to announce new details about Yahoo! Search BOSS. Early in 2011, BOSS will transition to a cost-per-query paid model.
August 17, 2010
Oozie v1 is a PDL workflow server engine for Hadoop that enables creating workflow jobs composed of several map-reduce jobs, Pig jobs, HDFS operations, and Java processes. Workflow jobs are monitored as a single unit via Web services, a Java API, and/or a Web console. Oozie v1 is in production at Yahoo!, [...]
August 17, 2010
Existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. Jimmy Lin (working with Michael Schatz), University of Maryland, presents three design patterns that address designing scalable graph algorithms, and can be used to accelerate a large class [...]
August 17, 2010
Today we’re making some important announcements on the transition of our Search back-end infrastructure to Microsoft, and how this transition will affect the Search APIs and web services we offer.
August 15, 2010
Many Amazon Web Services customers leverage Hadoop inside Amazon Elastic MapReduce to solve problems ranging from mining clickstream data for targeted advertising to scientific applications. In this panel, Amazon Web Services customers will discuss a diverse set of use cases where Hadoop is being applied today, and talk about the enterprise readiness of [...]
August 13, 2010
Worldwide spam volumes this year are forecast to rise by 30% to 40% compared with 2009. Spam recently reached a record 92% of total email. Spammers have turned their attention to social media sites as well. In 2008, there were few Facebook phishing messages; Facebook is now the second most phished [...]
August 13, 2010
A set-similarity join (SSJ) finds pairs of set-based records such that each pair is similar enough based on a similarity function and a threshold. Many applications require efficient SSJ solutions, such as record linkage and plagiarism detection. This talk studies how to efficiently perform SSJs on large data sets using Hadoop. [...]
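To make the set-similarity join definition above concrete, here is a minimal single-machine sketch. It is not the Hadoop method presented in the talk: it assumes Jaccard similarity as the similarity function, compares every pair of records naively, and uses hypothetical names (`jaccard`, `naive_ssj`) and made-up sample records.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_ssj(records, threshold):
    """Return (id1, id2, similarity) for every record pair at or above the threshold.

    Assumption: `records` maps a record id to a set of tokens. This compares
    all pairs, so it is quadratic and only suitable for small inputs.
    """
    results = []
    for (id1, s1), (id2, s2) in combinations(sorted(records.items()), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            results.append((id1, id2, sim))
    return results


# Hypothetical sample data, for illustration only.
records = {
    "r1": {"hadoop", "map", "reduce"},
    "r2": {"hadoop", "map", "reduce", "join"},
    "r3": {"spam", "email"},
}
pairs = naive_ssj(records, 0.7)
print(pairs)
```

With a threshold of 0.7, only `r1` and `r2` qualify: they share three of four distinct tokens, for a similarity of 0.75. The all-pairs comparison is what makes the problem expensive on large data sets.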
August 13, 2010
Hadoop is a powerful platform for data analysis and processing, but many struggle to understand how it fits in with regard to existing infrastructure and systems. A series of common integration points, technologies, and patterns are defined and illustrated in this presentation. Eric Sammer looks at job initiation, sequencing and scheduling, [...]
August 13, 2010
One of the most interesting problems we work on at Yahoo! is to provide the most relevant content to our users. This involves tracking our users' interests and mining the ever-changing content pool to see what is relevant and popular for them. There is also [...]
August 13, 2010
Cascalog is an interactive query language for Hadoop with a focus on simplicity, expressiveness, and flexibility, intended to be used by analysts and developers alike. Cascalog eschews the SQL syntax for a simpler and more expressive syntax based on Datalog. With this added expressiveness, Cascalog can query existing data stores "out [...]
August 13, 2010
LinkedIn runs a number of large-scale Hadoop calculations to power its features, from computing similar profiles, jobs, and companies to predicting People You May Know recommendations that help users find their professional connections. This talk covers how Hadoop fits into a production data cycle for a consumer-scale social network, including [...]
August 12, 2010
Keith Wiley, University of Washington, talks about parallel distributed image stacking and mosaicing with Hadoop, and reports on his experience implementing a scalable image-processing pipeline for the SDSS database using Hadoop. This multi-Terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. His [...]