Posts by Yahoo! Developer Network

Yahoo! Developer Network (YDN) is Yahoo!'s central resource for developers and partners. YDN offers developer tools, APIs, web services, and resources.

Yahoo! Kicks off Fourth Annual Hadoop Summit

1,650 attendees from 400 companies across 12 countries descend on Santa Clara, CA to hear the latest on Big Data issues, trends and technology

Today we kicked off the fourth annual Hadoop Summit at the Santa Clara Convention Center, bringing together some of the most influential thought leaders in the space, including Yahoo!, Facebook and [...]

M45 Cloud Computing initiative adds 4 top universities

This initiative lets universities conduct research using Yahoo!’s supercomputing resources — approximately 4,000 processors.

FCC Hosts Open Developer Day: Accessibility Innovation

The FCC hopes to further innovation in accessibility technologies among private- and public-sector Web developers.

Coming Soon – Yahoo! Search BOSS V2: A Paid Service with Web, Images, and News

We are pleased to announce new details about Yahoo! Search BOSS. Early in 2011, BOSS will transition to a cost-per-query paid model.

Hadoop2010: Workflow on Hadoop Using Oozie

Oozie v1 is a PDL workflow server engine for Hadoop that enables creating workflow jobs composed of several map-reduce jobs, Pig jobs, HDFS operations, and Java processes. Workflow jobs are monitored as a single unit via Web services, a Java API, and/or a Web console. Oozie v1 is in production at Yahoo!, [...]

Hadoop2010: Algorithms in MapReduce

Existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. Jimmy Lin (working with Michael Schatz), University of Maryland, presents three design patterns for designing scalable graph algorithms, which can be used to accelerate a large class [...]

Important API Updates and Changes

Today we’re making some important announcements on the transition of our Search back-end infrastructure to Microsoft, and how this transition will affect the Search APIs and web services we offer.

Hadoop2010: Amazon Elastic MapReduce Panel

Many Amazon Web Services customers leverage Hadoop inside Amazon Elastic MapReduce to solve problems ranging from mining clickstream data for targeted advertising to scientific applications. In this panel, Amazon Web Services customers will discuss a diverse set of use cases where Hadoop is being applied today and talk about the enterprise readiness of [...]

Hadoop2010: Winning the Big Data SPAM Challenge

Worldwide spam volumes this year are forecast to rise by 30% to 40% compared with 2009. Spam recently reached a record 92% of total email. Spammers have turned their attention to social media sites as well. In 2008, there were few Facebook phishing messages; Facebook is now the second most phished [...]

Hadoop2010: Efficient Parallel Set-Similarity Joins

A set-similarity join (SSJ) finds pairs of set-based records such that each pair is similar enough based on a similarity function and a threshold. Many applications require efficient SSJ solutions, such as record linkage and plagiarism detection. This talk studies how to efficiently perform SSJs on large data sets using Hadoop. [...]
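To make the SSJ definition above concrete, here is a minimal single-machine sketch. It is not the Hadoop method from the talk: it assumes Jaccard similarity as the similarity function and compares every pair naively, and the names `jaccard`, `set_similarity_join`, and the sample records are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def set_similarity_join(records: dict, threshold: float) -> list:
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    Naive all-pairs comparison (quadratic in the number of records),
    shown only to illustrate the definition, not a scalable approach.
    """
    sets = {rid: frozenset(tokens) for rid, tokens in records.items()}
    results = []
    # Sort by record id so the output order is deterministic.
    for (id1, s1), (id2, s2) in combinations(sorted(sets.items()), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            results.append((id1, id2, sim))
    return results


# Hypothetical sample data: each record is a set of tokens.
records = {
    "r1": {"hadoop", "summit", "2010"},
    "r2": {"hadoop", "summit", "2011"},
    "r3": {"image", "stacking"},
}
pairs = set_similarity_join(records, threshold=0.5)
print(pairs)
```

With these sample records, only r1 and r2 share enough tokens (2 of 4 distinct tokens) to meet a threshold of 0.5. The quadratic pair enumeration is what becomes impractical on large data sets, which is the motivation for parallelizing the join.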

Hadoop2010: Integration Patterns & Practices

Hadoop is a powerful platform for data analysis and processing, but many struggle to understand how it fits in with regard to existing infrastructure and systems. A series of common integration points, technologies, and patterns are defined and illustrated in this presentation. Eric Sammer looks at job initiation, sequencing and scheduling, [...]

Hadoop2010: Online Content Optimization

One of the most interesting problems we work on at Yahoo! is providing the most relevant content to our users. This involves tracking our users' interests and mining the ever-changing content pool to see what is relevant and popular for them. There is also [...]

Hadoop2010: Cascalog Query Language

Cascalog is an interactive query language for Hadoop with a focus on simplicity, expressiveness, and flexibility, intended for use by analysts and developers alike. Cascalog eschews the SQL syntax for a simpler and more expressive syntax based on Datalog. With this added expressiveness, Cascalog can query existing data stores "out [...]

Hadoop2010: Data Apps & Infrastructure at LinkedIn

LinkedIn runs a number of large-scale Hadoop calculations to power its features, from computing similar profiles, jobs, and companies to predicting People You May Know recommendations that help users find their professional connections. This talk covers how Hadoop fits into a production data cycle for a consumer-scale social network, including [...]

Hadoop2010: Parallel Image Stacking

Keith Wiley, University of Washington, talks about parallel distributed image stacking and mosaicing with Hadoop, and reports on his experience implementing a scalable image-processing pipeline for the SDSS database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. His [...]