Hack India: Hyderabad — It’s a Wrap!

The energy at the 6th edition of Yahoo! Hack in India was electrifying as we counted down to the close of hacking at Yahoo! Hack Hyderabad, 2013. Over…

Internal Hackday Produces Record-Breaking 300 Hacks

Yahoo! has been hosting internal Hackdays since 2005, and the traditio…

Demand is High for Yahoo! Hack India: Hyderabad

Photo credit to Reid Burke. Since 2007, YDN has been hosting amazing Ha…

  • Today we’re making some important announcements on the transition of our Search back-end infrastructure to Microsoft, and how this transition impacts the Search APIs and web services we offer on the Yahoo! Developer Network. We are also sharing specific news about several of our other developer services.

    Over recent years, Yahoo! has made a commitment to developers by opening products, services, and canvases for third-party innovation. This commitment remains unwavering. For example, we recently announced new canvases and APIs as part of our Zynga deal. At the same time, we have to align our developer offerings with our products and strategy.

    Yahoo! Search BOSS

    Search remains critical to Yahoo! and we’re happy to announce that we will continue to offer the BOSS program (Build your Own Search Service). In the not too distant future, BOSS will provide web and image search results from Microsoft along with other search-related services and content from Yahoo!, such as news. In the next

    Read More »from Important API Updates and Changes
  • As you might know, An Event Apart, in association with Microsoft, is currently running a 10K competition, asking developers what they can do in under 10KB. The idea is to show the world the power of new web technologies — and how pretty web apps can be, without being heavy.

    I thought I should take part in the competition. Being a data junkie, I didn't concentrate on using canvas to do a cool visualisation, but instead tried to build a very small interface for a big data set.

    The result is World Info, which shows you information about all the countries in the world in under 5K:

    Under the hood, World Info uses YQL and Yahoo! GeoPlanet to get all this information. In essence, this boils down to two statements:

    select name,boundingBox from geo.places.children(0) where
    parent_woeid=1 and placetype="country" | sort(field="name")

    This gives you all the countries on this planet (direct children of the data entry with the WOEID 1, which is Earth), sorted alphabetically.
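
As a rough sketch of how a statement like the one above could be sent to YQL, the snippet below URL-encodes it into a GET request. The endpoint URL, the `format` parameter, and the helper name `build_yql_url` are assumptions for illustration and do not come from the post; the endpoint may no longer be live, so the snippet only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: a public YQL endpoint of this form; not taken from the post.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"

# The first statement quoted in the post, joined into one string.
COUNTRIES_QUERY = (
    "select name,boundingBox from geo.places.children(0) "
    'where parent_woeid=1 and placetype="country" | sort(field="name")'
)


def build_yql_url(query, endpoint=YQL_ENDPOINT, fmt="json"):
    """Return a GET URL that carries the YQL statement in the `q` parameter.

    Hypothetical helper: urlencode takes care of escaping spaces, quotes,
    and the pipe character used by sort().
    """
    return endpoint + "?" + urlencode({"q": query, "format": fmt})


url = build_yql_url(COUNTRIES_QUERY)
print(url)
```

Keeping the statement as a plain string and leaving all escaping to `urlencode` means the query stays readable in the source while the quotes and pipe are still encoded safely.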

    As all of them come with

    Read More »from Showing off the world with YQL
  • Many Amazon Web Services customers leverage Hadoop inside Amazon Elastic MapReduce, to solve problems ranging from mining clickstream data for targeted advertising, to scientific applications. In this panel, Amazon Web Services customers will

    • Discuss a diverse set of use cases where Hadoop is being applied today
    • Talk about the enterprise readiness of Hadoop
    • Talk about how Amazon Elastic MapReduce addresses some of the key challenges of running Hadoop in a production environment
    • Identify features and solutions that will lead to wider adoption
    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.

    Read More »from Hadoop2010: Amazon Elastic MapReduce Panel
  • Worldwide spam volumes this year are forecast to rise by 30% to 40% compared with 2009. Spam recently reached a record 92% of total email. Spammers have turned their attention to social media sites as well. In 2008, there were few Facebook phishing messages; Facebook is now the second most phished organization online. Even though Twitter has managed to recently bring its spam rate down to as low as 1%, the absolute volume of spam is still massive given its tens of millions of users. Dealing with spam introduces a number of Big Data challenges. The sheer size and scale of the data is enormous. In addition, spam in social media involves the need to understand very complex patterns of behavior as well as to identify new types of spam. This presentation discusses how data analytics built on Hadoop can help businesses keep spam from

    Read More »from Hadoop2010: Winning the Big Data SPAM Challenge
  • A set-similarity join (SSJ) finds pairs of set-based records such that each pair is similar enough based on a similarity function and a threshold. Many applications require efficient SSJ solutions, such as record linkage and plagiarism detection. This talk studies how to efficiently perform SSJs on large data sets using Hadoop. It proposes a 3-stage approach to the problem, to efficiently partition the data across nodes to balance the workload and minimize the need for replication. It reports results from extensive experiments on real datasets, synthetically increased in size, to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.
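
To make the definition above concrete, here is a minimal single-machine sketch of a set-similarity join. It is not the 3-stage Hadoop approach from the talk: it compares every pair of records directly, and the choice of Jaccard similarity, the 0.6 threshold, the sample records, and the names `jaccard` and `naive_ssj` are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union.

    Two empty sets are treated as having similarity 0.0 (a convention
    chosen for this sketch).
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_ssj(records, threshold):
    """Return (id1, id2, similarity) for every pair at or above threshold.

    Brute force over all pairs, so the cost grows quadratically with the
    number of records; this is the baseline a distributed approach avoids.
    """
    results = []
    for (id1, s1), (id2, s2) in combinations(sorted(records.items()), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            results.append((id1, id2, sim))
    return results


# Hypothetical records: each one is a set of tokens.
records = {
    "r1": {"efficient", "parallel", "set", "similarity", "joins"},
    "r2": {"efficient", "parallel", "set", "similarity", "join"},
    "r3": {"spam", "detection", "hadoop"},
}

pairs = naive_ssj(records, threshold=0.6)
print(pairs)
```

With these sample records only the pair (r1, r2) passes the threshold, since the two sets share four of six distinct tokens.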

    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.

    Read More »from Hadoop2010: Efficient Parallel Set-Similarity Joins
  • Hadoop is a powerful platform for data analysis and processing, but many struggle to understand how it fits in with regard to existing infrastructure and systems. A series of common integration points, technologies, and patterns are defined and illustrated in this presentation. Eric Sammer looks at job initiation, sequencing and scheduling, data input from various sources (e.g., DBMS, messaging systems), and data output to various sinks (DBMS, messaging systems, caching systems). You will see how integration patterns and best practices can be applied to Hadoop and its related projects. This talk is focused on the suitability and architecture of these integration patterns. Care is taken to not duplicate talks on specific tools that are likely to be covered by other talks.

    Media Production by BAYCAT, a non-profit community media
    Read More »from Hadoop2010: Integration Patterns & Practices
  • One of the most interesting problems we work on at Yahoo! is providing the most relevant content to our users. This involves tracking our users' interests and mining the ever-changing content pool to see what is relevant and popular for them. There are also content normalization and de-duplication issues to address in order to avoid redundancy. To solve all these problems, we make extensive use of the Hadoop technology stack in our systems. Using Hadoop, we are able to scale to build models for millions of items and users in near-real time. We leverage HBase for point lookups/stores of these models. We also use Pig for expressing our workflows so the map-reduce parallelism is abstracted out of core processing.

    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the
    Read More »from Hadoop2010: Online Content Optimization
  • Cascalog is an interactive query language for Hadoop with a focus on simplicity, expressiveness, and flexibility, intended to be used by analysts and developers alike. Cascalog eschews the SQL syntax for a simpler and more expressive syntax based on Datalog. With this added expressiveness, Cascalog can query existing data stores "out of the box" with no data "importing" or "under the hood" configuration necessary. Because Cascalog sits on top of Clojure, a powerful JVM-based language and interactive shell, adding new operations to a query is as simple as defining a new function. Cascalog relies on Cascading, a robust data-processing API, for defining and running workflows.

    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.

    Read More »from Hadoop2010: Cascalog Query Language
  • LinkedIn runs a number of large-scale Hadoop calculations to power its features, from computing similar profiles, jobs, and companies, to predicting People You May Know recommendations to help users find their professional connections. This talk covers how Hadoop fits into a production data cycle for a consumer-scale social network, including some of the technology, infrastructure, and algorithms for calculating tens of billions of predictions in a social graph.

    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.

    Read More »from Hadoop2010: Data Apps & Infrastructure at LinkedIn
  • SodaHead integrates Yahoo! logins

    SodaHead is a leading opinions-based online community focused on the day's hottest entertainment, politics, and news discussion topics, with more than 6.8 million monthly unique users. As of this week, SodaHead enables users to register or log in with their Yahoo! account(s). Here's how it works, according to this guest post by Michael Rosen, SodaHead product manager, and Michael Kalas, SodaHead senior software engineer and lead third-party API developer.

    Yahoo! registration lets users connect with their Yahoo! contacts on SodaHead, and include them in their social polls. SodaHead then uses Yahoo! Updates to push content to the user's stream when he or she uses the site.

    SodaHead has integrated the Yahoo! Login button (alongside Facebook and Twitter) throughout the site, as well as within its registration/login modal process.

    During registration, SodaHead gives users the opportunity either to merge their existing SodaHead accounts after Yahoo! authentication, or to

    Read More »from SodaHead integrates Yahoo! logins

