Blog Posts by Yahoo! Developer Network

  • Hadoop2010: ZettaVox Mining & Analysis

    ZettaVox is an enterprise content-mining application that combines crawling, extraction, monitoring, and analysis in a unified solution. In this talk, Kitenga CTO and ZettaVox designer Mark Davis will demonstrate ZettaVox's capabilities on clusters and with cloud-based data resources. He will further show how ZettaVox can extend the reach of cluster-based computing solutions built on Hadoop to include commodity supercomputers based on graphics processing units (GPUs) that implement MapReduce formalisms.

    Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.

  • Hadoop2010: Exact Inference in Bayesian Networks

    Probabilistic inference is a way of estimating the values of unobservable variables from incomplete data. One tool for inference, and a way to represent knowledge, is a "Bayesian network," where nodes represent variables and edges represent probabilistic dependencies between variables. While a lot of research has been devoted to devising schemes that approximate the solution, Hadoop makes it possible to perform exact inference on the whole network.
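
    As a rough illustration of what exact inference means, the sketch below computes a posterior by summing the full joint distribution over the hidden variable of a tiny three-node network and then normalizing. The network (Rain, Sprinkler, WetGrass), every probability in it, and the function names are assumptions made up for this example; they do not come from the talk, and the Hadoop-based approach the talk describes is not shown here.

```python
# Hypothetical network: Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.
# All probabilities below are illustrative values, not data from the talk.
P_RAIN = {True: 0.2, False: 0.8}

# P(Sprinkler | Rain), indexed as P_SPRINKLER[rain][sprinkler]
P_SPRINKLER = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}

# P(WetGrass=True | Sprinkler, Rain), indexed by (sprinkler, rain)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability of one full assignment, using the chain rule."""
    p_wet = P_WET[(sprinkler, rain)]
    p_w = p_wet if wet else 1.0 - p_wet
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * p_w


def posterior_rain_given_wet():
    """Exact P(Rain | WetGrass=True) by enumeration.

    For each value of Rain, sum the joint over the hidden variable
    (Sprinkler), then normalize so the two scores add up to 1.
    """
    scores = {
        rain: sum(joint(rain, sprinkler, True) for sprinkler in (True, False))
        for rain in (True, False)
    }
    total = sum(scores.values())
    return {rain: score / total for rain, score in scores.items()}


if __name__ == "__main__":
    print(posterior_rain_given_wet())
```

    Enumeration is exact but its cost grows exponentially with the number of hidden variables, which is why approximation schemes are common on large networks.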

  • Hadoop2010: Hadoop Frameworks Panel

    A number of frameworks and tools have been built on top of Hadoop to make it easier to write applications and manage Hadoop. The panel consists of experts and developers of such frameworks and tools. They will discuss the problem space and target audience that each technology addresses, plans for future enhancements, and what is missing in the overall space.

  • Hadoop2010: Honu at Netflix

    Netflix moved a large portion of its infrastructure to the cloud to meet reliability, scalability, and availability requirements. As the number of instances running in the cloud increases, the standard way of moving log files or loading log events into a database starts saturating the system, and latency and throughput degrade to the point of being unusable for Netflix's operational needs. Honu is the new streaming log collection and processing pipeline in the cloud for Netflix, and it leverages the computational power of Hadoop and Hive to meet the company's log analytics requirements.

  • OpenSocial and the Yahoo! Application Platform

    What does it take to develop an OpenSocial application on the Yahoo! Application Platform (YAP)? YAP supports many of the standard OpenSocial JavaScript functions, which are listed in the YAP OpenSocial documentation.

    This example demonstrates the development process of a simple OpenSocial application that allows the user to poke and send messages to their friends. The poke will be broadcast into the Activity stream, and the message will be sent to the specified user’s friends.

    1. Create an Open Application on the Yahoo! Developer Network (YDN):

    2. Download the Gadget XML, and edit the metadata

  • Hadoop2010: Mining Billion-node Graphs

    Christos Faloutsos of Carnegie Mellon University discusses patterns, generators, and tools in mining billion-node graphs. He presents a comprehensive list of static and temporal laws, and some recent observations on real graphs (for example, "eigenSpokes"). For generators, he describes some recent ones, which naturally match all of the known properties of real graphs. Finally, for tools, he presents "oddBall" for discovering anomalies and patterns, as well as an overview of the PEGASUS system, which is designed for handling billion-node graphs and runs on top of Hadoop.

  • Hadoop2010: Hive integration – HBase & RCFile

    John Sichi and Yongqiang He of Facebook discuss Facebook's recent integration of two related projects in the Hadoop ecosystem: HBase and Hive. This integration gives powerful SQL query capabilities to HBase, and brings the potential for low-latency incremental data refresh to Hive. The talk will go over performance results from initial testing of the integration. Yongqiang will discuss RCFile, a columnar storage format for Hive. It is already deployed within Facebook, which is in the process of converting old partitions to RCFile. Depending on the data layout, the conversion has resulted in ~20% space savings.

  • Hadoop 2010: Disruptive Applications with Hadoop

    Rod Smith, IBM Fellow and Vice President of the IBM Emerging Internet Technologies organization, talks about how Hadoop is emerging as a disruptive technology that can power new classes of big data applications while integrating with existing middleware infrastructures — applications like monitoring massive datasets, computationally intensive jobs for evidence-based medicine, analysis for fraud detection, and more. Hadoop makes these next-generation solutions feasible, many of which will provide a disruptive advantage for those that implement them. In this presentation you'll hear how IBM is using Hadoop as a platform for applications and see a demonstration of how you can harness the power of this disruptive technology to help leverage the value of big data and big data analytics.

  • Hadoop2010: Hadoop for Scientific Workloads

    Lavanya Ramakrishnan of Lawrence Berkeley National Lab outlines the lab's science requirements in the use of Hadoop and related technologies, such as HBase. She presents a performance comparison of a bioinformatics application using Hadoop on cloud platforms such as Amazon EC2 and Yahoo! M45 against a high-performance computing system. She also presents experiences and performance results from a local Hadoop and HBase installation with different file system and scheduling configurations specifically suited for scientific applications.

  • Hadoop2010: Hadoop and Pig at Twitter

    Apache Pig is a high-level framework built on top of Hadoop that offers a powerful yet vastly simplified way to analyze data in Hadoop. It allows businesses to leverage the power of Hadoop in a simple language readily learnable by anyone who understands SQL. In this presentation, Twitter's Kevin Weil introduces Pig and shows how it has been used at Twitter to solve numerous analytics challenges that had become intractable with the former MySQL-based architecture.

