This is the beginning of an ongoing series of blog posts on “Managing Big Data”. This series will focus on techniques that Yahoo uses to process large volumes of data, ranging from initial collection of data to the end usage of that data.
Over the last several years there are two important trends that require additional thought when putting together an architecture for a hosted service. At Yahoo!, the ability to analyze and process enormous amounts of data is increasingly important. It’s a foundational layer for improving our consumer experiences and for sharing audience insights with advertisers.
From a technology perspective, the two trends I'd like to focus on are:
1. Batch processing -- the increasing awareness of batch processing and the recent uptick in use of the map/reduce paradigm for that purpose.
2. NoSQL stores – The rise of so called "NoSQL" stores and their use to serve up data to online users (typically inside of the user's request/responseRead More »from Managing Big Data: Architectural Approaches for making batch data available online