The week began with a big winter storm that dumped snow on the mountains here. I explored a curated collection of big data links from Adam Flaherty on O'Reilly Answers. The list includes a video introduction by O'Reilly's data guy and director of market research, Roger Magoulas, who believes that deft handling of big data is the core competence of the information age, one that creates a key competitive advantage. (Think of the success of Google or the election of Barack Obama.)
How big is big? Big data might be measured in gigabytes or petabytes -- it depends. Data is big when you have to think about how to manage, store, and extract meaning and value from it so that it doesn't overwhelm your systems. Think of the proliferation of sensor networks, self-expression, informatics, geodata -- and all the other voluntary and involuntary means of amplifying and collecting data about ourselves and our activities on and off the internets.
In a post from November 30 titled The Climate Modeling Leak: Code and Data Generating Published Results Must be Open and Facilitate Reproducibility, statistics PhD and post-doc legal scholar Victoria Stodden reminds us that big data doesn't necessarily come with big reliability or reproducibility.
The focus of Stodden's work is "changes to the scientific method arising from the pervasiveness of computation, specifically reproducibility in computational science." She foresees the time when a more open and "mature, computational science will produce routinely verifiable results." In the meantime, Climategate, that perfect storm of data and its discontents, is an issue she can sink her teeth into with some authority.
In a lighter vein, this was the week I discovered Flowing Data, a website subtitled "Strength in Numbers." It's a curiosity shop of infographics created by Nathan Yau, who's working on his PhD in statistics, with a focus on data visualization. His core interests: social data visualization, self-surveillance, and data for non-professionals. You gotta love the flowchart designed to help with cereal decision-making.
Other topics that caught my eye this week relate to scale and how it changes the way we work, play, and interact. McKinsey consultants James Manyika, Kara Sprague, and Lareina Yee issued a report about collaboration among knowledge workers that suggests we need to measure the quality and quantity of interactions required, for example, for "the interplay between a company and its customers or partners that results in an innovative product." Consultant-speak aside, their infographic about the 10 types of waste in a collaborative work environment (Exhibit 2) really hit home. Among them: divergence, misunderstanding, interpreting, searching, motion, extra-processing, translation, waiting, and misapplication. (Demotivator poster material, anyone?)
Do yourself a favor this weekend, and curl up with a few good ideas from The Times Magazine's 9th Annual Year in Ideas. The one that's getting the most attention on a popular private list I read is called Random Promotions, about meritocracy, randomness, and the Peter Principle in the workplace.
I was equally fascinated by The Google [PageRank] Algorithm as Extinction Model, and Massively Collaborative Mathematics, about the crowd-sourcing of big brains to solve stubborn math problems. It's nice to see that NYTimes.com has opened each idea for comments from readers. (I would have been interested in seeing a five-star system of user ratings applied to each idea as well.)
Food for thought, my friends, with all the density, gem-like gooey bits, and staying power of holiday fruitcake.
YDN Blog Editor