Using Hadoop to fight spam – Part 1

We interviewed Mark Risher and Jay Pujara, leaders in the war against spam for Yahoo! Mail. With over 300 million users and billions of messages, looking for the problems or patterns that identify spammers can be a daunting task. Mark and Jay describe how their previous approach, which relied on databases, quickly ran into scalability limitations when they analyzed data aggregated over a month or more. They explain how Hadoop, with Pig and Streaming, now enables them to slice through billions of messages to isolate patterns and identify spammers. They can now create new queries and get results within minutes for problems that took hours, or were considered impossible, with their previous approach. Listen in as Mark and Jay describe their experiences fighting spam: