Six months ago we started something big with the launch of Yahoo! Mail Beta. Since then a lot has happened. Millions of users have opted in, dozens of improvements have been released and thousands of issues have been reported (and fixed!). All of that has gone into making what we have today: a product ready for prime time.
In this post I want to give an updated look at the technology that has gone into this release. As always, there's too much to cram into a single blog post, so I'll touch on some of the highlights, particularly the ones that work directly toward our three main goals for the new Yahoo! Mail: Faster, Safer and Easier to use.
## Faster
The main goal for the new Yahoo! Mail has always been speed, and this new version is twice as fast as any of our former products. A huge amount of work has gone into making that a reality. Let me briefly recap my last post on the topic, then cover some additional work we've done.
Second, we worked closely with the Yahoo! Cloud team to make use of their HTTP proxy network, which has proxy nodes deployed all around the planet. In a nutshell, these proxies give Yahoo! Mail users a connection point that is closest to them by network distance. The proxies then forward user requests over Yahoo!'s high-speed network to the data centers that house their mailboxes. This can improve transfer speed by 25% or more in some cases. It's particularly impactful in parts of the world with generally poor network connectivity, like India and Indonesia.
The next area I'll cover is product architecture. Our design goals were to minimize expensive HTTP requests, progressively render the page, and parallelize processing on the server.
Negotiating HTTP connections comes with performance overhead, especially on slow network connections. To minimize the impact we make use of a technique called HTTP request streaming. The basic idea: as chunks of data become available on the server they are immediately pushed across a single, persistent HTTP connection. That way, even if the server still has more work to do to satisfy the full request, the browser can begin rendering parts of the page. This minimizes the number of HTTP connections and allows the browser and server to work in parallel.
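To make the idea concrete, here is a minimal sketch of streaming a page in pieces over one connection. It assumes HTTP/1.1 chunked transfer coding as the framing, which is one common way to do this and not necessarily what Yahoo! Mail uses. The `render_*` functions and `stream_page` are hypothetical names invented for illustration.

```python
from typing import Callable, Iterable, Iterator


def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: size in hex, CRLF, the data, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


def stream_page(renderers: Iterable[Callable[[], str]]) -> Iterator[bytes]:
    """Yield each page section as soon as its (hypothetical) renderer returns."""
    for render in renderers:
        html = render().encode("utf-8")
        if html:  # a zero-length chunk would signal end of body too early
            yield encode_chunk(html)
    yield b"0\r\n\r\n"  # last-chunk marker: the response body is complete


# Hypothetical page sections, cheapest first so the browser can start painting.
def render_header() -> str:
    return "<div id='header'>Mail</div>"


def render_folders() -> str:
    return "<ul id='folders'><li>Inbox</li></ul>"


def render_message_list() -> str:
    return "<ol id='messages'><li>Hello</li></ol>"


chunks = list(stream_page([render_header, render_folders, render_message_list]))
body = b"".join(chunks)
```

In a real server each chunk would be written to the socket as it is produced rather than collected in a list. The browser can parse and render the header while the message list is still being built.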
To parallelize operations on the server side we created a technology called "Pluton" that has since been released as open source. Pluton allows us to run parallel jobs on the server in PHP. For example, if we're fetching from multiple data sources, the ones that return first can be handled and sent to the browser right away. Here's a diagram showing how the page load progression works with Pluton.
These techniques let us make the most efficient use of our browser and server processing resources.
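The "handle whichever source returns first" pattern can be sketched in a few lines. This is not Pluton's API (Pluton is a PHP framework); it is an illustration in Python using the standard library's `concurrent.futures`. The three `fetch_*` functions and their delays are made-up stand-ins for real data sources.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, Tuple


# Hypothetical data sources; the sleeps simulate different backend latencies.
def fetch_folders() -> str:
    time.sleep(0.05)
    return "folder list"


def fetch_contacts() -> str:
    time.sleep(0.20)
    return "contact list"


def fetch_messages() -> str:
    time.sleep(0.40)
    return "message list"


def run_parallel(jobs: Dict[str, Callable[[], str]]) -> Iterator[Tuple[str, str]]:
    """Start every job at once and yield (name, result) in completion order."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {pool.submit(job): name for name, job in jobs.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()


jobs = {
    "messages": fetch_messages,
    "contacts": fetch_contacts,
    "folders": fetch_folders,
}

order = []
for name, result in run_parallel(jobs):
    # In a real page this is where the finished section would be streamed out.
    order.append(name)
```

Because results are consumed in completion order rather than submission order, the slowest source no longer holds up the faster ones. Total wall time is roughly that of the slowest job instead of the sum of all three.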
Everything I've covered here is still just scratching the surface. We've invested thousands of hours into performance and we're just getting started. Speed is the top priority for Yahoo! Mail engineers. Right now we're twice as fast as any of our previous products and we're only going to get faster.
## Safer
Let's face it, the Internet isn't always the safest place. In a given month, Yahoo! Mail SpamGuard, our anti-spam technology, blocks 550 billion spam messages from reaching users' inboxes. That's 89% of all emails sent to our servers! It's a war we are fighting every day to keep harmful messages out of your inbox.
A lot of people think of spam as a nuisance but the reality is more sinister. Many of these messages are trying to get you to buy products from fraudulent merchants or trying to trick you into giving out your password or your credit card number. The bottom line is the worst part: it works. Spam is a huge business and that's why we, along with all big email providers, are under constant attack.
Here's the good news: Yahoo! Mail SpamGuard has improved so much that we are delivering 60% less spam to users' inboxes than we were 12 months ago. Despite higher numbers of spam delivery attempts, the amount making it through to inboxes is lower than ever. We can't get into too much detail (spammers will read this post too) but here are some high-level techniques we use.
- A highly responsive IP classification system: Spammers keep sending through untainted IPs with little history or reputation in order to evade blacklists. Yahoo!'s IP classifier produces an instant assessment of new IPs when they first start sending to Yahoo! Mail, using features such as location and WHOIS/ASN/DNS.
- An innovative method of determining sender and receiver reputations: Yahoo! has deployed a system that evaluates multiple features of an email sender's reputation, as well as the reputation of its intended recipient. Other message-level data is likewise evaluated to provide a determination of whether a message is a legitimate email or spam.
- Continued improvement of our content classifiers: The new URL classifier quickly gauges a never-seen-before domain or URL's spam factor. When we see a surge in a particular URL in our incoming or outgoing mail traffic, the classifier generates an instant classification of it based on various internal and external data.
- Using Hadoop for intensive data crunching: Yahoo! Mail feeds terabytes of traffic, message and feedback data to our Hadoop grid on a daily basis. The data fuels many of our anti-abuse initiatives, including our machine-learning algorithms, reputation engines, and pattern-detection systems.
Bottom line: These improvements, plus many more, reduced spam reports by 60% in the last 12 months while we kept our false-positive reports flat. Users of the new Yahoo! Mail will have a more spam-free experience than ever before.
## Easier to use
The third goal of the new Yahoo! Mail was to make it significantly easier to use. Rather than dig into that here or rehash the hundreds of (occasionally heated) design and interaction discussions we've had over the last year, I invite you to try it out.
I look forward to hearing your feedback in the comments!