When car manufacturers introduce a new model-year automobile, there are typically two types of release: a truly all-new model, built on a new platform with new engines, or a mid-cycle refresh that is really just last year's car with a shiny new grille slapped on.
Website releases are like this too. Sometimes a new look is thrown on top of the same underlying technology, and sometimes you get a complete ground-up rebuild of all the technology layers. The beta news site that went live recently is the first release of a new framework to be used for future media websites at Yahoo!: news, sports, finance, entertainment, and so on. Our main goal going into this project was clear: build an efficient, extensible framework for Yahoo! software developers while iterating on content and features for Yahoo! News.
One codebase, served up globally
The current Yahoo! News site actually consists of five different regional websites with five different codebases. Each codebase (accidentally) provides a historic snapshot of the evolution of software development at Yahoo! and on the Web. You can imagine the huge maintenance overhead and the range of performance issues this creates. Plus, each regional website looks different and serves up a very different user experience.
The new beta framework provides a single codebase for all regions and languages. It is the foundation for all future content sites. This means that once the beta rolls out in the near future, whenever you access an article on Yahoo!, whether it's sports or finance, the code used to display that content will be the same.
This single codebase is now served out of six strategically located colos (colocation data centers) instead of two. And since our servers will be closer to users, our content will get to them faster.
Modularity is key: our developers create small modular components. What the user sees as a page is really just a collection of these components configured to deliver the right content to the right users.
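To make the idea concrete, here is a minimal sketch in Python, for illustration only. The registry, the `headline` and `story_list` components, and the layout format are all hypothetical names and are not taken from the actual framework. It shows a page expressed as nothing more than an ordered list of configured components.

```python
from html import escape
from typing import Callable, Dict, List

# Hypothetical registry: component name -> function that renders a config dict to HTML.
Renderer = Callable[[dict], str]
COMPONENTS: Dict[str, Renderer] = {}


def component(name: str) -> Callable[[Renderer], Renderer]:
    """Decorator that registers a render function under a component name."""
    def register(fn: Renderer) -> Renderer:
        COMPONENTS[name] = fn
        return fn
    return register


@component("headline")
def render_headline(config: dict) -> str:
    return f"<h1>{escape(config['text'])}</h1>"


@component("story_list")
def render_story_list(config: dict) -> str:
    # Show at most `limit` titles (default 5, an arbitrary choice for this sketch).
    titles = config["titles"][: config.get("limit", 5)]
    items = "".join(f"<li>{escape(title)}</li>" for title in titles)
    return f"<ul>{items}</ul>"


def render_page(layout: List[dict]) -> str:
    """A page is just its components rendered in order."""
    return "\n".join(
        COMPONENTS[slot["component"]](slot["config"]) for slot in layout
    )


# The same components, configured for one particular page.
SPORTS_PAGE = [
    {"component": "headline", "config": {"text": "Sports"}},
    {"component": "story_list",
     "config": {"titles": ["Story A", "Story B", "Story C"], "limit": 2}},
]

print(render_page(SPORTS_PAGE))
```

A finance page in this sketch would reuse the same two components with a different configuration, which is the point of keeping them small.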
The first step was to get much stricter about design standards. We tried hard to separate the things users care about (the "feel" of a site) from things they don't (3-pixel vs. 5-pixel rounded corners). Once we had this in place, reusing HTML across different sites became just a matter of swapping out some CSS.
We also made the decision to really start embracing progressive enhancement. CSS3 is used to handle many of the non-essential design elements such as gradients and rounded corners. Having these elements rendered by the user's browser reduced the number of images on the page, which helps speed up the experience.
We completely rebuilt how we ingest and store the content that runs our sites. Before this change, the same news story was ingested and stored by dozens of different systems. We had to consolidate this so that we could iterate quickly and continually innovate on our sites.
We're now using a workflow-based system to ingest the 50,000 or so news documents we publish each day. The configurable workflows allow us to handle a wide range of content types and formats. By the time data reaches the end of a specific workflow, it has been transformed into a JSON format we call the common content model.
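As a rough sketch of what a configurable workflow could look like, consider the Python below. It is an assumption-laden illustration, not the production system: the step functions, the input field names (`hed`, `post_title`, and so on), and the output fields are all invented here, and the real common content model is not described in this post. Each content type gets its own ordered list of steps, and every workflow ends by emitting JSON with the same shape.

```python
import json
from typing import Callable, Dict, List

# A workflow step takes a document dict and returns a transformed dict.
Step = Callable[[dict], dict]


def map_wire_fields(raw: dict) -> dict:
    # Hypothetical wire-feed field names mapped onto shared field names.
    return {"title": raw["hed"], "body": raw["text"], "provider": raw["src"]}


def map_blog_fields(raw: dict) -> dict:
    # Hypothetical blog-feed field names mapped onto the same shared names.
    return {"title": raw["post_title"], "body": raw["content"], "provider": raw["blog"]}


def tidy_whitespace(doc: dict) -> dict:
    # Collapse runs of whitespace in every (string) field.
    return {key: " ".join(value.split()) for key, value in doc.items()}


def tag_type(content_type: str) -> Step:
    # Build a step that records which kind of content this is.
    def step(doc: dict) -> dict:
        return {**doc, "type": content_type}
    return step


# Configuration: each content type names the steps it needs, in order.
WORKFLOWS: Dict[str, List[Step]] = {
    "wire_story": [map_wire_fields, tidy_whitespace, tag_type("wire_story")],
    "blog_post": [map_blog_fields, tidy_whitespace, tag_type("blog_post")],
}


def ingest(content_type: str, raw: dict) -> str:
    """Run a raw document through its workflow and return JSON."""
    doc = raw
    for step in WORKFLOWS[content_type]:
        doc = step(doc)
    return json.dumps(doc, sort_keys=True)


print(ingest("wire_story", {"hed": "  Big   news ", "text": "Body\ntext", "src": "Example Wire"}))
```

Supporting a new format in this sketch means adding one mapping step and one entry in `WORKFLOWS`; everything downstream keeps reading the same fields.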
All this data is then pushed to a massive NoSQL data grid. A core goal when designing our data grid was the ability to easily attach new information to any existing piece of content. That means any team within Yahoo! can analyze and enhance the content. Yahoo! scientists are currently using technologies such as Pig and Hadoop to do things like find related clusters of news stories to show our users.
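The "attach new information" goal can be pictured with a small in-memory stand-in, sketched below in Python. This is only an illustration under stated assumptions: the `ContentStore` class, its method names, and the per-team namespace idea are hypothetical and say nothing about how the real data grid is built. The property it demonstrates is that a team can add data to a document without touching the original or anyone else's additions.

```python
from typing import Any, Dict


class ContentStore:
    """Toy in-memory stand-in for a content store (hypothetical API)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self._annotations: Dict[str, Dict[str, Any]] = {}

    def put(self, content_id: str, doc: dict) -> None:
        # Store a copy so later changes to `doc` do not leak in.
        self._docs[content_id] = dict(doc)

    def attach(self, content_id: str, namespace: str, data: Any) -> None:
        """Attach extra data to existing content under a team-owned namespace."""
        if content_id not in self._docs:
            raise KeyError(content_id)
        self._annotations.setdefault(content_id, {})[namespace] = data

    def get(self, content_id: str) -> dict:
        # Return the original document plus everything attached to it.
        doc = dict(self._docs[content_id])
        doc["annotations"] = dict(self._annotations.get(content_id, {}))
        return doc


store = ContentStore()
store.put("story-1", {"title": "Big news", "type": "wire_story"})

# Two independent teams enrich the same story without coordinating.
store.attach("story-1", "clustering", {"related": ["story-7", "story-9"]})
store.attach("story-1", "geo", {"region": "us"})

print(store.get("story-1"))
```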
The way we work
Once we had consolidated all this code and data, we needed to re-engineer how the separate teams work together.
We strictly adhere to CI (continuous integration) principles, where developers constantly check in their code and integrate with everyone else. This has removed the headaches that arise when everyone tries to merge their code after weeks of coding. CI is critical when many developers are working on a single codebase.
We also formalized a collaborative development process for every piece of code we write. It's kind of like an open source project within Yahoo!. That means anyone in Yahoo! can submit a patch to anything in our codebase.
The new media framework supports both software engineers and editors/content producers. Developer configuration controls technical settings that should not be changed often (such as cache times and API hosts). Content producers and editors control page layout through a GUI that defines where a component is placed on a page, what it looks like, or what type of content it displays. Flexibility creates efficiency.
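One way to picture this split is the Python sketch below. It is illustrative only: the setting names, the placeholder host, and the `apply_editor_change` helper are assumptions made for this example, not the framework's real configuration format. Developer settings are read-only at runtime, while editor changes are limited to a small set of layout keys.

```python
from types import MappingProxyType

# Developer-owned settings: changed rarely, via code review, and read-only here.
# The values are placeholders for illustration.
DEVELOPER_SETTINGS = MappingProxyType({
    "cache_seconds": 300,
    "api_host": "api.example.invalid",
})

# Editor-owned keys: the only things a layout tool is allowed to change.
EDITOR_KEYS = frozenset({"position", "theme", "content_type"})


def apply_editor_change(placement: dict, change: dict) -> dict:
    """Return a new placement with the editor's change applied.

    Rejects any key that is not editor-owned, so a layout edit can never
    alter technical settings such as cache times.
    """
    not_allowed = set(change) - EDITOR_KEYS
    if not_allowed:
        raise ValueError(f"not editable by editors: {sorted(not_allowed)}")
    return {**placement, **change}


placement = {"component": "story_list", "position": "left_rail", "theme": "compact"}
moved = apply_editor_change(placement, {"position": "main", "content_type": "sports"})
print(moved)
```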
As we roll out these codebase changes, we're also re-architecting the way software gets built at Yahoo!: incorporating a philosophy of continuous integration, daily check-ins and testing, scrum development methodologies, and rapid releases. Plus, a single codebase removes obstacles to collaboration. Since all the code is created using the same standards and in the same repository, engineers will be able to contribute code to any content site within Yahoo!.
We're building a modern repository of reusable code and cultivating a "code with care" culture at a ginormous, global scale. It's about time.
This blog post was co-authored by Darin Foster and Eric Puidokas in Media Engineering.