In recent years, we've seen increased energy put into web extensibility platforms. These platforms let distributed developers collaborate to produce new kinds of interactive features on websites and in the web browser itself. Because these platforms frequently enable data-sharing between multiple distinct organizations, and often sit between two completely different security domains (desktop vs. web), the security and privacy issues that arise are complex and interesting. This post explores some of that complexity: both the current state of platforms that extend the web and their associated security challenges.
Plugins, Extensions, and Mashups: A Primer
The extension platforms of the web today exist at every level of the technology stack. First there are mashup platforms, which include Facebook Apps, Google Gadgets, and YAP Apps, built on Yahoo!'s Application Platform. All these platforms are abstractions that allow developers to embed content within a host site. Often this content has a level of dynamism and interactivity that exceeds what is expected of "content" — so they're called "Applications" or "Gadgets."
A deeper type of extensibility occurs in web replacement technologies. Examples include JavaFX, Adobe Flash, Microsoft Silverlight, and Google Chrome Frame1. These technologies leverage exisiting hooks in browsers to replace native rendering and scripting technologies from the browser vendors with new environments that claim to offer a variety of benefits.
The existence of both replacement technologies and augmentation technologies for the web is made possible by the browser vendors themselves, who generally expose two different ways of extending the browser.
Browser plugins were originally intended to allow 3rd parties to add support to the browser so as to handle new types of content. They have since evolved into a rich means of embedding scriptable code into browsers, code that may (optionally) render in the browser's content area. (Microsoft calls these plugins Content Extensions, while other browser vendors have converged on the NPAPI architecture). A second, much more fragmented means of extending modern browsers is to write a browser extension (referred to as a "Browser Helper Object" by our friends in Redmond). Extensions typically have browser lifetime, as opposed to page lifetime. They also have programmatic access to manipulate the user interface of the browser.
Sandboxing vs. Code Review
In designing an extension system where developers contribute code, there's a key upfront decision that has a very deep effect on the platform — To what extent should plugins be trusted? Should plugin authors be forced to attain the approval of some body of reviewers, or do we instead rely on a software sandbox to mitigate the potential harm that could be done by untrusted code from unknown authors (and hence relax the review requirement)? The former approach can minimize upfront investment in platform development, while the latter can reduce delays in publishing 3rd party code.
In addition to increasing complexity, the sandbox-based approach constrains the creativity of developers who use the platform. The universe of what is possible is pre-defined by the platform authors, which is contrary to the idea of an extension system. Thanks to Atul Varma's recent overview of JetPack development, I now can identify this tension as Jonathan Zittrain's "Generative
At first glance, the platform developer might feel stuck with a binary decision between generativity vs. self service (the ability of plugin authors to publish immediately without oversight or review). A little more thought, however, reveals that in reality there's a spectrum with several interesting intermediate choices. The first hybrid to consider: a sandbox with a set of additional capabilities or permissions that may be requested by code running therein.
Building a Capable Sandbox
In many ways, browser extensions can be considered conceptual peers of the browser: They share direct access to the end user and to the resources of the machine within which they run. Additionally, given the wide target audience of modern extension systems, developers with widely varied levels of experience and grasp of security issues are empowered to author and quickly publish extensions. Providing extensions with system access to permit generativity creates a tension with the requirement that extensions run in a locked-down environment to preserve reviewless publishing and prevent accidental or malicious end-user compromise. We seem to be converging on a capability-based security model, where sandboxed code may explicitly or implicitly3 request enhanced "capabilities" or "permissions." Several of the projects mentioned here have mechanisms which allow an extension to express its required permissions:
- Chrome extensions require explicit expression of capabilities (or "permissions") in a manifest.
- In JetPack, it may eventually be possible to automatically determine the capabilities required by a given JetPack, based on the dependency graph of "superpowers" it uses (see below)
- In BrowserPlus, as in Chrome, permissions are explicitly stated in a manifest.
As different projects attempt to build capabilities-based systems, an interesting opportunity for collaboration arises. Users are going to learn to some extent what "writing to a temporary location on your disk" means. They will also learn more about "using location," and maybe also about "capturing video using a webcam." As new platforms emerge that ask the user more and more questions, it would be beneficial if these platforms shared similar capability lists, perhaps even a similar language or visual vocabulary.
From Capabilities to Meaningful Decisions
While we all seem to agree that an enumerable list of capabilities is a prerequisite to intelligent interface design, it's not clear that we've yet determined how to consolidate this information in a user interface that allows meaningful decisions. This is not for a lack of thought: Over a year ago, Dion Almaer suggested a "nuanced question" combined with good post-decision feedback. Since then, he's summarized ideas from Mozilla folks regarding a "layer [of] social trust on top of the technical security." More recently, there's been some initial discussion of a stop-light style representation of risk, which if correctly applied could distill a complex decision into a series of simple questions, (for example, Do you trust yahoo.com?) which iteratively becomes more ominous as the stakes (and the implied risk) get higher.
Better Responses Through Fewer Questions
One approach employed by BrowserPlus attempts to minimize explicit prompting and leverages mechanisms of implicit consent instead 4 by carefully crafting the API between trusted and untrusted code. One such example is file selection, where the act of navigating an operating system-supplied file picker dialog indicates implicitly that the site should be allowed to read the contents of selected file(s). Another example where this technique might be applied involves webcam access. Rather than asking the user if a site may use the webcam, pop up a window displaying the view from the webcam, allowing the user to capture a picture inside that frame (outside the control of the page), and finally after capturing "share" the capture with the page.
Architecting APIs and UI in this way can somewhat alleviate the need for explicit up-front questions and allow in-context kinesthetic decision making to occur. This approach is not without fault however, as it can also confine the freedom of UI designers. Eliminating needless questions assumes that only by asking fewer questions of higher quality, will we truly empower the user to make meaningful decisions.
Permission Prompts in Practice
In addition to ongoing thought experiments in meaningful user prompting, there are plenty of instances in practice where user questions have been built to varying degrees of success.
Facebook combines "social trust" (the rating) with a generic bit of language about how an app can poke at your stuff:
My Yahoo! takes a language-laden approach to the problem:
BrowserPlus uses a lot of screen real estate to convey a fairly literal translation of capabilities into human language:
Google Gears adds a visual representation of a capability and a prominent display of who is asking:
Language Choice and the Generativity Continuum
The choice of implementation language adds another dimension to the problem of security in web extension systems. As might be expected, native code (C, C++, or Objective C) will always afford maximal access to the system, yielding the ultimate in generativity. Certain features cannot be accessed from a higher level language — some good examples include access to physical position information from a
motion sensor5 or access to a wifi card to extract nearby wifi base stations with associated signal strength. The benefit to working at such a low level is, in turn, this freedom. Ultimately, attempts to introduce sandboxing into an extension system that supports native code run the risk of introducing extra developer facing complexity without giving much in return6. In general, the choice of implementation language can bound where an extension system falls on the generativity continuum:
Case Studies in Extension Security
So far we've enumerated several pertinent decisions that extension platform authors must make:
- Are submissions sandboxed or reviewed?
- What language are extensions written in?
- Is the system capability based? If so, how is this implemented?
Google Chrome's Extensions
Aaron's comments are at 21:30
We're really proud of the fact that in the Google Chrome extension gallery we enable extensions automatically, there's no review period. But we make an exception for NPAPI because when you get to native code a lot of the security mechanisms built into the extension system can no longer apply. Once you have native code running it can modify the registry or make permanent modifications to your system... So we have additional review for NPAPI extensions.
This strategy seems reasonable. One would hope that a lion's share of extensions are developed without reliance on native code. The set of extensions that bundle native code can then be periodically examined. Those which leverage native code to do things that may be interesting to a large number of extensions developers and can be considered for uplifting into the core platform.
One interesting thing that the Chrome model lacks is a means for sharing privileged functionality between extensions without actually merging it into the core extension platform. JetPack, on the other hand, is taking a much more ambitious initial approach to building a platform that can more rapidly evolve. Atul recently summarized the design goals of JetPack's capability-based security model. They aim to "allow anyone to create capabilities that securely expose privileged functionality." The JetPack world might end up looking something like this:
Today's Extensions as Tomorrow's Web
Web extensions today are a dynamic and fascinating area at the intersection of several disciplines. This post attempts to give a taste of some of the considerations and tradeoffs involved in the architecture of such systems. Hopefully I've sparked your interest to learn more or perhaps even join the fray. A renewed interest in more open software platforms is no surprise, and the potential to harness the creativity of developers of the world is inspiring.
One area, however, that dampens the joy of web extension platforms covered here is that they're limited in scope — vendor locked and browser specific. While we'll hopefully see increased (and appropriate) cross-vendor collaboration, it's reasonable to conclude that because "browser extensions" are about extending web browsers — and because different web browsers can have vastly different architectures — browser extension APIs will be slow to converge.
emergent HTML5 standards are made of). A significant obstacle to this conversation is the problem of helping users understand and securly navigate a dynamically evolving web.
The current work being done to secure extension platforms logically extends to web augmentation platforms. The degree to which we can collectively refine these permissioning models may become the key limiting factor in how far we can migrate our current desktop experience to the web.
2. Geolocation was ultimately built into Firefox 3.5, however Geode still serves as a great example of web augmentation — in this case using web extensions as a way to experiment with new potential core features for the web.
3. Whether the request of capabilities is implicit or explicit is an interesting related question. Having a platform that can discover the required capabilities of an extension without the author having to explicitly enumerate them would yield something that's easier to develop for, at the cost of complexity for the platform implementor.
4. The notion of implicit consent isn't new — there's precedent, for instance, in Flash where certain API functions can only be performed from an event handler generated by a user action (mouse or keyboard). The simple idea is that user actions can be a meaningful part of the security system.
5. What is possible without native code is obviously a moving target. Browser-based access to the motion sensor was introduced using BrowserPlus a couple years ago, and subsequently has made it into Firefox 3.6. What's next?
6. The potential efficiency of native code is an exception, and Google Native Client is an example of a sandbox system around native code. The project focuses on enabling the efficiency of native code with the saftey of the web, and additionally wraps and exposes some system resources, such as audio. Another glaring exception might be Apple's sandboxed environment for authoring iPhone Apps, a case where Objective C affords a slightly higher level of abstraction for developers yet still allows native code efficiency — important on a mobile device.