Chapter 7. Caja Support
Introduction
What is Caja?
Caja is a system that transforms ordinary HTML and JavaScript into a restricted form of JavaScript. The transformation is called "cajoling", and the result is "cajoled script". The cajoled script is then run within a security sandbox created in your browser. This provides a way to safely include arbitrary third-party content on any Web page.
In principle, Caja should be transparent. Most JavaScript behaves the same whether it's run directly or cajoled. However, since Caja is currently evolving and incomplete, there are some noticeable differences.
Caja is an Open Source project sponsored by Google and hosted at Google Code.
Caja Status for This Release
Since Caja is used to transform an application's HTML and
JavaScript into a restricted form that prevents malicious applications
from doing damage, applications cannot contain arbitrary ActiveX
objects, use eval to get around the ActiveX
restriction, or use iframes to get around the eval
restriction.
Other than that, most JavaScript application elements should work. Our goal is to make Caja as unobtrusive as possible for ordinary applications. However, we're not there yet. Caja still has some rough edges, and you may experience mysterious Caja behavior. This document will describe some of those mysteries in detail.
To get started right away, use a browser that
supports console.log(), such as Firefox with
the Firebug
add-on, IE 8, or Safari 4.x.
Note the following restrictions that apply to this release:
- Calls to alert() are redirected to console.log().
- You can't use external scripts or external stylesheets yet. Inline them instead.
- Complex libraries such as YUI, jQuery, and Prototype might partially work if you inline them, but they are not seamless yet.
- The document.write method isn't supported. However, innerHTML and many commonly-used DOM interfaces are supported.
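The restrictions above suggest a simple working pattern: build markup as a string and assign it to innerHTML instead of calling document.write, and log with console.log instead of alert. The sketch below is illustrative only. The helper names renderMessage and escapeHtml are hypothetical, and container stands for any element obtained from the sandboxed document.

```javascript
// Hypothetical helper: escape text before placing it in markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: update an element through innerHTML
// instead of document.write, and log instead of calling alert.
function renderMessage(container, message) {
  container.innerHTML = '<p>' + escapeHtml(message) + '</p>';
  console.log('rendered message: ' + message);
  return container;
}
```

In a page, this might be called as renderMessage(document.getElementById('status'), 'Ready'), where the element id 'status' is an assumption made for the example.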
Why Do We Need Caja?
When a website wants to include arbitrary third-party content, it needs to consider many potential security problems. One of the harder problems is "drive-by downloads": an attacker inserts malicious HTML that tries to install malware when you view the page.
A typical vector is an <iframe src=...>
tag pointing at the attacker's website. Your browser automatically loads
the iframe, which runs a script that figures out what browser and
extensions you have, then downloads malware targeted specifically at the
vulnerabilities known for your system.
The traditional solution to this problem is to aggressively sanitize third-party content by removing iframes, removing scripts, etc. That works well in many cases, but aggressive sanitization makes it difficult to create interesting applications.
Today, we want to allow anybody to create interesting applications that can appear on our site, but we also want to limit our users' exposure to scripts that install malware.
Sanitizing JavaScript is difficult, and that's what Caja is about.
How Does Caja Work?
Caja has two main parts:
- server-side translator
- client-side runtime support
The Server-Side Translator
The Caja translator rewrites arbitrary HTML and JavaScript into safe HTML and JavaScript, using white-list security principles, by
- Removing anything it doesn't understand
- Removing HTML and CSS that isn't on a white-list
- Modifying CSS rules, limiting them to a sandbox <div>
- Transforming JavaScript into forms known to be safe
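The white-list idea in the steps above can be sketched on a toy data structure. This is not Caja's implementation: the allowed tag and attribute sets are assumptions chosen for the example, and the { tag, attrs, children } node shape is a hypothetical stand-in for parsed HTML.

```javascript
// Assumed white-lists for illustration only; a real translator's lists are far larger.
const ALLOWED_TAGS = new Set(['div', 'p', 'span', 'b', 'i']);
const ALLOWED_ATTRS = new Set(['id', 'class', 'title']);

// Takes a simplified node ({ tag, attrs, children } or a text string)
// and returns a sanitized copy, or null if the tag is not allowed.
function sanitize(node) {
  if (typeof node === 'string') return node; // text passes through unchanged
  if (!ALLOWED_TAGS.has(node.tag)) return null; // drop anything not understood
  const attrs = {};
  for (const name of Object.keys(node.attrs || {})) {
    if (ALLOWED_ATTRS.has(name)) attrs[name] = node.attrs[name];
  }
  const children = (node.children || [])
    .map(sanitize)
    .filter((child) => child !== null);
  return { tag: node.tag, attrs, children };
}
```

Note the direction of the check: content is kept only if it appears on the list, so an unknown tag or attribute is dropped by default rather than allowed by default.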
The JavaScript transformation is the complicated part. It's basically a form of virtualization:
- Replaces references to real global variables with references to per-sandbox globals
- Rewrites references to this to prevent access to the real global scope
- Replaces most JavaScript code with semantically similar code that has runtime checks for security
- Rejects some JavaScript code early, such as with(obj){...}.
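To see why references to this need rewriting, recall that a non-strict function called without a receiver gets the real global object as this. The sketch below demonstrates the leak and one possible guard. The names sandboxGlobals and guardedThis are hypothetical and do not come from Caja.

```javascript
// A non-strict function called without a receiver sees the real global
// object as `this`. Code built by the Function constructor is non-strict.
const leaky = new Function('return this;');
console.log(leaky() === globalThis); // logs true: the real global scope leaks

// Hypothetical per-sandbox globals object standing in for the real one.
const sandboxGlobals = { size: 10 };

// Hypothetical guard: substitute the sandbox globals whenever `this`
// would otherwise be the real global object (or is missing).
function guardedThis(candidate) {
  return (candidate === globalThis || candidate == null)
    ? sandboxGlobals
    : candidate;
}

// Hand-written "rewritten" form: `this` is passed through the guard.
const rewritten = function () { return guardedThis(this); };
console.log(rewritten() === sandboxGlobals); // logs true
```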
Here's an example transformation. This JavaScript source code:
is cajoled into something like this:
Note
The actual Caja transformation is slightly different. This example has been simplified to make it easier to see what Caja is doing under the hood.
Simple operations on local variables, such
as (s4+s5)/2, are left alone.
Some operations on local variables, such
as geo.compute(), are rewritten to
call $v functions.
References to globals, such as size,
are rewritten to call $v functions.
References to this are rewritten
to $dis.
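The following hand-written sketch gives a feel for the rewritten shape described above. It is not actual cajoler output. Here $v is a stand-in object, the helper names ro (read a sandbox global) and cm (call a method) are assumptions, and the checks inside them are simplified. The geo object and its compute method are invented for the example.

```javascript
// Stand-in for the per-sandbox globals.
const outers = { size: 4 };

// Hand-written stand-in for the $v helper object (helper names are assumptions).
const $v = {
  // Read a sandbox global instead of a real global.
  ro(name) {
    if (!Object.prototype.hasOwnProperty.call(outers, name)) {
      throw new ReferenceError(name + ' is not defined in this sandbox');
    }
    return outers[name];
  },
  // Call a method after checking that it really is a function.
  cm(obj, name, args) {
    const method = obj[name];
    if (typeof method !== 'function') {
      throw new TypeError(name + ' is not callable');
    }
    return method.apply(obj, args);
  },
};

// Simple arithmetic on local variables is left alone.
function average(s4, s5) { return (s4 + s5) / 2; }

// Source form:    geo.compute(size)
// Rewritten form: the method call and the global read both go through $v.
const geo = { compute(n) { return n * n; } };
const area = $v.cm(geo, 'compute', [$v.ro('size')]);
console.log(average(3, 5), area); // logs 4 16
```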
For more details about the JavaScript transformation, see the Google Caja page.
The Client-Side Runtime
Cajoled script can't access any real global objects
without help, and that's what the Caja runtime system is
for. The runtime system creates a useful sandbox
environment by importing objects into the sandbox's
globals object, which is called "outers"
or IMPORTS___.
Some of the imported objects are the real thing. For example,
IMPORTS___.Array is identical to the browser's
Array.
Some of the imported objects are proxies. For example,
IMPORTS___.document is a proxy object that exposes
a safe subset of the DOM interface.
The proxy
function document.getElementById will return
objects that are also proxies. You won't get
direct access to a real DOM object, but that
doesn't matter, because the proxy
objects are similar enough to the actual objects.
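The proxy arrangement can be sketched with plain objects. This is a minimal illustration under stated assumptions, not Caja's DOM taming code: realDocument and realElements are stand-ins for the browser's DOM, and makeElementProxy and documentProxy are hypothetical names.

```javascript
// Stand-in for a real DOM; in a browser this would be the real document.
const realElements = {
  title: { id: 'title', innerHTML: 'Hi', ownerDocument: 'REAL DOCUMENT' },
};
const realDocument = {
  getElementById(id) { return realElements[id] || null; },
  cookie: 'session=secret',
};

// Hypothetical element proxy exposing only a small, safe surface.
function makeElementProxy(real) {
  return {
    get id() { return real.id; },
    get innerHTML() { return real.innerHTML; },
    set innerHTML(html) { real.innerHTML = String(html); },
  };
}

// Hypothetical document proxy: lookups return proxies, never real nodes.
const documentProxy = {
  getElementById(id) {
    const real = realDocument.getElementById(String(id));
    return real ? makeElementProxy(real) : null;
  },
};
```

Code holding documentProxy can read and update the element's innerHTML, but it has no path to ownerDocument or cookie, because the proxies simply do not expose them.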
The runtime system also enforces the Caja security
model, by checking that objects and functions were
properly tagged before they're used. You can see Caja's
internal tagging when you examine objects in a debugger.
Most objects will have properties that end with
triple-underbar, such
as length_canRead___,
FROZEN___, etc.
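The tagging idea can be sketched as a flag check performed before each property read. The flag name follows the length_canRead___ pattern mentioned above, but the helpers readChecked and grantRead are hypothetical, and the real runtime's checks are more involved.

```javascript
// Hypothetical reader that only returns properties tagged as readable.
function readChecked(obj, name) {
  if (obj[name + '_canRead___'] !== true) {
    throw new TypeError('property not readable: ' + name);
  }
  return obj[name];
}

// Hypothetical tagging helper that marks one property as readable.
function grantRead(obj, name) {
  obj[name + '_canRead___'] = true;
  return obj;
}

const point = grantRead({ x: 1, secret: 42 }, 'x');
console.log(readChecked(point, 'x')); // logs 1
```

Reading point.secret through readChecked throws, because that property was never tagged.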

