Caja is a system that transforms ordinary HTML and JavaScript into a restricted form of JavaScript. The transformation is called "cajoling", and the result is "cajoled script". The cajoled script is then run within a security sandbox created in your browser. This provides a way to safely include arbitrary third-party content on any Web page.
In principle, Caja should be transparent. Most JavaScript behaves the same whether it's run directly or cajoled. However, since Caja is currently evolving and incomplete, there are some noticeable differences.
Caja is an Open Source project sponsored by Google and hosted at Google Code.
Since Caja is used to transform an application's HTML and
JavaScript into a restricted form that prevents malicious applications
from doing damage, applications cannot contain arbitrary ActiveX
objects, use eval to get around the ActiveX restriction, or
use iframes to get around the eval restriction.
Other than that, most JavaScript application elements should work. Our goal is to make Caja as unobtrusive as possible for ordinary applications. However, we're not there yet. Caja still has some rough edges, and you may experience mysterious Caja behavior. This document will describe some of those mysteries in detail.
To get started right away, use a browser that supports
console.log(), such as Firefox with the Firebug add-on, IE 8, or Safari
4.x.
Note the following restrictions that apply to this release:
Calls to alert() are redirected to
console.log().
The document.write method is subject to the
restrictions described in DOM Limitations.
However, innerHTML and many commonly-used DOM
interfaces are supported.
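As a rough illustration of the first restriction, a container could give sandboxed code an alert function that forwards to a logger instead of opening a dialog. This is only a sketch, not Caja's implementation: makeSandboxAlert, its logger parameter, and sandboxGlobals are hypothetical names introduced here.

```javascript
// Hypothetical helper (not part of Caja): builds an alert() replacement
// that forwards its message to a logger such as the browser console.
function makeSandboxAlert(logger) {
  return function sandboxAlert(message) {
    // Coerce to a string, as alert() would, then log instead of
    // showing a modal dialog.
    logger.log(String(message));
  };
}

// The object sandboxed code would see in place of the real globals.
const sandboxGlobals = { alert: makeSandboxAlert(console) };

sandboxGlobals.alert('Hello from the sandbox'); // logged, no dialog
```

This is why a browser with console.log() support is needed to see the output of alert() calls.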
When a website wants to include arbitrary third-party content, it needs to consider many potential security problems. One of the harder problems is "drive-by downloads": an attacker inserts malicious HTML that tries to install malware when you view the page.
A typical vector is an <iframe src=...> tag
pointing at the attacker's website. Your browser automatically loads the
iframe, which runs a script that figures out what browser and extensions
you have, then downloads malware targeted specifically at the
vulnerabilities known for your system.
The traditional solution to this problem is to aggressively sanitize third-party content by removing iframes, removing scripts, etc. That works well in many cases, but aggressive sanitization makes it difficult to create interesting applications.
Today, we want to allow anybody to create interesting applications that can appear on our site, but we also want to limit our users' exposure to scripts that install malware.
Sanitizing JavaScript is difficult, and that's what Caja is about.
Caja has two main parts:
The Caja translator rewrites arbitrary HTML and JavaScript into safe HTML and JavaScript, using white-list security principles.
The JavaScript transformation is the complicated part. It's basically a form of virtualization:
References to this are rewritten to prevent access
to the real global scope.
Constructs such as with(obj){...} are rejected.
Here's an example transformation. This JavaScript source code:
is cajoled into something like this:
The actual Caja transformation is slightly different. This example has been simplified to make it easier to see what Caja is doing under the hood.
Simple operations on local variables, such as
(s4+s5)/2, are left alone.
Some operations on local variables, such as
geo.compute(), are rewritten to call $v
functions.
References to globals, such as size, are rewritten
to call $v functions.
References to this are rewritten to
$dis.
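The four rules above can be pictured with a small hand-written sketch. It is not actual Caja output: the function scaled, the makeV helper, and the method names r (read a global) and cm (call a method) are assumptions made for illustration, standing in for whatever the real $v functions do.

```javascript
// Toy stand-in for the runtime helper object. The method names `r` and
// `cm` are assumptions for this sketch, not a documented Caja API.
function makeV(imports) {
  return {
    // Read a global by name from the sandbox's imports only, never from
    // the real global scope.
    r(name) {
      if (!Object.prototype.hasOwnProperty.call(imports, name)) {
        throw new ReferenceError(name + ' is not defined in the sandbox');
      }
      return imports[name];
    },
    // Call a method through a checkpoint where a runtime could apply
    // its security checks.
    cm(obj, name, args) {
      const fn = obj[name];
      if (typeof fn !== 'function') {
        throw new TypeError(name + ' is not callable');
      }
      return fn.apply(obj, args);
    },
  };
}

const $v = makeV({ size: 10 });

// Ordinary form, for comparison:
//   function scaled(geo, s4, s5) {
//     var mid = (s4 + s5) / 2;
//     return mid + geo.compute() + size + this.offset;
//   }
// Hand-rewritten form: `this` arrives as the explicit parameter $dis.
function scaled($dis, geo, s4, s5) {
  var mid = (s4 + s5) / 2;         // local arithmetic is left alone
  return mid
    + $v.cm(geo, 'compute', [])    // method call goes through $v
    + $v.r('size')                 // global read goes through $v
    + $dis.offset;                 // `this` became $dis
}

const result = scaled({ offset: 1 }, { compute() { return 5; } }, 2, 4);
console.log(result); // 19
```

Because every global read passes through $v.r, a name that was never imported (for example window) fails with a ReferenceError rather than reaching the browser's real global object.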
For more details about the JavaScript transformation, see the Google Caja page.
Cajoled script can't access any real global objects without
help, and that's what the Caja runtime system is for. The runtime
system creates a useful sandbox environment by importing objects into
the sandbox's globals, which are called "outers" or
IMPORTS___.
Some of the imported objects are the real thing. For example,
IMPORTS___.Array is identical to the browser's
Array.
Some of the imported objects are proxies. For example,
IMPORTS___.document is a proxy object that exposes a safe
subset of the DOM interface.
The proxy function document.getElementById will
return objects that are also proxies. You won't get direct access to a
real DOM object, but for most code that doesn't matter, because the
proxy objects behave closely enough to the actual objects.
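To make the distinction between real and proxied imports concrete, here is a minimal sketch that runs outside a browser. The realDocument object is a stand-in for the DOM, and makeElementProxy and makeDocumentProxy are hypothetical helpers; Caja's actual proxies expose far more of the DOM and apply more checks than this.

```javascript
// Stand-in for a real DOM so the sketch runs outside a browser.
const realDocument = {
  cookie: 'session=secret',
  nodes: { greeting: { id: 'greeting', innerHTML: 'hi', ownerDocument: null } },
  getElementById(id) { return this.nodes[id] || null; },
};
realDocument.nodes.greeting.ownerDocument = realDocument;

// Hypothetical element proxy: exposes only id and innerHTML.
function makeElementProxy(el) {
  return Object.freeze({
    get id() { return el.id; },
    get innerHTML() { return el.innerHTML; },
    // A real proxy would sanitize the HTML before writing it through.
    set innerHTML(html) { el.innerHTML = String(html); },
  });
}

// Hypothetical document proxy: its getElementById returns proxies too.
function makeDocumentProxy(doc) {
  return Object.freeze({
    getElementById(id) {
      const el = doc.getElementById(String(id));
      return el === null ? null : makeElementProxy(el);
    },
  });
}

const IMPORTS___ = {
  Array: Array,                               // the real thing
  document: makeDocumentProxy(realDocument),  // a proxy
};

const el = IMPORTS___.document.getElementById('greeting');
console.log(el.innerHTML);                // "hi"
console.log(el.ownerDocument);            // undefined: not exposed
console.log(IMPORTS___.document.cookie);  // undefined: not exposed
```

The sandboxed code can read and write the element's content, but it has no path from the proxy back to the underlying document.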
The runtime system also enforces the Caja security model, by
checking that objects and functions were properly tagged before
they're used. You can see Caja's internal tagging when you examine
objects in a debugger. Most objects will have properties that end with
triple-underbar, such as length_canRead___,
FROZEN___, etc.
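The idea of tag-checked access can be sketched as follows. Only the _canRead___ suffix comes from the property names mentioned above; the grantRead and readPub helpers and the choice to throw on an untagged read are assumptions for illustration, not Caja's actual behavior.

```javascript
// Hypothetical helper: tag a property as readable by adding a hidden
// `<name>_canRead___` flag to the object.
function grantRead(obj, name) {
  Object.defineProperty(obj, name + '_canRead___', {
    value: true,
    enumerable: false, // keep the tag out of for-in loops
  });
  return obj;
}

// Hypothetical guarded read: allowed only if the tag is present.
function readPub(obj, name) {
  if (obj[name + '_canRead___'] !== true) {
    throw new Error('Not readable: ' + name);
  }
  return obj[name];
}

const record = grantRead({ length: 3, secret: 'x' }, 'length');
console.log(readPub(record, 'length')); // 3

try {
  readPub(record, 'secret');
} catch (e) {
  console.log(e.message); // "Not readable: secret"
}
```

Inspecting record in a debugger would show the extra length_canRead___ property alongside the ordinary ones.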