Chapter 7. Caja Support

Introduction

What is Caja?

Caja is a system that transforms ordinary HTML and JavaScript into a restricted form of JavaScript. The transformation is called "cajoling", and the result is "cajoled script". The cajoled script is then run within a security sandbox created in your browser. This provides a way to safely include arbitrary third-party content on any Web page.

In principle, Caja should be transparent. Most JavaScript behaves the same whether it's run directly or cajoled. However, since Caja is currently evolving and incomplete, there are some noticeable differences.

Caja is an Open Source project sponsored by Google and hosted at Google Code.

Caja Status for This Release

Caja transforms an application's HTML and JavaScript into a restricted form that prevents malicious applications from doing damage. As a result, applications cannot contain arbitrary ActiveX objects, cannot use eval to get around the ActiveX restriction, and cannot use iframes to get around the eval restriction.

Other than that, most JavaScript application elements should work. Our goal is to make Caja as unobtrusive as possible for ordinary applications. However, we're not there yet. Caja still has some rough edges, and you may experience mysterious Caja behavior. This document will describe some of those mysteries in detail.

To get started right away, use a browser that supports console.log(), such as Firefox with the Firebug add-on, IE 8, or Safari 4.x.

Note the following restrictions that apply to this release:

  • Calls to alert() are redirected to console.log().
  • You can't use external scripts or external stylesheets yet. Inline them instead.
  • Complex libraries such as YUI, jQuery, and Prototype might partially work if you inline them, but they are not seamless yet.
  • The document.write method isn't supported. However, innerHTML and many commonly-used DOM interfaces are supported.
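
As a rough sketch of working within these restrictions, the snippet below builds markup as a string and assigns it to innerHTML rather than calling document.write(), and it reports progress with console.log() rather than alert(). The names escapeHtml, renderGreeting, and showGreeting are hypothetical helpers invented for this illustration; they are not part of Caja.

```javascript
// Hypothetical helpers for illustration; none of these names come from Caja.

// Escape text so it can be embedded safely in an HTML string.
function escapeHtml(text) {
  return String(text)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
}

// Build the markup as a string instead of emitting it with document.write().
function renderGreeting(name) {
  return '<p class="greeting">Hello, ' + escapeHtml(name) + '</p>';
}

// `container` is any object with an innerHTML property, for example the
// element returned by document.getElementById().
function showGreeting(container, name) {
  container.innerHTML = renderGreeting(name);
  console.log('Rendered greeting');  // use console.log() rather than alert()
}
```

In a page, the container would typically come from document.getElementById(); any inline script can call showGreeting() the same way.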

Why Do We Need Caja?

When a website wants to include arbitrary third-party content, it needs to consider many potential security problems. One of the harder problems is "drive-by downloads": an attacker inserts malicious HTML that tries to install malware when you view the page.

A typical vector is an <iframe src=...> tag pointing at the attacker's website. Your browser automatically loads the iframe, which runs a script that figures out what browser and extensions you have, then downloads malware targeted specifically at the vulnerabilities known for your system.

The traditional solution to this problem is to aggressively sanitize third-party content by removing iframes, removing scripts, etc. That works well in many cases, but aggressive sanitization makes it difficult to create interesting applications.

Today, we want to allow anybody to create interesting applications that can appear on our site, but we also want to limit our users' exposure to scripts that install malware.

Sanitizing JavaScript is difficult, and that's what Caja is about.

How Does Caja Work?

Caja has two main parts:

  • server-side translator
  • client-side runtime support

The Server-Side Translator

The Caja translator rewrites arbitrary HTML and JavaScript into safe HTML and JavaScript, using white-list security principles, by

  • Removing anything it doesn't understand
  • Removing HTML and CSS that isn't on a white-list
  • Modifying CSS rules, limiting them to a sandbox <div>
  • Transforming JavaScript into forms known to be safe
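
The white-list idea can be sketched as follows. This is a minimal illustration, not Caja's translator: it works on a simplified tree of plain objects rather than real HTML, and the ALLOWED_TAGS and ALLOWED_ATTRS lists and the sanitize function are assumptions made up for this example. A real translator would also have to validate attribute values (URLs, for instance), which this sketch does not attempt.

```javascript
// Illustrative white-lists only; these are NOT Caja's real lists.
var ALLOWED_TAGS = { div: true, p: true, span: true };
var ALLOWED_ATTRS = { 'class': true, title: true };

function has(map, key) {
  return Object.prototype.hasOwnProperty.call(map, key);
}

// A node is either a string (text) or an object of the form
// { tag: string, attrs: { name: value }, children: [node, ...] }.
function sanitize(node) {
  if (typeof node === 'string') {
    return node;                         // text passes through unchanged
  }
  if (!has(ALLOWED_TAGS, node.tag)) {
    return null;                         // anything not on the list is removed
  }
  var attrs = {};
  var sourceAttrs = node.attrs || {};
  for (var name in sourceAttrs) {
    if (has(sourceAttrs, name) && has(ALLOWED_ATTRS, name)) {
      attrs[name] = sourceAttrs[name];   // keep only white-listed attributes
    }
  }
  var children = [];
  var kids = node.children || [];
  for (var i = 0; i < kids.length; i++) {
    var cleaned = sanitize(kids[i]);
    if (cleaned !== null) {
      children.push(cleaned);
    }
  }
  return { tag: node.tag, attrs: attrs, children: children };
}
```

The point of the design is the default: an element or attribute is dropped unless it is explicitly listed, so new or unknown constructs are rejected without any extra code.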

The JavaScript transformation is the complicated part. It's basically a form of virtualization:

  • Replaces references to real global variables with references to per-sandbox globals
  • Rewrites references to this to prevent access to the real global scope
  • Replaces most JavaScript code with semantically similar code that has runtime checks for security
  • Rejects some JavaScript code early, such as with(obj){...}.
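
The idea of per-sandbox globals can be sketched like this. It is a simplified model under stated assumptions: makeSandboxGlobals and its read and write methods are hypothetical names for this example, not Caja's API. Each sandbox gets its own table, so a "global" set by one application is invisible to another application and to the real page.

```javascript
function has(map, key) {
  return Object.prototype.hasOwnProperty.call(map, key);
}

// Create an isolated table of globals for one sandbox (hypothetical helper).
// `imports` lists the objects the container chooses to expose.
function makeSandboxGlobals(imports) {
  var table = Object.create(null);       // no inherited properties
  for (var name in imports) {
    if (has(imports, name)) {
      table[name] = imports[name];
    }
  }
  return {
    // Stand-in for a rewritten read of a global variable.
    read: function (name) {
      if (!(name in table)) {
        throw new ReferenceError(name + ' is not defined');
      }
      return table[name];
    },
    // Stand-in for a rewritten assignment to a global variable.
    write: function (name, value) {
      table[name] = value;
      return value;
    }
  };
}
```

With two sandboxes created this way, a write of size in the first has no effect on the second, which still reports size as undefined.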

Here's an example transformation. This JavaScript source code:

is cajoled into something like this:

Note

The actual Caja transformation is slightly different. This example has been simplified to make it easier to see what Caja is doing under the hood.

Simple operations on local variables, such as (s4+s5)/2, are left alone.

Some operations on local variables, such as geo.compute(), are rewritten to call $v functions.

References to globals, such as size, are rewritten to call $v functions.

References to this are rewritten to $dis.
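
A hand-written sketch in the same spirit is shown below. It is not Caja's actual output: the mock runtime, the method names ro ("read outer") and cm ("call method"), and the functions makeMockRuntime and makeArea are assumptions made for this illustration. It shows arithmetic on locals left alone, a method call routed through a checking function, and a global read routed through the per-sandbox globals.

```javascript
function has(map, key) {
  return Object.prototype.hasOwnProperty.call(map, key);
}

// Mock runtime for illustration only; not the real Caja runtime.
function makeMockRuntime(sandboxGlobals) {
  return {
    // Read a per-sandbox global instead of a real global.
    ro: function (name) {
      if (!has(sandboxGlobals, name)) {
        throw new ReferenceError(name + ' is not defined');
      }
      return sandboxGlobals[name];
    },
    // Call a method only after a runtime check on its name and type.
    cm: function (obj, name, args) {
      if (/___$/.test(name)) {
        throw new Error('access to ' + name + ' denied');
      }
      var fn = obj[name];
      if (typeof fn !== 'function') {
        throw new TypeError(name + ' is not a function');
      }
      return fn.apply(obj, args);
    }
  };
}

// Source as an application author might write it:
//   function area(geo, s4, s5) {
//     var avg = (s4 + s5) / 2;
//     return geo.compute(avg) * size;
//   }
//
// The same function, rewritten by hand against the mock runtime:
function makeArea($v) {
  return function area(geo, s4, s5) {
    var avg = (s4 + s5) / 2;                              // locals: unchanged
    return $v.cm(geo, 'compute', [avg]) * $v.ro('size');  // checked accesses
  };
}
```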

For more details about the JavaScript transformation, see the Google Caja page.

The Client-Side Runtime

Cajoled script can't access any real global objects without help, and that's what the Caja runtime system is for. The runtime system creates a useful sandbox environment by importing objects into the sandbox's set of globals, which is called "outers" or IMPORTS___.

Some of the imported objects are the real thing. For example, IMPORTS___.Array is identical to the browser's Array.

Some of the imported objects are proxies. For example, IMPORTS___.document is a proxy object that exposes a safe subset of the DOM interface.

The proxy function document.getElementById will return objects that are also proxies. You won't get direct access to a real DOM object, but that doesn't matter, because the proxy objects are similar enough to the actual objects.
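
The proxy pattern can be sketched as follows. This is a minimal model, not Caja's DOM implementation: makeElementProxy and makeDocumentProxy are hypothetical names, and the choice to expose only id and textContent is an assumption made to keep the example short.

```javascript
// Wrap a real element so that only a small, safe surface is reachable.
function makeElementProxy(realElement) {
  var proxy = {};
  Object.defineProperty(proxy, 'id', {
    enumerable: true,
    get: function () { return realElement.id; }
  });
  Object.defineProperty(proxy, 'textContent', {
    enumerable: true,
    get: function () { return realElement.textContent; },
    set: function (text) { realElement.textContent = String(text); }
  });
  // ownerDocument, parentNode, and so on are deliberately not exposed,
  // so the real DOM stays out of reach.
  return proxy;
}

// Wrap a real document; lookups return proxies, never real elements.
function makeDocumentProxy(realDocument) {
  return {
    getElementById: function (id) {
      var real = realDocument.getElementById(String(id));
      return real ? makeElementProxy(real) : null;
    }
  };
}
```

Code written against the proxy reads and writes textContent exactly as it would on a real element, which is why ordinary application code rarely notices the difference.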

The runtime system also enforces the Caja security model, by checking that objects and functions were properly tagged before they're used. You can see Caja's internal tagging when you examine objects in a debugger. Most objects will have properties that end with triple-underbar, such as length_canRead___, FROZEN___, etc.
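
A tag-based check might look roughly like the sketch below. Only the tag names length_canRead___ and FROZEN___ come from the text above; the functions grantRead and checkedRead, and the exact rules they enforce, are assumptions made for this illustration and do not describe Caja's real checks.

```javascript
// Mark a property as readable by tagging the object (illustrative convention).
function grantRead(obj, name) {
  obj[name + '_canRead___'] = true;
  return obj;
}

// Read a property only if it was tagged as readable. Names that end with a
// triple underscore are reserved for the runtime and are always refused.
function checkedRead(obj, name) {
  name = String(name);
  if (/___$/.test(name)) {
    throw new Error('Names ending in ___ are reserved: ' + name);
  }
  if (obj[name + '_canRead___'] !== true) {
    throw new Error('Property is not readable: ' + name);
  }
  return obj[name];
}
```

Under this model, an object inspected in a debugger would show both its ordinary properties and the extra triple-underscore tags that record what sandboxed code is allowed to do with them.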

Copyright © 2010 Yahoo! Inc. All rights reserved.