SearchMonkey Guide

RDF

The Resource Description Framework (RDF) is a standard language for storing metadata about web resources. For example, a web page about the Southern Laughing Tree Frog provides data (information about the frog) along with metadata (information about the page, such as the author's name or a copyright statement). Individual elements on the page can also have metadata. For example, a photo of the frog could provide metadata about the photo's image format or the timestamp when the photo was taken.

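The goal of RDF is to provide a common framework for specifying this metadata. Fundamentally, you can decompose any RDF data into one or more triples. Each triple consists of a subject (the resource being described), a predicate (a property of that resource), and an object (the value of that property, which can be another resource, a literal value, or a blank node).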

How would you represent RDF information in general? Figure 3.3, “RDF Triples for Joe's Home Page” illustrates an RDF graph for a simple home page.

Figure 3.3. RDF Triples for Joe's Home Page

To decompose this graph into individual RDF triples:

  1. The website http://joesmith.org has a foaf:maker (an author). The author doesn't have a unique URI, so the object is a blank node.

  2. The website http://joesmith.org also has a dc:title (a title). The title is a literal value, "Joe's Home Page".

  3. The author of the website http://joesmith.org has a foaf:name (a name). The name is a literal value, "Joe Smith".

  4. The author of the website http://joesmith.org also has a foaf:depiction (an image depiction). The image has a unique URI resource, http://joesmith.org/images/jsmith.png.

  5. The depiction of the author of the website http://joesmith.org has dc:rights (copyright or licensing information). The copyright is a literal value, "Creative Commons Attribution 3.0 Unported".
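The five triples above can also be written down as plain data. The following is a minimal Python sketch, not part of SearchMonkey itself: the `Triple` and `Literal` types and the blank-node label `_:author` are assumptions made for illustration.

```python
from typing import NamedTuple


class Literal(str):
    """A literal value such as a title, as opposed to a URI or blank node."""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


PAGE = "http://joesmith.org"
IMAGE = "http://joesmith.org/images/jsmith.png"
AUTHOR = "_:author"  # hypothetical blank-node label; the author has no URI

# One entry per numbered item in the list above.
TRIPLES = [
    Triple(PAGE, "foaf:maker", AUTHOR),
    Triple(PAGE, "dc:title", Literal("Joe's Home Page")),
    Triple(AUTHOR, "foaf:name", Literal("Joe Smith")),
    Triple(AUTHOR, "foaf:depiction", IMAGE),
    Triple(IMAGE, "dc:rights", Literal("Creative Commons Attribution 3.0 Unported")),
]

for triple in TRIPLES:
    kind = "literal" if isinstance(triple.obj, Literal) else "resource"
    print(f"{triple.subject} {triple.predicate} {triple.obj!r} ({kind})")
```

Note that the blank node appears as the object of the first triple and as the subject of the third and fourth. That shared label is what ties the name and the image to the page's author.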

These sorts of relationships are vital for any software attempting to extract semantic information. A web browser or search engine crawler can easily extract objects and subjects from web pages, but without the predicates to indicate semantic relationships, the meaning is lost. Is the resource http://joesmith.org/images/jsmith.png a picture of Joe Smith? Joe Smith's sister-in-law? Joe Smith's classic space Legos collection? A Southern Laughing Tree Frog?

Of course, you can try to use complicated algorithms to "guess" the nature of the photo. However, RDF is a way to provide that kind of information in the first place. The RDF in Figure 3.3, “RDF Triples for Joe's Home Page” asserts that the image is in fact a depiction (foaf:depiction) of the page author (foaf:maker). If you are designing software to, say, display photographs and other information about people, this kind of information is invaluable.
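To make this concrete, here is a small Python sketch of how software might answer the question "which images depict the author of this page?" by following predicates. It assumes the triples have already been extracted into simple tuples; the helper names `objects` and `depictions_of_maker` and the blank-node label `_:author` are hypothetical.

```python
# Triples as (subject, predicate, object) tuples, taken from the figure.
TRIPLES = [
    ("http://joesmith.org", "foaf:maker", "_:author"),
    ("http://joesmith.org", "dc:title", "Joe's Home Page"),
    ("_:author", "foaf:name", "Joe Smith"),
    ("_:author", "foaf:depiction", "http://joesmith.org/images/jsmith.png"),
    ("http://joesmith.org/images/jsmith.png", "dc:rights",
     "Creative Commons Attribution 3.0 Unported"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def depictions_of_maker(triples, page):
    """Follow foaf:maker from the page, then foaf:depiction from each maker."""
    return [
        image
        for maker in objects(triples, page, "foaf:maker")
        for image in objects(triples, maker, "foaf:depiction")
    ]


images = depictions_of_maker(TRIPLES, "http://joesmith.org")
print(images)
```

No guessing is involved: the result follows directly from the two predicates, which is the point of publishing the relationships explicitly.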

As with microformats, if the Yahoo! Web Crawler encounters any page with embedded RDF, the crawler extracts the data, indexes it, and provides that information to you, the SearchMonkey developer. But how do page authors actually embed RDF in their pages? Yahoo! supports two approaches: RDFa and eRDF.

RDFa relies on using attributes from the <link> and <meta> elements to embed RDF data in XHTML. Using these attributes in this manner is not valid in vanilla XHTML 1.1, but XHTML's modular nature enables you to extend the XHTML DTD to include RDFa (and other) semantics. As a SearchMonkey developer, you do not need to understand how to produce RDFa, but Example 3.3, “Joe's Home Page with RDFa Markup” illustrates how this might be done in practice:

Example 3.3. Joe's Home Page with RDFa Markup

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
          "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      lang="en" xml:lang="en">
<head>
  <title>The Amazing Home Page of Joe Smith</title>
</head>
<body>
  <h1 property="dc:title">Joe's Home Page</h1>
  <div rel="foaf:maker">
    <h2 property="foaf:name">Joe Smith</h2>
    <div rel="foaf:depiction" resource="http://joesmith.org/images/jsmith.png">
      <img src="/images/jsmith.png" alt="Smiling headshot of Joe" />
      <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
    </div>
  </div>
</body>
</html>
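To see where the RDF statements live in this markup, the following Python sketch lists every element that carries a `property` or `rel` attribute. It is only an illustration built on the standard library's `html.parser`; it does not implement RDFa processing (it does not resolve subjects or blank nodes), and the class name `RdfaAttributeScanner` is hypothetical.

```python
from html.parser import HTMLParser

# Body fragment copied from Example 3.3.
SAMPLE = """
<h1 property="dc:title">Joe's Home Page</h1>
<div rel="foaf:maker">
  <h2 property="foaf:name">Joe Smith</h2>
  <div rel="foaf:depiction" resource="http://joesmith.org/images/jsmith.png">
    <img src="/images/jsmith.png" alt="Smiling headshot of Joe" />
    <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
  </div>
</div>
"""


class RdfaAttributeScanner(HTMLParser):
    """Collects (tag, attribute, value) for predicate-bearing attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("property", "rel"):
                self.found.append((tag, name, value))


scanner = RdfaAttributeScanner()
scanner.feed(SAMPLE)
for tag, name, value in scanner.found:
    print(f"<{tag}> {name}={value}")
```

The five attributes it finds correspond to the five predicates in Figure 3.3. The nesting of the elements, which this sketch ignores, is what determines the subject of each statement.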

For more information about RDFa, refer to the RDFa specification or the RDFa Primer.

eRDF is an alternative approach for embedding RDF information. Unlike RDFa, eRDF can be embedded in either XHTML or HTML. However, eRDF supports a more limited subset of RDF than RDFa. Example 3.4, “Joe's Home Page with eRDF Markup” illustrates how Joe might use eRDF to describe his homepage:

Example 3.4. Joe's Home Page with eRDF Markup

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
<head profile="http://purl.org/NET/erdf/profile">
  <title>The Amazing Home Page of Joe Smith</title>
  <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" >
  <link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" >
  <link rel="foaf-maker" href="#joe" >
</head>
<body>
  <h1 class="dc-title">Joe's Home Page</h1>
  <div id="joe">
    <h2 class="foaf-name">Joe Smith</h2>
    <div>
      <a rel="foaf-depiction" href="http://joesmith.org/images/jsmith.png">
        <img src="/images/jsmith.png" alt="Smiling headshot of Joe">
        <span class="dc-rights">Creative Commons Attribution 3.0 Unported</span>
      </a>
    </div>
  </div>
</body>
</html>
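In this markup, the `schema.dc` and `schema.foaf` links declare namespace prefixes, and names such as `dc-title` or `foaf-name` combine a prefix with a property name. The Python sketch below shows one way to expand those names into full property URIs. It is a simplified illustration, not a complete eRDF parser; the names `SchemaLinkCollector` and `expand` are hypothetical.

```python
from html.parser import HTMLParser

# Head fragment copied from Example 3.4.
HEAD = """
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" >
<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" >
<link rel="foaf-maker" href="#joe" >
"""


class SchemaLinkCollector(HTMLParser):
    """Builds a prefix-to-namespace map from <link rel="schema.*"> elements."""

    def __init__(self):
        super().__init__()
        self.namespaces = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        if tag == "link" and rel.startswith("schema.") and attr.get("href"):
            self.namespaces[rel[len("schema."):]] = attr["href"]


def expand(name, namespaces):
    """Expand 'prefix-local' to a full URI, or return None if undeclared."""
    prefix, sep, local = name.partition("-")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return None


collector = SchemaLinkCollector()
collector.feed(HEAD)
for name in ("dc-title", "foaf-name", "foaf-depiction"):
    print(name, "->", expand(name, collector.namespaces))
```

Names without a declared prefix, such as an ordinary CSS class, expand to `None` here, so they can be ignored rather than mistaken for RDF properties.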

For more information about eRDF, refer to the eRDF specification or the eRDF wiki.