SearchMonkey Guide

DataRSS Primer for Developers

DataRSS is a specification for conveying structured data. As the common data layer for all SearchMonkey applications, DataRSS is what enables you to build applications using structured data from a wide variety of sources. The name DataRSS is somewhat misleading, since DataRSS XML is not tied to RSS: it can be embedded in a variety of feed formats, such as Atom and RSS. For more information, refer to Chapter 3, Site Owner Guide.

Unlike site owners, developers do not need to understand the DataRSS XML format in great detail. However, you do need to understand the basic structure of DataRSS, because all SearchMonkey applications are designed to extract information from DataRSS sources.

Example 2.1, “Example DataRSS” illustrates a short snippet of DataRSS, as it would appear in the developer tool:

Example 2.1. Example DataRSS

<adjunctcontainer>
  <adjunct id="smid:{$smid}" version="1.0">
    <item rel="dc:subject">
      <type typeof="foaf:Person">
        <meta property="foaf:name">John Doe</meta>
        <meta property="foaf:gender">male</meta>
        <item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
        <item rel="foaf:mbox" resource="mailto:johndoe@example.org"/>
        <item rel="foaf:weblog" resource="http://johnblog.example.org"/>
        <item rel="foaf:knows">
          <type typeof="foaf:Person">
            <meta property="foaf:name">Jane Doe</meta>
            <meta property="foaf:gender">female</meta>
            <item rel="foaf:mbox" resource="mailto:janedoe@example.org"/>
          </type>
        </item>
      </type>
    </item>
  </adjunct>
</adjunctcontainer>

The four most important elements are <adjunct>, <item>, <type>, and <meta>, as used in the following snippet:

<adjunct id="smid:{$smid}" version="1.0">
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
      <type typeof="media:Image foaf:Image">
          <meta property="dc:creator">Joe Smith</meta>
      </type>
  </item>
</adjunct>

In any DataRSS feed, the <item> element is optional. If you have only simple assertions to make about the entire page (and not items within the page), you can embed <meta> elements directly inside the <adjunct>.
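
A minimal sketch of this page-level case, reusing the <adjunct> wrapper and the dc:creator property from the earlier examples. The dc:title property and both values are illustrative assumptions, not taken from a real feed:

```xml
<adjunct id="smid:{$smid}" version="1.0">
  <!-- No <item>: these assertions apply to the page as a whole -->
  <meta property="dc:creator">Joe Smith</meta>
  <meta property="dc:title">Joe's Photo Gallery</meta>
</adjunct>
```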

<adjunct id="smid:{$smid}" version="1.0">
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
      <type typeof="media:Image foaf:Image" resource="http://photosite.com/img.jpg">
         <meta property="dc:creator">Joe Smith</meta>
      </type>
  </item>
</adjunct>

A <meta> element's value should never be the URL for a resource; use <item rel="rel" resource="resourceURI"> instead.
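
To illustrate the rule, the following sketch contrasts the two forms, using the foaf:homepage property and the URL from Example 2.1. The commented-out <meta> form is shown only as the pattern to avoid:

```xml
<!-- Avoid: a URL carried as the text value of a <meta> element
<meta property="foaf:homepage">http://www.joeisageek.com</meta>
-->

<!-- Prefer: the URL identifies a resource, so relate it with <item> -->
<item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
```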

The possible values of typeof, rel, and property come from vocabularies. A vocabulary is essentially a well-defined list of meanings that people and software applications can rely upon: it contains the names of classes and properties and describes which properties apply to which classes. The typeof attribute should contain a single class or a space-separated list of classes, while the rel and property attributes should contain a single property or a space-separated list of properties. (The rel attribute expects properties that relate objects, while the property attribute expects properties that take a simple value, such as a string, a number, or a date.)

As seen above, both classes and properties are identified by a CURIE, a combination of a prefix and a local name separated by a colon (:). For example, dc:creator is a property from the dc (Dublin Core) vocabulary that indicates the entity primarily responsible for making the resource. For a complete list of vocabularies that SearchMonkey supports, refer to Appendix B, SearchMonkey Vocabularies.
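
As a rough sketch of how these attributes combine, the fragment below annotates the CURIEs in a single item. The classes, the dc:creator property, and the placeholder photo URL are the ones used in the examples above:

```xml
<adjunct id="smid:{$smid}" version="1.0">
  <!-- rel takes a property that relates the page to another object -->
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
    <!-- typeof takes one or more classes, separated by spaces -->
    <type typeof="media:Image foaf:Image">
      <!-- property takes a property with a simple value; "dc" is the
           prefix and "creator" is the local name of this CURIE -->
      <meta property="dc:creator">Joe Smith</meta>
    </type>
  </item>
</adjunct>
```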

Important

<item> elements have rel attributes and <meta> elements have property attributes. When dealing with DataRSS, take care not to confuse the two.

Note

There are some differences between the full DataRSS specification for site owners and DataRSS as it is represented in the SearchMonkey developer tool. For site owners, a DataRSS feed contains namespaced elements and attributes and, more importantly, MUST be a valid Atom feed. Within the developer tool, however, you need only be concerned with manipulating <adjunct>, <item>, and <meta> elements. There, SearchMonkey abstracts away namespaces and the Atom nature of the feed, since this additional complexity is relevant for site owners but not for developers.