
DataRSS is a specification for conveying structured data. As the common data layer for all SearchMonkey applications, DataRSS enables you to build applications using structured data from a wide variety of sources. The name is somewhat misleading, because DataRSS XML is not tied to RSS: it can be embedded in a variety of feed formats, such as Atom and RSS. For more information, refer to Chapter 3, Site Owner Guide.
Unlike site owners, developers do not need to understand the DataRSS XML format in great detail. However, you do need to understand the basic structure of DataRSS, because all SearchMonkey applications are designed to extract information from DataRSS sources.
Example 2.1, “Example DataRSS” illustrates a short snippet of DataRSS, as it would appear in the developer tool:
Example 2.1. Example DataRSS
<adjunctcontainer>
  <adjunct id="smid:{$smid}" version="1.0">
    <item rel="dc:subject">
      <type typeof="foaf:Person">
        <meta property="foaf:name">John Doe</meta>
        <meta property="foaf:gender">male</meta>
        <item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
        <item rel="foaf:mbox" resource="mailto:johndoe@example.org"/>
        <item rel="foaf:weblog" resource="http://johnblog.example.org"/>
        <item rel="foaf:knows">
          <type typeof="foaf:Person">
            <meta property="foaf:name">Jane Doe</meta>
            <meta property="foaf:gender">female</meta>
            <item rel="foaf:mbox" resource="mailto:janedoe@example.org"/>
          </type>
        </item>
      </type>
    </item>
  </adjunct>
</adjunctcontainer>
The four most important elements are:
<adjunct> — A group of related
URL metadata corresponding to an individual data service, such as
the Yahoo! Index, semantic web data, a DataRSS feed, or a custom
data service. Each <adjunct>
contains zero or more <item>s and
<meta>s.
<item> — A physical item,
concrete concept, or task. The <item> element has
a required rel attribute with a property describing the
relationship of this object to the enclosing item. The
item element may also have an optional resource
attribute pointing to the item's URI, if one exists. An item
typically has a nested type element specifying the
type(s) of the item.
For example, an image on a page is an
<item>, with a rel indicating that the item is
the subject of the page (the outermost item element typically has
dc:subject as the value of the rel
attribute). The resource attribute points to the location of the
image. The type element provides the type(s) of the
item.
<adjunct id="smid:{$smid}" version="1.0">
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
    <type typeof="media:Image foaf:Image">
      <meta property="dc:creator">Joe Smith</meta>
    </type>
  </item>
</adjunct>
In any DataRSS feed, the <item>
element is optional. If you have only simple assertions to make about the
entire page (and not items within the page), you can embed
<meta> elements directly inside the
<adjunct>.
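As a minimal sketch of this page-level case, an adjunct with no items might look like the following. The property values are illustrative placeholders rather than content from the guide, and dc:title is assumed to be an available Dublin Core property:

```xml
<adjunct id="smid:{$smid}" version="1.0">
  <!-- Assertions about the page as a whole; no <item> wrapper is used -->
  <meta property="dc:title">Joe's Photo Page</meta>
  <meta property="dc:creator">Joe Smith</meta>
</adjunct>
```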
<type> — A type element always occurs
inside an item and has a single typeof attribute. The value of this
attribute is a space-separated list of classes. If the type element
is nested inside an item element with the resource attribute, the
resource attribute must be repeated on the type element. See the
following example:
<adjunct id="smid:{$smid}" version="1.0">
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
    <type typeof="media:Image foaf:Image" resource="http://photosite.com/img.jpg">
      <meta property="dc:creator">Joe Smith</meta>
    </type>
  </item>
</adjunct>
<meta> — A meta element provides the value
for a property of the parent <item> or
<adjunct>. The <meta> element
has a required property attribute that contains a space-separated
list of properties (or a single property), and an optional datatype
attribute that specifies the type of the text content (a number, a
date, etc.).
For example, a property="product:listPrice"
indicates that the <meta> is a list price, a
datatype="currency:USD" indicates that the price is in
US dollars, and the value 260.00 indicates the actual price of the
item:
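Assembled from those three pieces, the element would look roughly like this (a reconstruction from the description above, not a snippet copied from the guide):

```xml
<!-- List price of 260.00, expressed in US dollars -->
<meta property="product:listPrice" datatype="currency:USD">260.00</meta>
```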
A <meta> element's value should never
be the URL for a resource; use <item rel="rel"
resource="resourceURI"> instead.
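To illustrate the distinction, the following sketch links to a homepage with an <item> rather than placing the URL in a <meta>. The address is a placeholder, and foaf:homepage is used in the same way as in Example 2.1:

```xml
<!-- Avoid: a URL as the text value of a meta element -->
<!-- <meta property="foaf:homepage">http://www.example.org/</meta> -->

<!-- Preferred: an item whose resource attribute carries the URL -->
<item rel="foaf:homepage" resource="http://www.example.org/"/>
```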
The possible values of typeof, rel, and property come from
vocabularies. Vocabularies contain names of classes and properties and
describe which properties apply to which classes. The typeof
attribute should contain a space-separated list of classes (or a single
class), while the rel and property attributes should contain a space-separated
list of properties. The rel attribute expects properties
that 'relate' objects, while the property attribute expects properties that
take a simple value (a string, a number, a date, etc.). As seen above, both
classes and properties are identified by a so-called CURIE, a combination
of a prefix and a local name separated by a ':' (colon). For
example, dc:creator is a property from the dc (Dublin Core) vocabulary
that indicates the entity primarily responsible for making the resource.
A vocabulary is essentially a well-defined list
of meanings that people and software applications can rely upon. For a
complete list of vocabularies that SearchMonkey supports, refer to Appendix B, SearchMonkey
Vocabularies.
Note
There are some differences between the full DataRSS specification for site owners
and DataRSS as it is represented in the SearchMonkey developer tool.
For site owners, a DataRSS feed contains namespaced elements and
attributes, and more importantly MUST be a valid Atom
feed. However, within the developer tool you need only be
concerned with manipulating