
DataRSS is a specification for conveying structured data for URLs,
in a SearchMonkey data service or using a conventional feed format. Each
URL in a DataRSS feed has one or more adjuncts. The adjunct is the
fundamental unit of organization in DataRSS-consuming SearchMonkey
applications. Meaning "something alongside", the adjunct element
represents the metadata that goes alongside an actual resource, such as
a product listing, or a product review. The adjunct groups the related
metadata from a particular source, such as hcard data about
the page's owner, or technical data about a photo on the page.
Each adjunct contains one or more <item>s and
<meta>s. An item represents some object or concept in
the real world, while a meta presents a property of a
particular item. The Web page the adjunct is alongside can also have
meta properties with no intervening item elements, as shown
in the upcoming example. Items may contain <meta>s
and other items, while <meta>s only contain literal
values.
A few examples:
A page with a single blog entry or photo would have a single adjunct.
A page with multiple blog postings, a Flickr group, a page of search results, or an email inbox would have multiple items within a single adjunct.
However, even a single-photo Flickr page might have multiple adjuncts after aggregating data from different sources. One source might annotate microformats, another camera data, all on top of the original photo metadata.
For example, here is a section from an Atom feed:
<atom:entry>
  <atom:id>http://the.url/in/question</atom:id>
  ...
  <y:adjunct version="1.0" name="com.yahoo.test">
    <y:meta property="tagspace:tags">tag 1 tag2 tag3 tag4</y:meta>
    <y:item rel="dc:subject" resource="http://photosite.com/img.jpg">
      <y:type typeof="media:Photo">
        <y:meta property="dc:creator">The Nameless One</y:meta>
      </y:type>
    </y:item>
  </y:adjunct>
</atom:entry>
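To make the adjunct/item/meta nesting concrete, the following is a minimal Python sketch that reads an entry like the one above with the standard library. The namespace URIs bound to the atom and y prefixes, and the helper name summarize_entry, are assumptions made for illustration; check the DataRSS specification for the authoritative namespace declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; verify them against the
# DataRSS specification before relying on them.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "y": "http://search.yahoo.com/datarss/",
}

ENTRY = """
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
            xmlns:y="http://search.yahoo.com/datarss/">
  <atom:id>http://the.url/in/question</atom:id>
  <y:adjunct version="1.0" name="com.yahoo.test">
    <y:meta property="tagspace:tags">tag 1 tag2 tag3 tag4</y:meta>
    <y:item rel="dc:subject" resource="http://photosite.com/img.jpg">
      <y:type typeof="media:Photo">
        <y:meta property="dc:creator">The Nameless One</y:meta>
      </y:type>
    </y:item>
  </y:adjunct>
</atom:entry>
"""


def summarize_entry(xml_text):
    """Return (url, [(adjunct name, page-level metas, item count), ...])."""
    entry = ET.fromstring(xml_text)
    url = entry.findtext("atom:id", namespaces=NS)
    adjuncts = []
    for adjunct in entry.findall("y:adjunct", NS):
        # <y:meta> elements directly under the adjunct describe the page itself.
        page_metas = {
            meta.get("property"): meta.text
            for meta in adjunct.findall("y:meta", NS)
        }
        # Top-level items are the objects the adjunct talks about.
        items = adjunct.findall("y:item", NS)
        adjuncts.append((adjunct.get("name"), page_metas, len(items)))
    return url, adjuncts


url, adjuncts = summarize_entry(ENTRY)
print(url)
print(adjuncts)
```

The sketch only looks at direct children of each adjunct, which mirrors the distinction drawn above between page-level meta properties and properties that belong to an item.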
DataRSS is an XML format designed to deliver a wide array of structured data. As the common data layer for all SearchMonkey applications, DataRSS enables you to distribute your structured data to millions of people. The trick is to represent that data as DataRSS in the first place. Yahoo! already provides a great deal of information gathered by the Yahoo! Search Crawler as DataRSS, as do a number of third parties who have already set up DataRSS feeds. For other data sources, you can use SearchMonkey to construct custom data services that transform the source data into valid DataRSS.
SearchMonkey adds, removes, and updates metadata in terms of entire adjuncts, and the system will never break them apart or join them together. Because each adjunct has a unique identifier assigned by the system, there is always a way to refer to a particular adjunct. Therefore, adjuncts can be updated or replaced as a unit as the underlying pages change.
In addition, since each adjunct serves as a container for the metadata and item definitions within, everything that is "said" in the metadata of a particular adjunct is attributable to a particular source. This enables different people and groups to say their own things about any resource.
The ability of site owners to separate metadata into adjuncts gives them flexibility in how their metadata is assembled and represented. Developers making use of the data will be able to "subscribe" to different adjuncts containing the data needed by their application.
For complete details on DataRSS, see the Yahoo! DataRSS Specification.
SearchMonkey applications can use data that site owners submit to the Yahoo! index. The data takes the form of DataRSS feeds, submitted through the Site Explorer "Submit a Feed" process.
Note: When using the Site Explorer submit process, the user must be authenticated on a site in order to submit a SearchMonkey feed. Yahoo! Site Explorer automatically validates SearchMonkey feeds when they are submitted.
DataRSS has four elements that are relevant to SearchMonkey site owners:
<adjunct> A container of metadata associated with a URL. Within the
SearchMonkey developer tool, each data service provides a single
<adjunct> with a unique ID.
Generally, the format of the adjunct is com.yahoo.source.type.value
(where type and source are defined by
the data architect and value is left to the adjunct
owner). This is to enforce consistency across all feed
contributors.
Yahoo! currently uses three adjunct types. They are:
com.yahoo.page.uf - this adjunct supports
microformats which Yahoo! extracts when crawling your site.
An example of this type of adjunct is
com.yahoo.page.uf.hcard
com.yahoo.page.rdf - this adjunct
supports RDF which Yahoo! extracts when crawling your site.
An example of this type of adjunct is
com.yahoo.page.rdf.erdf
com.yahoo.feeds.searchmonkey - this
adjunct supports SearchMonkey publisher feeds submitted
through Site Explorer.
Attributes:
name — The adjunct's name. Choose a name
that describes your metadata, such as
"com.website.products".
updated — (Optional) The last updated timestamp
of this adjunct as an ISO 8601 date-time stamp. If
individual entries don't have a last-updated timestamp, the
overall feed must have one, and all entries will be given
the same timestamp.
version — A numeric version string
("1.0") that indicates the version in use for
this particular adjunct. If the adjunct format changes
substantially, you should increment this number.
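As an illustration of the three attributes above, here is a small Python sketch that builds an adjunct element with a name, a version, and an optional ISO 8601 updated timestamp. The namespace URI, the helper name make_adjunct, and the sample date are assumptions chosen for the example, not values taken from the specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed namespace URI for the y: prefix; verify against the specification.
Y_NS = "http://search.yahoo.com/datarss/"
ET.register_namespace("y", Y_NS)


def make_adjunct(name, version="1.0", updated=None):
    """Build a <y:adjunct> element; `updated` is an optional datetime."""
    attrs = {"name": name, "version": version}
    if updated is not None:
        # datetime.isoformat() yields an ISO 8601 date-time string.
        attrs["updated"] = updated.isoformat()
    return ET.Element("{%s}adjunct" % Y_NS, attrs)


# Hypothetical timestamp, used only to show the attribute format.
adjunct = make_adjunct(
    "com.website.products",
    updated=datetime(2008, 5, 15, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(adjunct, encoding="unicode"))
```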
<meta> A specific metadata assertion for the parent
<adjunct> or <item>. The
<meta> element contains a literal value
specifying the value of the assertion. For example, a listing
for a camera has properties that include the title and the list
price, taken from the dc and product
vocabularies, respectively.
<y:adjunct name="com.website.products" version="1.0">
  <y:item rel="dc:subject">
    <y:type typeof="product:Product">
      <y:meta property="dc:title">Canon PowerShot SD800 IS Digital ELPH Digital Camera</y:meta>
      <y:meta property="product:listPrice" datatype="currency:USD">260.00</y:meta>
      ...
    </y:type>
  </y:item>
</y:adjunct>
In RDF parlance, a
<meta> element and its property
attribute represent a literal object and predicate. The
<meta> element's value is always a literal. A
<meta> element's value should never be the URL for a
resource; use <item rel="rel" resource="resourceURI">
instead.
Attributes:
property — A CURIE (or
list of CURIEs) specifying the properties which take the
value inside the element. For example,
property="vcard:bday" indicates that the
metadata is a bday (birthday) of the contact, as defined by
the vcard vocabulary.
In RDFa,
this corresponds to an element's property attribute. For
some of the standard properties, see the SearchMonkey
vocabularies.
datatype — (Optional) A CURIE
specifying the datatype of a metadata value. Most properties
are strings, but you can use datatype to
specify a more restrictive type, such as
currency:USD. For a list of possible datatype
values, refer to the SearchMonkey Site Owner Guide. In RDFa,
this corresponds to an element's
datatype attribute.
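The property and datatype attributes can be read programmatically as in the sketch below. It uses un-namespaced element names for brevity and assumes that a list of CURIEs in property is space-separated. The helpers split_curie and read_metas are hypothetical names, and treating any datatype with the currency prefix as a decimal amount is a choice made for this example only.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

SNIPPET = """
<item rel="dc:subject">
  <meta property="dc:title">Canon PowerShot SD800 IS Digital ELPH Digital Camera</meta>
  <meta property="product:listPrice" datatype="currency:USD">260.00</meta>
</item>
"""


def split_curie(curie):
    """Split 'prefix:reference' into its two parts."""
    prefix, _, reference = curie.partition(":")
    return prefix, reference


def read_metas(element):
    """Map each property CURIE to its literal value.

    Values typed with a currency datatype become (Decimal, currency code);
    everything else stays a plain string.
    """
    result = {}
    for meta in element.findall("meta"):
        value = meta.text
        datatype = meta.get("datatype")
        if datatype:
            prefix, reference = split_curie(datatype)
            if prefix == "currency":
                value = (Decimal(value), reference)
        # property may hold several CURIEs that all take the same value.
        for prop in meta.get("property").split():
            result[prop] = value
    return result


metas = read_metas(ET.fromstring(SNIPPET))
print(metas)
```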
<item> A physical item, concrete concept, or task described by
the feed, with a rel attribute describing the
relationship of this object to the current resource and an
optional resource attribute pointing to the URL that represents
this item. For example, an image on a page is an <item>,
with a rel indicating that the item is a photo and a resource
pointing to the image file. Within the <item>
can be more <item> elements or metadata
assertions.
<y:adjunct id="smid:{$smid}" version="1.0">
  <y:item rel="dc:subject" resource="http://photosite.com/img.jpg">
    <y:type typeof="media:Photo">
      <y:item rel="review:hasReview">
        <y:type typeof="review:Review">
          <y:meta property="dc:creator">Joe Smith</y:meta>
        </y:type>
      </y:item>
    </y:type>
  </y:item>
</y:adjunct>
In RDF parlance, each
<item> element establishes a triple between
the parent item (or adjunct) and another object, possibly a
"blank node", and sets the new object as the current
resource.
Attributes:
rel — A CURIE (or
space-separated list of CURIEs) specifying the relationship
of this object to the current resource, using one or more
properties from a vocabulary. For example,
rel="review:hasReview" indicates that the item is a
review of the photo (the enclosing item). In RDFa, this corresponds to an
element's rel attribute.
resource — (Optional) A URL specifying the web
resource that represents this item. For example, an item
that is a video should have a resource pointing to the
actual video file location. If the item does not have a
corresponding web resource, you can omit
resource. In RDFa, this corresponds to an
element's resource attribute.
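The triple-based reading of item elements can be sketched as a small tree walk. This is an illustration under stated assumptions, not the SearchMonkey implementation: element names are un-namespaced, an item without a resource is given a generated blank-node label such as _:b1, each type element is rendered as an rdf:type triple, and properties nested inside a type element are taken to describe the enclosing item. The function name triples and the adjunct id value are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

ADJUNCT = """
<adjunct id="smid:example" version="1.0">
  <item rel="dc:subject" resource="http://photosite.com/img.jpg">
    <type typeof="media:Photo">
      <item rel="review:hasReview">
        <type typeof="review:Review">
          <meta property="dc:creator">Joe Smith</meta>
        </type>
      </item>
    </type>
  </item>
</adjunct>
"""


def triples(adjunct, page_url):
    """Flatten an adjunct into (subject, predicate, object) tuples."""
    blank_ids = itertools.count(1)
    out = []

    def walk(element, subject):
        for child in element:
            if child.tag == "meta":
                # A meta is a literal-valued property of the current subject.
                for prop in child.get("property").split():
                    out.append((subject, prop, child.text))
            elif child.tag == "type":
                # Assumption: express typeof as rdf:type on the current subject.
                for cls in child.get("typeof").split():
                    out.append((subject, "rdf:type", cls))
                walk(child, subject)
            elif child.tag == "item":
                # No resource attribute: fall back to a blank-node label.
                obj = child.get("resource") or "_:b%d" % next(blank_ids)
                for rel in child.get("rel").split():
                    out.append((subject, rel, obj))
                # The new object becomes the current resource.
                walk(child, obj)

    walk(adjunct, page_url)
    return out


result = triples(ET.fromstring(ADJUNCT), "http://the.url/in/question")
for triple in result:
    print(triple)
```

Here the photo is the object of the page's dc:subject relationship, and the review, which has no URL of its own, is represented by a blank node that carries the dc:creator literal.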
<type> The type element provides the type(s) of the enclosing element.
Attributes:
typeof — A CURIE (or a space-separated
list of CURIEs) describing the type(s) of the enclosing
element. Types should be classes chosen from the vocabularies.
Example 3.1, “Example DataRSS” illustrates a short example DataRSS feed.
Example 3.1. Example DataRSS
<atom:entry>
  <atom:id>http://the.url/in/question</atom:id>
  <y:adjunct name="com.website.products" version="1.0">
    <y:item rel="rel:Product">
      <y:meta property="product:listPrice" datatype="currency:USD">12.99</y:meta>
      <y:meta property="product:shippingCost" datatype="currency:USD">0</y:meta>
      <y:meta property="product:shippingWeight" datatype="units:g">500</y:meta>
      <y:item rel="rel:Review"
              resource="http://www.onlinestore.com/reviews/12345/browse"/>
    </y:item>
  </y:adjunct>
</atom:entry>
The full DataRSS specification is an appendix in this guide. If you read the specification, you will notice some differences between DataRSS as used in feeds and DataRSS as it is produced by XSLT in the SearchMonkey Developer Tool. In the output of a SearchMonkey data service:
DataRSS output does not contain namespaced elements and attributes. Within the context of a data service, this extra level of disambiguation is not needed.
DataRSS output uses a wrapper element called <adjunctcontainer> instead of being a payload inside Atom entry elements.
<adjunct> elements have an
id attribute which is always populated by the system.
The name attribute is not allowed.
These differences result from design decisions made to simplify
coding for SearchMonkey developers. Site owners find it useful that
DataRSS piggybacks off of Atom, since they can leverage tools and
knowledge they have about Atom to craft valid DataRSS feeds. However,
the SearchMonkey developer tool is designed specifically around
manipulating <adjunct>s,
<item>s, and <meta>s. Therefore,
site owners don't need to inform the SearchMonkey developer tool about
the namespacing for SearchMonkey-specific elements — this is already
understood.
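To make the three differences concrete, the sketch below rewrites a feed-style adjunct into the shape described for data service output: namespaces stripped, the name attribute dropped, an id attribute set, and the result wrapped in an adjunctcontainer element. It is purely illustrative. In practice the id is populated by the system, so the system_id parameter, the function name to_data_service_form, and the namespace URI are all assumptions.

```python
import xml.etree.ElementTree as ET

FEED_ADJUNCT = """
<y:adjunct xmlns:y="http://search.yahoo.com/datarss/"
           name="com.website.products" version="1.0">
  <y:item rel="rel:Product">
    <y:meta property="product:listPrice" datatype="currency:USD">12.99</y:meta>
  </y:item>
</y:adjunct>
"""


def to_data_service_form(feed_adjunct, system_id):
    """Return an <adjunctcontainer> holding an un-namespaced copy of the adjunct."""

    def strip_namespace(element):
        # ElementTree stores qualified tags as '{uri}local'; keep only 'local'.
        local_name = element.tag.split("}", 1)[-1]
        copy = ET.Element(local_name, dict(element.attrib))
        copy.text = element.text
        copy.tail = element.tail
        for child in element:
            copy.append(strip_namespace(child))
        return copy

    adjunct = strip_namespace(feed_adjunct)
    # The name attribute is not allowed in data service output.
    adjunct.attrib.pop("name", None)
    # Stand-in for the identifier that the system would assign.
    adjunct.set("id", system_id)
    container = ET.Element("adjunctcontainer")
    container.append(adjunct)
    return container


converted = to_data_service_form(ET.fromstring(FEED_ADJUNCT), "smid:example")
print(ET.tostring(converted, encoding="unicode"))
```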