
You don't need to be familiar with Semantic Web technology in order to start working with SearchMonkey. Basic knowledge of HTML, XSLT and/or PHP will generally suffice for most scenarios. However, you might be interested in learning about the semantic technology underlying SearchMonkey. If you are already experienced with Semantic Web technology, we imagine you also have some additional questions. This FAQ is for you.
Where can I learn more about the Semantic Web?
The Semantic Web FAQ maintained by the World Wide Web Consortium (W3C) is a good place to start: http://www.w3.org/2001/sw/SW-FAQ. The W3C is the organization responsible for maintaining Semantic Web standards such as RDF and RDFa.
What does SearchMonkey have to do with the Semantic Web?
First, SearchMonkey builds on Semantic Web standards. SearchMonkey can make use of metadata embedded inside Web pages using microformats, eRDF or RDFa. This metadata is picked up by the Yahoo! crawler and made available to all SearchMonkey applications. You can also develop data services that extract metadata from Web pages using XSLT, or submit metadata directly to the search engine in the form of a data feed. In both cases the engine expects the data in the dataRSS format, which is also based on RDF.
Second, we believe SearchMonkey will greatly contribute to building the Semantic Web. By exposing embedded metadata to developers, SearchMonkey will create the necessary motivation for content publishers to provide metadata using Semantic Web standards. SearchMonkey also makes it easy for developers to create metadata applications, significantly lowering the barriers of entry to the Semantic Web.
What are microformats, eRDF and RDFa? Why would I choose one over the other?
Microformats, eRDF and RDFa are different ways of embedding metadata inside Web pages. They represent different trade-offs in terms of ease of authoring versus expressibility. Microformats are the easiest to write and understand, but may not fill all your metadata needs. In particular, you may not find an appropriate vocabulary to represent your information. eRDF and RDFa allow you to work with any RDF or OWL vocabulary, and create your own vocabulary or reuse existing ones. eRDF is a subset of the full RDF model; for example, you can only make statements about the current page. RDFa offers all the features of RDF, making it the most complex of the three formalisms but also the most powerful one.
As a content provider, you should evaluate all three options for publishing metadata. As a SearchMonkey developer, you don't need to worry about the differences: the Yahoo! crawler takes care of converting all metadata into an RDF-based model so that developers of SearchMonkey applications can work with the data regardless of the format it originally came in.
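To make the idea of format-independent metadata more concrete, here is a minimal sketch in Python that reads a small hCard snippet and reduces it to plain (property, value) pairs. It is an illustration only and not how the Yahoo! crawler works: the markup, the person, the URL and the names HCardParser and extract_hcard are all made up for this example, and only three hCard properties are handled.

```python
from html.parser import HTMLParser

# Hypothetical hCard markup; the person, organization and URL are invented.
HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <a class="url" href="http://example.com/jane">home page</a>
  <span class="org">Example Corp</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects a few hCard properties as (property, value) pairs."""

    TEXT_PROPS = {"fn", "org"}  # properties whose value is the element text

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # text property still waiting for its content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # For "url" the value is taken from the href attribute, not the text.
        if "url" in classes and "href" in attrs:
            self.pairs.append(("url", attrs["href"]))
        for name in classes:
            if name in self.TEXT_PROPS:
                self._pending = name

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


def extract_hcard(markup):
    """Return the (property, value) pairs found in the given markup."""
    parser = HCardParser()
    parser.feed(markup)
    parser.close()
    return parser.pairs


print(extract_hcard(HCARD))
```

Once the data is in this shape, an application no longer needs to know whether it was originally written as a microformat, eRDF or RDFa.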
What microformats are supported? What RDF vocabularies are supported?
The following microformats are supported: hCard, hCalendar, hReview, hFeed and XFN. You may use any RDF or OWL vocabulary. However, if you are publishing data using a custom-made vocabulary, make the schema definition easy to discover so that others can understand your data and build applications on it. The best approach is to follow the recommendations regarding "cool URIs" and serve both textual and machine-processable vocabulary definitions at the locations your URIs point to. This is explained in more detail at http://www.w3.org/TR/cooluris/.
Do you support OWL?
Yes, but OWL is not treated any differently from any other RDF vocabulary. In particular, no OWL reasoning or validation is performed.
Do you perform any reasoning?
No, no reasoning is performed.
Do you perform any validation?
Your data needs to be syntactically correct, but no validation is performed at the semantic level. To give an example: although the hCard format specifies that the url field should contain a URL, this is not checked in any way. There is some syntactic validation available in the developer tool.
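Since no semantic validation is performed, publishers may want to run such checks themselves before publishing. The following Python sketch shows one possible check for the hCard example above: it flags url values that do not look like HTTP URLs. The function names and the sample data are hypothetical, and the test (an http or https scheme plus a host) is a simplifying assumption, not a rule taken from the hCard specification.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value has an http(s) scheme and a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def invalid_url_fields(pairs):
    """Return the values of all 'url' properties that do not look like URLs."""
    return [value for name, value in pairs
            if name == "url" and not looks_like_url(value)]


# Hypothetical (property, value) pairs extracted from an hCard.
pairs = [
    ("fn", "Jane Example"),
    ("url", "http://example.com/jane"),
    ("url", "see my home page"),
]

print(invalid_url_fields(pairs))  # prints ['see my home page']
```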
What is dataRSS? How does it differ from RDF? Why a new format for representing RDF data?
DataRSS is an Atom-compatible format that encodes metadata using RDFa. Thus all RDF data that can be encoded using RDFa can also be represented using dataRSS. Unlike existing RDF serialization formats (RDF/XML, Turtle, N-Triples, etc.), dataRSS can also carry metadata about the data, such as the provenance of the information.
How do I submit RDF data? How do I convert RDF documents to dataRSS?
To submit RDF data, you need to convert it to the dataRSS format. We provide experimental converters, described at http://tech.groups.yahoo.com/group/ysop-siteowners/files/.
What do I need to do if I already have a website with microformat data, eRDF or RDFa markup?
If your website is already crawled by the Yahoo! crawler your metadata will be discovered and extracted automatically. To make sure your data is valid, you might want to validate your markup using one of the available online validators.
Does the presence of metadata influence the ranking of results?
No, the presence or absence of metadata does not influence the ranking of search results in any way!
Do you also crawl SPARQL endpoints?
No, not at the moment.
Do you also crawl Linked Data?
No, not at the moment.
Do you also crawl RDF documents linked to Web pages using the <LINK> tag?
No, at the moment we do not crawl RDF data linked to Web pages using the <LINK> tag.
I've added or updated the metadata inside my Web page. Can I ask for it to be crawled again?
Yes, you can create a Sitemap for your site, and submit it through Site Explorer.
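As a rough illustration, a Sitemap file listing the updated pages could be generated with a few lines of Python. This is a sketch only: the function name build_sitemap, the page URLs and the dates are invented, and the example assumes the standard Sitemap protocol namespace from sitemaps.org. Submitting the resulting file through Site Explorer is a separate, manual step not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML from (url, last_modified) pairs.

    last_modified is expected as a YYYY-MM-DD string.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit the namespace without a prefix
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages whose metadata was added or updated.
pages = [
    ("http://example.com/people/jane.html", "2008-05-15"),
    ("http://example.com/events/launch.html", "2008-05-16"),
]

print(build_sitemap(pages))
```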
Can I have my metadata removed from the search index?
Yes, you can ask Yahoo! to stop crawling your site using the robots.txt protocol.
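Before deploying a robots.txt file, you can check what it allows with Python's standard urllib.robotparser module. The sketch below is an illustration under stated assumptions: the user-agent token "Slurp" for the Yahoo! crawler is not given in this FAQ and should be verified against Yahoo!'s own documentation, and the rules, the helper name is_allowed and the URLs are made up. Note that these rules block crawling of the listed pages as a whole, not only of the metadata inside them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "Slurp" is assumed to be the token
# the Yahoo! crawler matches against; verify this before relying on it.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://example.com/people/jane.html"
print(is_allowed(ROBOTS_TXT, "Slurp", page))     # prints False
print(is_allowed(ROBOTS_TXT, "OtherBot", page))  # prints True
```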
Do you support GRDDL?
No, at the moment we do not support GRDDL.
Is there an API to access the metadata directly?
No, there is no such API at the moment.
Do you have a question that is not answered here?
Please participate in the Yahoo! Search open platform Siteowners Group!