
This section lists various guidelines for designing custom data services and the XSLT code that backs them.
If you own the site you are building an application for, custom data services are a great way to create prototype applications. Once the prototype proves successful, you should consider providing your data using embedded microformats, embedded RDF, or DataRSS feeds. For more information, refer to Chapter 3, Site Owner Guide.
If you do not own the site you are building an application for, custom data services can be very useful, because they allow you to extract data that is otherwise hard to use programmatically. However, if the site already exposes the data you need in the form of semantic web data or a DataRSS feed, you should use this faster, pre-cached data instead.
If your XSLT makes use of the same data more than once, put it
in a variable, such as $haystack:
<xsl:variable name="haystack" select="//div[@id='test']/@rel"/>
To reliably check for a class or rel
attribute value (which can be in a whitespace-separated list), use
an XPath expression that resembles:
contains(concat(' ', normalize-space($haystack), ' '), ' needle ')
Note the spaces before and after needle in the string
' needle '. This expression matches whether the attribute node in
$haystack has the value "needle", " needle ", "thread needle",
"needle thread", etc. The normalize-space XPath function
takes care of cases where the attribute might contain tabs or
newlines.
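The two tips above can be combined in one small stylesheet. This is a minimal sketch, not SearchMonkey's required output format: the result and match elements are hypothetical names used only for illustration, and the page is assumed to contain a div with id="test".

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Select the attribute once and reuse it wherever it is needed. -->
  <xsl:variable name="haystack" select="//div[@id='test']/@rel"/>

  <xsl:template match="/">
    <!-- "result" and "match" are hypothetical element names. -->
    <result>
      <!-- Padding both strings with spaces makes the test match whole
           tokens only, so "needles" or "pineneedle" do not match. -->
      <xsl:if test="contains(concat(' ', normalize-space($haystack), ' '), ' needle ')">
        <match>needle</match>
      </xsl:if>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

If the div or its rel attribute is missing, $haystack is an empty node-set, normalize-space returns an empty string, and the test is simply false.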
For better performance, avoid // whenever
possible. For example, instead of
//meta[@name='title'], use
/html/head/meta[@name='title'].
Think about how the underlying page might change in the future
and use defensive coding to help preserve your service's
functionality. For example, instead of
/html/body/div[3]/div[4], try using
//div[@id='test']. Note that this conflicts with the
advice above. Sometimes being robust means being slower.
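The trade-off can be seen side by side in the sketch below. It assumes that the page's title is stored in the content attribute of the meta element, and it uses hypothetical output element names (result, title, body); neither assumption comes from SearchMonkey itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Hypothetical output elements, for illustration only. -->
    <result>
      <!-- Fast: the location of meta elements inside html/head is
           unlikely to change, so a fixed path is safe here. -->
      <title>
        <xsl:value-of select="/html/head/meta[@name='title']/@content"/>
      </title>
      <!-- Robust: the position of this div in the body may shift when
           the page layout changes, so search for it by id instead of
           counting div elements. -->
      <body>
        <xsl:value-of select="//div[@id='test']"/>
      </body>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

A reasonable rule of thumb is to use fixed paths for parts of the document whose structure is dictated by HTML itself, and id-based lookups for parts that depend on the site's layout.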
When setting the value of an attribute such as resource on an output element, you can put curly braces around an XPath expression directly inside the attribute value instead of using xsl:attribute.
The hard way:
<item rel="dc:title">
  <xsl:attribute name="resource">
    <xsl:value-of select="xpath/expression"/>
  </xsl:attribute>
</item>
The easy way:
<item rel="dc:title" resource="{xpath/expression}"/>
Values in the resource attribute must be absolute. If you have only a relative URL available, you can prepend the host portion of the URL like this:
resource="http://host.name.com{$relative_url}"
If the host portion of the URL might change (for example, it
might be de.host.name.com in Germany), use SearchMonkey's built-in
$CURRURL variable. This variable contains a string representation
of the URL for the page currently being processed.
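One way to derive the host portion from $CURRURL is with the standard XPath string functions, as sketched below. Several things here are assumptions: the xsl:param declaration is only a stand-in so that the sketch runs in a standalone XSLT processor (how SearchMonkey supplies the variable is not shown), the a element with id="more" is a hypothetical source for the relative URL, and the relative URL is assumed to begin with a slash.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stand-in value for testing outside SearchMonkey (assumption). -->
  <xsl:param name="CURRURL" select="'http://de.host.name.com/some/page.html'"/>

  <xsl:template match="/">
    <!-- Everything before "://", for example "http". -->
    <xsl:variable name="scheme" select="substring-before($CURRURL, '://')"/>
    <!-- Everything between "://" and the next "/". This is empty if
         the URL has no slash after the host name. -->
    <xsl:variable name="host"
        select="substring-before(substring-after($CURRURL, '://'), '/')"/>
    <!-- Hypothetical link whose href is relative, such as "/more.html". -->
    <xsl:variable name="relative_url" select="//a[@id='more']/@href"/>

    <item rel="dc:title" resource="{$scheme}://{$host}{$relative_url}"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the host is taken from the current page, the same stylesheet yields de.host.name.com on the German site and host.name.com elsewhere without any hard-coded host names.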
If $ns is a node-set:
The expression $ns = "hello" is
true if any node in
$ns equates to the string "hello". If
$ns is empty, the expression returns
false.
The expression $ns != "hello" is
true if any node in
$ns does not equate to the string
"hello". If $ns is empty, the
expression returns false.
The expression not($ns = "hello") is
true if every node in
$ns does not equate to the string
"hello". If $ns is empty, the
expression returns true.
The expression not($ns != "hello") is
true if every node in
$ns equates to the string "hello". If
$ns is empty, the expression returns
true.
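The four comparisons can be checked against a small test document. The sketch below assumes the node-set is every li element in the input, and it uses hypothetical output element names; xsl:value-of writes each boolean as the string true or false.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Assumption: the node-set under test is every li element. -->
    <xsl:variable name="ns" select="//li"/>
    <!-- Hypothetical output elements, for illustration only. -->
    <result>
      <!-- true if at least one node equals "hello" -->
      <any-equal><xsl:value-of select="$ns = 'hello'"/></any-equal>
      <!-- true if at least one node differs from "hello" -->
      <any-different><xsl:value-of select="$ns != 'hello'"/></any-different>
      <!-- true if no node equals "hello" -->
      <none-equal><xsl:value-of select="not($ns = 'hello')"/></none-equal>
      <!-- true if every node equals "hello" -->
      <all-equal><xsl:value-of select="not($ns != 'hello')"/></all-equal>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

For an input containing two li elements with the text hello and world, the first two tests are true and the last two are false. For an input with no li elements at all, the first two are false and the last two are true, which is why $ns != 'hello' and not($ns = 'hello') are not interchangeable.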
For more useful XSLT tips and tricks, refer to Dave Pawson's XSLT Questions and Answers.