
The following tutorial explains how to create an example "Page" style custom data service. In this example, we create a custom data service that extracts hResume microformat data from LinkedIn user profile pages. Before trying this tutorial, you should be familiar with basic SearchMonkey concepts and SearchMonkey's screens for creating custom data services.
As mentioned in “Creating Custom Data Services”, the Yahoo! Search Crawler already extracts microformat data, so why go to the effort of creating a custom data service to extract hResume? There are three main reasons:
The Yahoo! Search Crawler handles only a limited subset of microformats, and that subset regrettably does not include hResume. To use hResume, we must create a custom data service. If the crawler is later upgraded to support hResume, this tutorial's custom data service will become obsolete (though perhaps still useful for teaching purposes).
Direct data extraction from HTML works only if the page's structure remains relatively stable. Ideally, you should own the pages you are extracting from. Failing that, pages with highly structured HTML are a good choice. Because LinkedIn has made a deliberate effort to embed hResume data in its profile pages, and hResume is a well-defined specification, we can be reasonably confident that our custom data service will continue to work in the future.
When it comes time to build an example presentation application, it's nice to demonstrate that you can build applications from a variety of data services: the Yahoo! Index, LinkedIn's data feed, and finally, this custom data service.
Once the custom data service is complete, you can continue to “Creating a Presentation Application”, which uses your newly created data service to enhance search results.
From the main SearchMonkey Applications screen, click Create a new Data Service. SearchMonkey displays “Step 1: Basic Info”.

Enter a Name: "Test LinkedIn Data Service"
Select a Type of Page. For a tutorial that explains how to create a custom data service that makes web service API calls, refer to “Creating a (Web Service) Custom Data Service”.
Enter a Description: "A test data service for LinkedIn. Extracts hResume data directly from profile pages."
Even if you don't plan to share your data service, the description is still useful for private development. This is particularly true if you end up creating several data services that have similar functions or that trigger on the same URLs. The description should not only indicate which sites the data service triggers on, but what kinds of data it extracts.
Read the Terms of Service if you have not done so already. Click the Terms of Service checkbox.
Click Next Step. SearchMonkey saves your changes and displays “Step 2: URLs”.

Specify a Trigger URL Pattern of:
"*.linkedin.com/in/*,*.linkedin.com/pub/*"
This pattern matches all results from LinkedIn that fall under the /in and /pub directories. We happen to know that these pages represent individual LinkedIn user profiles, and that they have hResume data for us to extract.
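To get a feel for what the trigger pattern does, here is a minimal Python sketch that approximates it. It assumes three things: each `*` matches any run of characters (including `/`), as in Python's `fnmatch`; the comma separates alternative patterns; and patterns are compared against the URL with its `http://` scheme removed. SearchMonkey's actual matching rules may differ. The helper name `matches_trigger` and the `/pub/` and `/company/` URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Trigger URL pattern from Step 2: comma-separated wildcard patterns.
TRIGGER = "*.linkedin.com/in/*,*.linkedin.com/pub/*"


def matches_trigger(url: str, trigger: str = TRIGGER) -> bool:
    """Hypothetical approximation of SearchMonkey trigger matching.

    Assumes '*' matches any run of characters (including '/'), as in
    fnmatch, and that patterns are compared against host + path with
    the scheme and query string removed.
    """
    parts = urlsplit(url)
    target = parts.netloc + parts.path
    patterns = [p.strip() for p in trigger.split(",") if p.strip()]
    return any(fnmatchcase(target, p) for p in patterns)


for url in [
    "http://www.linkedin.com/in/mdubinko",         # one of the tutorial's test URLs
    "http://www.linkedin.com/pub/jane-doe/1/2/3",  # hypothetical /pub profile
    "http://www.linkedin.com/company/yahoo",       # hypothetical non-profile page
]:
    print(url, matches_trigger(url))
```

Under these assumptions the first two URLs print `True` and the third prints `False`. A bare `linkedin.com/in/...` host with no subdomain would also fail to match, because the pattern requires a dot before `linkedin.com`.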
Specify three test URLs:
http://www.linkedin.com/in/amitkumar
http://www.linkedin.com/in/kevinhaas
http://www.linkedin.com/in/mdubinko
Alternatively, you can click to retrieve ten valid, random test URLs. However, by using these specific values, you can confirm that your results match the results in this tutorial.
Click Next Step. SearchMonkey saves your changes and displays “Step 3: Data Extraction”.

Creating a (Page) Custom Data Service
In “Step 3: Data Extraction”, enter the following XSLT as your extraction code:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <adjunctcontainer>
      <!-- SearchMonkey supplies $smid, a globally unique ID for this data service -->
      <adjunct id="smid:{$smid}" version="1.0">
        <!-- The person's photo: resource is set to the src of the hResume image -->
        <item rel="dc:subject"
              resource="{//div[@class='hresume']//div[@class='image']/img/@src}">
          <type typeof="media:Photo"/>
        </item>
        <!-- The person's "business card" data -->
        <item rel="dc:subject">
          <type typeof="vcard:VCard">
            <meta property="vcard:fn">
              <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
            </meta>
            <meta property="vcard:title">
              <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
            </meta>
          </type>
        </item>
      </adjunct>
    </adjunctcontainer>
  </xsl:template>
</xsl:stylesheet>
<xsl:template match="/">: Boilerplate code. Starts matching templates at the root node.
<adjunctcontainer>: Specifies the root element for the extracted data.
<adjunct>: Encases your extracted data. An adjunct may contain zero or more <item> and <meta> elements. You should always set the id attribute to the value "smid:{$smid}", which causes SearchMonkey to supply a globally unique ID for you.
First <item> (the media:Photo): The optional resource attribute specifies the URI of the resource that represents the item. In this case, the XPath expression sets the resource attribute to the photo's URL. The XPath expression matches the src attribute of an <img> element within a <div class="image"> within a <div class="hresume">.
Second <item> (the vcard:VCard): Provides a container for the person's "business card" data. An <item> may contain zero or more <item> and <meta> elements.
<meta property="vcard:fn">: Describes some data on the page. A <meta> contains a literal value (actual data extracted from the page). This particular example sets the property attribute to vcard:fn, indicating that the <meta> represents the person's full (formatted) name. For a list of acceptable values for the property attribute, consult the vocabulary specification.
<xsl:value-of>: Extracts the data specified by the given XPath expression. In this case, the XPath expression matches a <span> whose class attribute contains fn, inside a <div class="hresume">.
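To see what these XPath expressions select, the following Python sketch applies equivalent lookups to a small, hypothetical hResume fragment using the standard library's `xml.etree.ElementTree`. It is an approximation, not SearchMonkey's XSLT processor. ElementTree supports only a subset of XPath and has no `contains()`, so the class test is done by hand. The sample markup and values are invented and written as well-formed XML; real LinkedIn pages would need an HTML parser. The function name `extract` is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample of a profile page's hResume markup.
# Real LinkedIn pages are HTML, not XML, and their markup may differ.
SAMPLE = """
<html><body>
  <div class="hresume">
    <div class="image"><img src="http://example.com/photo.jpg"/></div>
    <div class="vcard"><span class="fn n">Jane Doe</span></div>
    <ul class="current">
      <li>Engineer at Example Corp</li>
      <li>Advisor at Another Org</li>
    </ul>
  </div>
</body></html>
"""


def string_value(el):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(el.itertext()) if el is not None else ""


def extract(doc: str) -> dict:
    root = ET.fromstring(doc.strip())
    # //div[@class='hresume'] (this sketch only looks at the first one)
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        return {}
    # //div[@class='hresume']//div[@class='image']/img/@src
    img = resume.find(".//div[@class='image']/img")
    # //div[@class='hresume']//span[contains(@class,'fn')]
    # contains() is a substring test, mirrored here with Python's "in".
    fn = next((s for s in resume.iter("span") if "fn" in s.get("class", "")), None)
    # //div[@class='hresume']//ul[@class='current']/li
    # xsl:value-of on a node-set uses only the first node (XSLT 1.0).
    title = resume.find(".//ul[@class='current']/li")
    return {
        "media:Photo": img.get("src") if img is not None else None,
        "vcard:fn": string_value(fn).strip(),
        "vcard:title": string_value(title).strip(),
    }


print(extract(SAMPLE))
```

In XSLT 1.0, `<xsl:value-of>` applied to a node-set returns the string value of only the first node. The sketch mirrors that, so only the first `<li>` under `<ul class="current">` becomes the job title. Because `contains(@class,'fn')` is a substring test, it would also match a class such as `fname`; the sketch keeps that behavior rather than splitting the class into tokens.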
Click Save and Refresh. SearchMonkey refreshes the Preview Pane, displaying the effects of your data service on the first test URL.

The data appears to be acceptable. The media:Photo resource is a link to an image, and the vcard:fn and vcard:title values are plausible: a person's full name and a job title, respectively.
Click Input and Output to view the module's input HTML and output XML. These links are handy for debugging your data service.
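When you read the Output view, it helps to know the shape of the XML the stylesheet produces. This Python sketch rebuilds that structure with `xml.etree.ElementTree` from already-extracted values. It models only the stylesheet's output structure, not SearchMonkey itself. The `smid` argument is a placeholder for the ID that SearchMonkey supplies, and the function name `build_adjunct` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_adjunct(smid: str, photo: str, fn: str, title: str) -> str:
    """Rebuild the output XML shape produced by the tutorial's stylesheet.

    `smid` is a placeholder here; SearchMonkey supplies the real value.
    """
    container = ET.Element("adjunctcontainer")
    adjunct = ET.SubElement(container, "adjunct", id=f"smid:{smid}", version="1.0")
    # First item: the photo, identified by its URL in the resource attribute.
    photo_item = ET.SubElement(adjunct, "item", rel="dc:subject", resource=photo)
    ET.SubElement(photo_item, "type", typeof="media:Photo")
    # Second item: the vCard with literal name and title values.
    card_item = ET.SubElement(adjunct, "item", rel="dc:subject")
    card = ET.SubElement(card_item, "type", typeof="vcard:VCard")
    ET.SubElement(card, "meta", property="vcard:fn").text = fn
    ET.SubElement(card, "meta", property="vcard:title").text = title
    return ET.tostring(container, encoding="unicode")


print(build_adjunct("example-id", "http://example.com/photo.jpg",
                    "Jane Doe", "Engineer at Example Corp"))
```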
Step through the other test results and verify that the Preview Pane is displaying the expected output.
Note: If there are any problems with the extraction code, the Preview Pane displays a bulleted list of warnings and errors.
Click Next Step. SearchMonkey saves your changes and displays “Step 4: Confirmation”.

Congratulations, you are done with the tutorial! You may now click Create a new Presentation Application and continue to “Creating a Presentation Application” in order to build a presentation application based on this data service. Otherwise, return to the Application Dashboard.