
The following tutorial explains how to create an example "Page" style custom data service. In this example, we create a custom data service that extracts hResume microformat data from LinkedIn user profile pages. Before trying this tutorial, you should be familiar with basic SearchMonkey concepts and SearchMonkey's screens for creating custom data services.
As mentioned in “Data Service Types”, the Yahoo! Search Crawler already extracts microformat data... so why go to the effort of creating a custom data service for extracting hResume? Three main reasons:
The Yahoo! Search Crawler only handles a limited subset of microformats — a subset that regrettably does not include hResume. To use hResume, we must create a custom data service. At some point in the future, the crawler might be upgraded to support hResume, in which case this tutorial custom data service will become obsolete (though perhaps still useful for teaching purposes).
Direct data extraction from HTML only works if the page's structure remains relatively stable. Ideally, you should own the pages you are extracting from. Failing that, pages with highly structured HTML are a good choice. Since LinkedIn makes the effort to embed hResume data, and hResume is a well-defined specification, we can be reasonably confident that our custom data service will continue to work in the future.
When it comes time to build an example presentation application, it's nice to demonstrate that you can build applications from a variety of data services: the Yahoo! Index, LinkedIn's data feed, and finally, this custom data service.
Once the custom data service is complete, you can continue to “Creating a Presentation Application”, which uses your newly-created data service to enhance search results.
From the main SearchMonkey Applications screen, click Create a new Data Service. SearchMonkey displays “Step 1: Basic Info”.

Enter a Name: "Test LinkedIn Data Service"
Select the Type that corresponds to a "Page" style custom data service. For a tutorial that explains how to create a custom data service that makes web service API calls, refer to “Creating a (Web Service) Custom Data Service”.
Enter a Description: "A test data service for LinkedIn. Extracts hResume data directly from profile pages."
Even if you don't plan to share your data service, the description is still useful for private development. This is particularly true if you end up creating several data services that have similar functions or that trigger on the same URLs. The description should not only indicate which sites the data service triggers on, but what kinds of data it extracts.
Read the Terms of Service if you have not done so already. Click the Terms of Service checkbox.
Continue to the next step. SearchMonkey saves your changes and displays “Step 2: URLs”.

Specify a Trigger URL Pattern of:
"*.linkedin.com/in/*,*.linkedin.com/pub/*"
This pattern matches all results from LinkedIn that fall under the /in and /pub directories. We happen to know that these pages represent individual LinkedIn user profiles, and that they have hResume data for us to extract.
Specify three test URLs:
http://www.linkedin.com/in/amitkumar
http://www.linkedin.com/in/kevinhaas
http://www.linkedin.com/in/mdubinko
Alternatively, you can have SearchMonkey retrieve ten valid, random test URLs. However, by using these specific values, you can confirm that your results match the results in this tutorial.
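To sanity-check a trigger pattern before entering it, the matching can be approximated offline. The following is a minimal Python sketch, not SearchMonkey's actual matcher: it assumes that the pattern is a comma-separated list of wildcard patterns, that `*` matches any run of characters, and that the URL scheme is ignored. The function name `matches_trigger` is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Trigger pattern and test URLs taken from this tutorial.
TRIGGER_PATTERN = "*.linkedin.com/in/*,*.linkedin.com/pub/*"
TEST_URLS = [
    "http://www.linkedin.com/in/amitkumar",
    "http://www.linkedin.com/in/kevinhaas",
    "http://www.linkedin.com/in/mdubinko",
]


def matches_trigger(url, trigger=TRIGGER_PATTERN):
    """Return True if url matches any of the comma-separated patterns.

    Assumption: '*' matches any run of characters and the scheme
    (http://) is not part of the match. The real service may differ.
    """
    parts = urlsplit(url)
    target = parts.netloc + parts.path  # e.g. www.linkedin.com/in/amitkumar
    return any(fnmatchcase(target, p.strip()) for p in trigger.split(","))


if __name__ == "__main__":
    for url in TEST_URLS:
        print(url, matches_trigger(url))
```

Under these assumptions, all three test URLs match the first pattern, while a LinkedIn URL outside /in and /pub does not match either one.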
Continue to the next step. SearchMonkey saves your changes and displays “Step 3: Data Extraction”.

Remove the contents of the textarea and replace them with the following XSLT stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <adjunctcontainer>
      <adjunct id="smid:{$smid}" version="1.0">
        <item rel="rel:Photo"
              resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
        <item rel="rel:Card">
          <meta property="vcard:fn">
            <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
          </meta>
          <meta property="vcard:title">
            <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
          </meta>
        </item>
      </adjunct>
    </adjunctcontainer>
  </xsl:template>
</xsl:stylesheet>
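The key parts of the stylesheet are:
<xsl:template match="/">: Boilerplate code. "Start matching templates at the root node."
<adjunctcontainer>: Boilerplate code. Specifies the root element for extracted data.
<adjunct>: Specifies an adjunct, which carries the id and version attributes and wraps the extracted items.
<item rel="rel:Photo">: Provides a container for a set of interesting related data; in this case, metadata about a photo. We set the resource attribute to the src of the profile image inside the hresume block.
<item rel="rel:Card">: Provides a container for the person's "business card" data.
<meta>: Describes some data on the page. Here, one meta element holds vcard:fn and another holds vcard:title.
<xsl:value-of>: Inserts the text selected by its XPath expression into the enclosing meta element.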
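To see what the three XPath selections in the stylesheet pick out, the same lookups can be tried against a small fragment outside SearchMonkey. The sketch below uses Python's standard-library ElementTree and is only an approximation: the sample markup is a hypothetical, well-formed stand-in (not real LinkedIn HTML), the function name `extract` is invented for illustration, and because ElementTree has no `contains()` function, that predicate is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a profile page.
SAMPLE = """
<html><body>
  <div class="hresume">
    <div class="image"><img src="http://example.com/photo.jpg"/></div>
    <span class="fn n">Jane Example</span>
    <ul class="current"><li>Staff Engineer at Example Corp</li></ul>
  </div>
</body></html>
"""


def text_of(element):
    """Concatenated text of an element (trimmed), or '' if it is missing."""
    return "".join(element.itertext()).strip() if element is not None else ""


def extract(page_xml):
    """Mirror the stylesheet's three selections on a well-formed page."""
    root = ET.fromstring(page_xml.strip())
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        return None
    # //div[@class='hresume']//div[@class='image']/img/@src
    img = resume.find(".//div[@class='image']/img")
    # //div[@class='hresume']//span[contains(@class,'fn')], emulated in Python
    fn = next((s for s in resume.iter("span") if "fn" in s.get("class", "")), None)
    # //div[@class='hresume']//ul[@class='current']/li (first match only)
    title = resume.find(".//ul[@class='current']/li")
    return {
        "rel:Photo": img.get("src", "") if img is not None else "",
        "vcard:fn": text_of(fn),
        "vcard:title": text_of(title),
    }


if __name__ == "__main__":
    print(extract(SAMPLE))
```

Like `xsl:value-of`, the sketch uses only the first matching node for each field. Real profile pages are HTML rather than well-formed XML, so this is useful for reasoning about the selectors, not as a replacement for the Preview Pane.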
Click Save and Refresh. SearchMonkey refreshes the Preview Pane, displaying the effects of your data service on the first test URL.

The data appears to be acceptable. The rel:Photo resource is a link to an image, and the vcard:fn and vcard:title are also set to something plausible: a person's full name and a job title, respectively.
Click Input and Output to view the module's input HTML and output XML. These links are handy for debugging your data service.
Step through the other test results and verify that the Preview Pane is displaying the expected output.
Note: If there are any problems with the extraction code, the Preview Pane displays a bulleted list of warnings and errors.
Continue to the next step. SearchMonkey saves your changes and displays “Step 4: Confirmation”.

Congratulations, you are done with the tutorial! You may now continue to “Creating a Presentation Application” to build a presentation application based on this data service. Otherwise, return to the Application Dashboard.