Writing a LinkedIn API library using Yahoo! BOSS web search

BOSS is the great new search API from Yahoo! that provides almost everything you need to add search capabilities to your application or website. However, the simplicity and ease of use of BOSS can be deceptive: the seemingly basic API can be used creatively for many different types of applications. I wrote a quick hack a few weeks ago that creates a photo mosaic from images found using the BOSS image search. This time I'll explain how to use BOSS to create an API library for a website that uses microformats.

Sau Sheong Chang
Engineering Director, Yahoo! Southeast Asia


For many developers, the promise of open APIs for accessing data on various web sites is heaven-sent. Mashups of Internet data and services are now possible where, only a few years ago, they required expensive partnerships and tie-ups. Companies such as Yahoo! and Google now open up APIs to the core platforms that run their most important assets. Case in point: BOSS allows access to Yahoo!'s valuable search index, and Address Book allows access to the contact lists of Yahoo!'s hundreds of millions of users. Still, not all sites provide APIs, even if the data is publicly available on their sites. In such cases, BOSS comes in handy.

One of the features of BOSS web search is the ability to restrict a search to a particular domain. When focused on one domain, BOSS is very effective at searching the publicly available pages in that domain. Combined with an HTML parser, this can simulate the features of an API library. In this article I'll discuss how to use BOSS to create an API library to search LinkedIn public profiles. Note that the copyright on the data retrieved belongs to the site owner (in this case, LinkedIn). The techniques shown here are for learning purposes only, and if you wish to use them commercially you need to contact the copyright owner for permission to re-use their data.

LinkedIn is one of the more popular social networking sites, focused on business and professional networking. A registered LinkedIn user is able to build and maintain a list of direct and indirect contacts, which are then used to find jobs, people, or business opportunities. LinkedIn users keep a comprehensive profile which they normally use to describe themselves on a business or professional level. This profile can be kept private or be published for all to view.

The library I wrote for this article allows developers to search for publicly available profiles by various attributes of the profile. Specifically, the code I present supports searching by given name, family name, locality and the organizations the person has worked for. I use Ruby; my end product is a Ruby gem. However, the same concept can be applied with most programming languages. LinkedIn public profiles use the hResume microformat, so I use mofo (http://mofo.rubyforge.org), the Ruby microformat parser.

There's only one class in the whole library (it's that simple) so let's dive straight in.

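require 'net/http'
require 'uri'
require 'xmlsimple'
require 'mofo'

class Linkedin
  # count:   number of results to request from BOSS
  # boss_id: your BOSS application ID
  def initialize(count=50, boss_id='BOSS ID')
    @boss_id = boss_id
    @count = count
  end
end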

We define a constructor for this class that sets the number of results to request from BOSS and the BOSS ID, which is sent to BOSS as the appid parameter. The bulk of the processing is within the find method in the Linkedin class:

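def find(query={})
  # build a space-delimited search string from the query values
  q = query.values.join(" ")
  url = "http://boss.yahooapis.com/ysearch/web/v1/#{q}?appid=#{@boss_id}&sites=linkedin.com&format=xml&count=#{@count}"
  begin
    res = Net::HTTP.get_response(URI.parse(URI.escape(url)))
  rescue StandardError
    # without a response there is nothing to parse, so stop here
    puts 'Cannot reach URL'
    return []
  end
  # XmlSimple returns a Hash for a single result and an Array for several
  d = XmlSimple.xml_in(res.body, { 'ForceArray' => false })['resultset_web']['result']
  if d.kind_of? Array
    data = d
  elsif d.kind_of? Hash
    data = []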

The code looks complicated but is quite readable. The find method takes a Hash named query, and the first thing we do is create a space-delimited string from the values in this Hash. This forms the query string that is sent to BOSS web search, restricted to linkedin.com through the sites parameter. The returned results (in XML) are converted by XmlSimple into an array of hashes, which is the raw data we will work on.
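
As a rough sketch of these two steps in isolation, the snippet below builds the request URL with the query percent-encoded and normalizes the parsed results into an Array. The helper names boss_search_url and normalize_results are hypothetical and not part of the library. The sketch also assumes a recent Ruby, where URI.escape is no longer available (it was removed in Ruby 3.0), so it uses ERB::Util.url_encode and URI.encode_www_form from the standard library instead.

```ruby
require 'erb'
require 'uri'

# Hypothetical helper (not part of the original library): builds the BOSS
# web search URL with the query terms and parameters percent-encoded.
def boss_search_url(query, app_id, count = 50)
  # Join the Hash values into one space-delimited query, as find does.
  terms = query.values.join(' ')
  # ERB::Util.url_encode turns spaces into %20, which is safe in a URL path.
  path = ERB::Util.url_encode(terms)
  params = URI.encode_www_form({
    'appid'  => app_id,
    'sites'  => 'linkedin.com',
    'format' => 'xml',
    'count'  => count
  })
  "http://boss.yahooapis.com/ysearch/web/v1/#{path}?#{params}"
end

# Hypothetical helper: with 'ForceArray' => false, a single result arrives
# as a Hash and several results as an Array, so wrap as needed.
def normalize_results(raw)
  case raw
  when nil  then []
  when Hash then [raw]
  else Array(raw)
  end
end

puts boss_search_url({ :given_name => 'Sau Sheong', :family_name => 'Chang' }, 'BOSS ID')
```

Wrapping a lone Hash by hand matters here because Array(hash) would turn the Hash into a list of key/value pairs rather than a one-element list.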

Public profiles are only available under http://www.linkedin.com/pub and http://www.linkedin.com/in, so we first discard results from anywhere else. Then, for each remaining result, we take its URL and parse the page with the mofo microformat parser. Finally, we clean up the parsed results and filter away any that don't fit the given query.
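
The first of these steps might look something like the sketch below. The helper name public_profile_url? is hypothetical, and the assumption that each BOSS result is a Hash with a 'url' key is mine; check it against the actual parsed response. The mofo parsing step is left out because it needs the gem and a live page.

```ruby
require 'uri'

# Hypothetical helper: true only for LinkedIn public profile URLs,
# that is, paths under /pub/ or /in/ on a linkedin.com host.
def public_profile_url?(url)
  uri = URI.parse(url.to_s)
  return false unless uri.host
  on_linkedin = uri.host == 'linkedin.com' || uri.host.end_with?('.linkedin.com')
  on_linkedin && uri.path.start_with?('/pub/', '/in/')
rescue URI::InvalidURIError
  # Anything that is not a well-formed URL cannot be a profile page.
  false
end

# Assumed result shape: one Hash per result with the page address under 'url'.
results = [
  { 'url' => 'http://www.linkedin.com/in/sausheong' },
  { 'url' => 'http://www.linkedin.com/companies/yahoo' }
]
profiles = results.select { |result| public_profile_url?(result['url']) }
puts profiles.map { |result| result['url'] }
```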

def filter_off_by_org(query)
  org = query[:org].downcase
  @people.delete_if { |person|
    # the block's value decides deletion; a `return` here would exit the method itself
    case person.experience
    when Array
      # delete unless at least one position mentions the organization
      person.experience.none? { |exp| exp.summary.downcase.include?(org) }
    when HCalendar
      # a single position comes back as one HCalendar object
      !person.experience.summary.downcase.include?(org)
    else
      true
    end
  }
end

def filter_off_by_locality(query)
  @people.delete_if { |person|
    locality = person.contact.adr.locality.nil? ? "" : person.contact.adr.locality
    true unless locality.downcase.include?(query[:locality].downcase)
  }
end

def filter_off_by_name(query)
  @people.delete_if { |person|
    case
    when query[:family_name].nil?
      true unless person.contact.n.given_name.downcase.include?(query[:given_name].downcase)
    when query[:given_name].nil?
      true unless person.contact.n.family_name.downcase.include?(query[:family_name].downcase)
    else
      true unless person.contact.fn.downcase.include?(query[:family_name].downcase) and
                  person.contact.fn.downcase.include?(query[:given_name].downcase)
    end
  }
end

Filtering off results that do not fit the query is relatively straightforward. I describe three filters here: by name, by locality and by the organizations the person has worked for. What each does is obvious: if the name on a returned result is not the name we asked for, we don't want it. You might wonder how this is possible. If I search for 'John Smith', I should get only profiles of John Smiths, right? As it turns out, if your profile even mentions the name 'John Smith' (say you mention that your ex-boss is John Smith), BOSS will return that result. This is not the profile you want, of course, so to remove it we apply the name filter. In the same way, we apply a locality filter if a locality is specified and an organization filter if an organization is specified.
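
The 'John Smith' case can be tried out without BOSS or mofo. The sketch below uses plain Structs as stand-ins for the parsed profile objects; the Struct shapes and the helper names mentions? and filter_by_name are assumptions for illustration only. It is also a simplification of the library's filter: it returns a new list and always compares against the structured name parts.

```ruby
# Stand-ins for the objects a microformat parser would return
# (assumed shapes, for illustration only).
Name    = Struct.new(:given_name, :family_name)
Contact = Struct.new(:fn, :n)
Person  = Struct.new(:contact, :summary)

# Hypothetical helper: case-insensitive substring test that tolerates nil.
def mentions?(text, term)
  text.to_s.downcase.include?(term.to_s.downcase)
end

# Keep only people whose own name matches every name part given in the query.
def filter_by_name(people, query)
  people.select do |person|
    name = person.contact.n
    (query[:given_name].nil? || mentions?(name.given_name, query[:given_name])) &&
      (query[:family_name].nil? || mentions?(name.family_name, query[:family_name]))
  end
end

people = [
  Person.new(Contact.new('John Smith', Name.new('John', 'Smith')), 'Engineer'),
  # This profile only mentions John Smith in its summary.
  Person.new(Contact.new('Jane Doe', Name.new('Jane', 'Doe')), 'My ex-boss is John Smith')
]

kept = filter_by_name(people, { :given_name => 'john', :family_name => 'smith' })
puts kept.map { |person| person.contact.fn }
```

Only the first profile survives, because the second one matches the search text in its summary but not in its name.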

Finally, here's a code snippet that shows how this simple LinkedIn public profiles library can be used:

linkedin = Linkedin.new
people = linkedin.find({:given_name => 'Sau Sheong', :family_name => 'Chang', :locality => 'Singapore'})
people.each { |person|
  puts person.contact.fn
  puts person.contact.adr.locality
  puts person.contact.title
  puts person.skills
  puts person.summary
  # experience
  person.experience.each { |exp|
    puts "  - #{exp.summary}"
    puts "    #{exp.description}"
    puts "    (#{exp.dtstart} to #{exp.dtend})"
  }
  # education
  person.education.each { |edu|
    puts "  - #{edu.summary}"
    puts "    #{edu.description}"
    puts "    (#{edu.dtstart} to #{edu.dtend})"
  }
}

As you can see, BOSS can be used in really powerful ways beyond simple searching. In this instance, I showed how we can use BOSS and mofo to build a simple LinkedIn public profiles API, but you can use the same techniques to build similar APIs for other sites. All it takes is some creative thinking.

The code and gem described here are hosted at http://rubyforge.org/projects/ruby-linkedin. Enjoy!