Placemaker is tripped up by simple characters

Hi,

I've been working with Placemaker for a while and find it an awesome service.

However, I realized that it fails to detect places after simple changes to short texts:

"Yosemite 09" -> returns Yosemite as expected
BUT,
"Yosemite June 09" -> returns nothing!
or,
"Vietnam War" -> returns Vietnam as expected
BUT,
"Vietnam Part II" -> returns nothing!

etc., etc.

You'd expect it to at least return the earlier suggestions, albeit with a lower weight and/or confidence.
But instead, these simple additions trip up the service entirely.

I have set the confidence level to 0 to make sure it returns every possibility (although it seems that's the default).

Is there a recommended set of characters/expressions that we ought to filter out of documentContent beforehand?

Thanks in advance for your time,
-Ray

4 Replies
  • In reply to Ray (Mar 30 2011, 02:07 PM):

    Thank you for bringing this issue to our attention.

    Placemaker uses a probability model for disambiguating places from non-places. Each place name (such as Yosemite) has a probability of referring to a place (e.g. Yosemite National Park) rather than another use of the word (e.g. Yosemite Sam). Adjacent place names that overlap (such as Yosemite and California) increase the match probability. Some words that commonly precede or follow a place name may adjust the match probability. Sometimes a person or business name contains a place name (e.g. Jack London, Paris Hilton) and we force the probability for these names to zero. If the match probability exceeds a threshold, the place is returned. The confidence level increases the base threshold to reduce false positives.

    In the case of 'Yosemite June 09', the added text lowers the probability that Yosemite refers to a place below the threshold. In the case of 'Vietnam Part II', the word 'part' lowers the probability that Vietnam refers to a place below the threshold. Our probability model sometimes produces poor estimates for particular places, which appears to be the case in your examples. I'll forward these examples to our developers so they can improve our data. If you have additional examples where Placemaker failed to produce the desired results, please let us know!
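
To make the threshold behaviour described above concrete, here is a toy sketch in Python. It is not Placemaker's implementation: the probabilities, the context words, the neighbour boost and the `confidence_bump` parameter are all invented for illustration, and only the overall shape (base probability, context adjustment, forced zero for person names, threshold) follows the explanation above.

```python
# Toy model only: every number and word list here is invented for illustration
# and does not reflect Placemaker's real data.
BASE_PROBABILITY = {
    "yosemite": 0.60, "vietnam": 0.70, "paris": 0.80,
    "california": 0.90, "london": 0.85,
}
CONTEXT_ADJUSTMENT = {"war": 0.05, "june": -0.20, "part": -0.30}
NON_PLACE_PHRASES = {"paris hilton", "jack london"}
BASE_THRESHOLD = 0.50
NEIGHBOUR_BOOST = 0.15


def match_places(text, confidence_bump=0.0):
    """Return candidate place names whose toy probability clears the threshold."""
    words = text.lower().split()
    # A higher confidence setting raises the bar a candidate has to clear.
    threshold = BASE_THRESHOLD + confidence_bump
    matches = []
    for i, word in enumerate(words):
        if word not in BASE_PROBABILITY:
            continue
        before = words[max(i - 1, 0):i]   # at most one word to the left
        after = words[i + 1:i + 2]        # at most one word to the right
        probability = BASE_PROBABILITY[word]
        for other in before + after:
            if other in BASE_PROBABILITY:
                probability += NEIGHBOUR_BOOST  # adjacent place names reinforce each other
            probability += CONTEXT_ADJUSTMENT.get(other, 0.0)
        # Person or business names that contain a place name are forced to zero.
        phrases = [f"{w} {word}" for w in before] + [f"{word} {w}" for w in after]
        if any(phrase in NON_PLACE_PHRASES for phrase in phrases):
            probability = 0.0
        if min(probability, 1.0) >= threshold:
            matches.append(word)
    return matches


print(match_places("Yosemite 09"))       # ['yosemite']
print(match_places("Yosemite June 09"))  # []
print(match_places("Vietnam War"))       # ['vietnam']
print(match_places("Vietnam Part II"))   # []
```

With these made-up numbers, a single neighbouring word is enough to push a short input under the threshold, which is the effect seen in the examples from the original post.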

    Removing words or characters from an input document is not recommended as you would not know which words are significant to Placemaker. If you want Placemaker to ignore some portion of an input document (such as a header or footer), you may remove it before sending the document to Placemaker.
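
As a rough illustration of the second point, the sketch below drops whole header and footer lines and leaves the body untouched, rather than filtering individual words. The function name and the line counts are hypothetical; how you detect a header or footer depends entirely on your own documents.

```python
def strip_header_footer(document, header_lines=0, footer_lines=0):
    """Drop whole lines from the top and bottom; never touch words in the body."""
    lines = document.splitlines()
    # Guard against counts that would overlap on very short documents.
    end = max(header_lines, len(lines) - footer_lines)
    return "\n".join(lines[header_lines:end])


newsletter = (
    "ACME Travel Newsletter\n"
    "We hiked in Yosemite, California in June 09.\n"
    "Sent from our Sunnyvale office"
)
body = strip_header_footer(newsletter, header_lines=1, footer_lines=1)
print(body)  # We hiked in Yosemite, California in June 09.
```

The body is passed on word for word, so any context Placemaker relies on stays intact.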

    Eddie Babcock
    Yahoo! Geo Technologies
  • In reply to Eddie B (Mar 31 2011, 02:23 PM):

    Hello Eddie,

    Thank you for your reply. It definitely makes sense.

    I have a lot of such examples that I could compile and send over to you. What is the best way to do that?

    I noticed most of these cases occur with short-form text (2-5 words).
    Do you recommend using Yahoo PlaceFinder or Yahoo GeoPlanet for such cases?

    Thanks!
    -Ray
  • Hello Eddie,

    Could you please shed some light on whether other Yahoo Location APIs are suitable for such cases, i.e. short documents (2-4 words) where Placemaker is more likely to assign a location a weight lower than the minimum threshold?
    e.g. Yahoo PlaceFinder or Yahoo GeoPlanet

    Thanks.
    -Ray
  • In reply to Ray (Apr 11 2011, 02:03 PM):

    Yahoo! Placemaker was designed to process long documents that contain references to multiple places. It uses context to identify and disambiguate places. Passing short documents deprives Placemaker of the information it needs. Think of Placemaker as a place extraction tool for documents.

    Yahoo! GeoPlanet was designed to process short documents that contain a reference to a single place. It doesn't care as much about context as Placemaker does and has a lower threshold than Placemaker. Think of GeoPlanet as a place recognition tool.

    Yahoo! PlaceFinder was designed to process a query that contains an address or place query. It expects that the entire query is significant, though it sometimes can fill in missing information (such as country and zip code). It has a lower threshold than GeoPlanet. Think of PlaceFinder as an address geocoder.
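
One way to apply the three descriptions above is a small routing helper that picks a service from the shape of the input. This is only a sketch: the 50-word cut-off and the street-address pattern are assumptions made up for the example, not documented limits of any of the three services.

```python
import re

# Hypothetical heuristic: text starting with a street number and containing a
# street-type word is treated as an address query.
ADDRESS_PATTERN = re.compile(
    r"^\s*\d+\s+\S+.*\b(?:street|st|avenue|ave|road|rd|boulevard|blvd)\b",
    re.IGNORECASE,
)


def suggest_service(text, long_document_words=50):
    """Suggest which service fits the input, using invented cut-offs."""
    if ADDRESS_PATTERN.search(text):
        return "PlaceFinder"   # address or place query, whole input is significant
    if len(text.split()) >= long_document_words:
        return "Placemaker"    # long document with context and multiple places
    return "GeoPlanet"         # short text that refers to a single place


print(suggest_service("Yosemite June 09"))              # GeoPlanet
print(suggest_service("701 First Avenue, Sunnyvale"))   # PlaceFinder
```

For the 2-5 word titles discussed in this thread, the helper lands on GeoPlanet, which matches the advice above.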

    Eddie Babcock
    Yahoo! Geo Technologies