Hi,
I've been working with Placemaker for a while and find it an awesome service.
However, I've noticed that it fails to detect places when a short text is changed only slightly:
"Yosemite 09" -> returns Yosemite as expected
BUT,
"Yosemite June 09" -> returns nothing!
or,
"Vietnam War" -> returns Vietnam as expected
BUT,
"Vietnam Part II" -> returns nothing!
etc., etc.
You'd expect it to at least return the earlier suggestions, albeit with a lower weight and/or confidence.
But instead, these simple additions trip up the service entirely.
I have set the confidence level to 0 to make sure it returns every possibility (although it seems that's the default).
Is there a recommended set of characters/expressions that we ought to filter out of documentContent beforehand?
Thanks in advance for your time,
-Ray
Thank you for bringing this issue to our attention.
Placemaker uses a probability model for disambiguating places from non-places. Each place name (such as Yosemite) has a probability of referring to a place (e.g. Yosemite National Park) rather than another use of the word (e.g. Yosemite Sam). Adjacent place names that overlap (such as Yosemite and California) increase the match probability. Some words that commonly precede or follow a place name may adjust the match probability. Sometimes a person or business name contains a place name (e.g. Jack London, Paris Hilton) and we force the probability for these names to zero. If the match probability exceeds a threshold, the place is returned. The confidence level increases the base threshold to reduce false positives.
In the case of 'Yosemite June 09', the added text lowers the probability that Yosemite refers to a place below the threshold. In the case of 'Vietnam Part II', the word 'Part' lowers the probability that Vietnam refers to a place below the threshold. Our probability model sometimes assigns poor probabilities to certain places, which appears to be the case in your examples. I'll forward these examples to our developers so they can improve our data. If you have additional examples where Placemaker failed to produce the desired results, please let us know!
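To make the behavior above more concrete, here is a toy sketch in Python of how such a threshold model could behave. It is not Placemaker's actual implementation: the function name `find_places`, every probability, every context factor, the threshold, and the way `confidence` raises the threshold are invented purely for illustration.

```python
# Toy illustration only. All names and numbers below are assumptions made up
# for this example; they are not Placemaker's real data, API, or algorithm.

# Assumed base probability that a word refers to a place.
BASE_PROB = {"yosemite": 0.6, "vietnam": 0.7, "california": 0.9,
             "paris": 0.8, "london": 0.8}

# Assumed multipliers for words that precede or follow a place name.
CONTEXT_FACTOR = {"june": 0.5, "part": 0.4, "war": 1.0}

# Person or business names containing a place name: probability forced to zero.
BLOCKED_NAMES = {("paris", "hilton"), ("jack", "london")}

BASE_THRESHOLD = 0.35   # assumed base threshold
ADJACENT_BOOST = 1.5    # assumed boost when a neighbor is also a place name


def find_places(text, confidence=0.0):
    """Return the candidate words whose match probability exceeds the threshold.

    `confidence` is a hypothetical value added to the base threshold.
    """
    words = text.lower().split()
    threshold = BASE_THRESHOLD + confidence
    found = []
    for i, word in enumerate(words):
        if word not in BASE_PROB:
            continue
        prob = BASE_PROB[word]
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None

        # A known person/business name forces the probability to zero.
        if (prev_word, word) in BLOCKED_NAMES or (word, next_word) in BLOCKED_NAMES:
            prob = 0.0

        for neighbor in (prev_word, next_word):
            # Neighboring words may lower (or keep) the probability.
            prob *= CONTEXT_FACTOR.get(neighbor, 1.0)
            # An adjacent place name raises it, capped at 1.0.
            if neighbor in BASE_PROB:
                prob = min(1.0, prob * ADJACENT_BOOST)

        if prob > threshold:
            found.append(word)
    return found


for sample in ("Yosemite 09", "Yosemite June 09", "Vietnam War",
               "Vietnam Part II", "Paris Hilton"):
    print(sample, "->", find_places(sample))
```

With these made-up numbers, 'Yosemite June 09' and 'Vietnam Part II' fall just below the threshold while 'Yosemite 09' and 'Vietnam War' stay above it, which mirrors the behavior you reported.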
Removing words or characters from an input document is not recommended as you would not know which words are significant to Placemaker. If you want Placemaker to ignore some portion of an input document (such as a header or footer), you may remove it before sending the document to Placemaker.
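As a rough sketch of the header/footer case, assuming the boilerplate occupies a known number of lines at the top and bottom of the document, something like the following could be run before submitting the text. The helper name `strip_header_footer` and its parameters are hypothetical and not part of any Placemaker client.

```python
def strip_header_footer(document, header_lines=0, footer_lines=0):
    """Drop a fixed number of lines from the top and bottom of a document.

    Hypothetical helper: it assumes the header and footer have known,
    fixed line counts. The body text is left untouched.
    """
    lines = document.splitlines()
    end = len(lines) - footer_lines
    return "\n".join(lines[header_lines:end])


page = "My Travel Blog\nWe hiked in Yosemite in June.\nCopyright 2009"
body = strip_header_footer(page, header_lines=1, footer_lines=1)
print(body)
```

Only whole sections are removed here; individual words inside the body are kept so that Placemaker still sees the full context.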
Eddie Babcock
Yahoo! Geo Technologies