Enhancements on Searching

While enhancing newspaper articles, I have noticed that the API's results can be improved with some adjustments to the input.

Normally I would send an entire news article to Yahoo (roughly 1,000 words), but the following text adjustments improve the results:

1. News articles write place names with a capital letter: 'New York', not 'new york'. So I remove all lower-case words before sending the text.
Testing with the sentence "Drachten is a village in the Netherlands" without any optimization:

{confidence:9, weight:1, woeId:23424909, name:Netherlands, matchType:0, type:Country, centroid:{latitude:52.1082, longitude:5.32986}}

And now with optimization ("Drachten Netherlands"):

{confidence:10, weight:1, woeId:728824, name:Drachten, Friesland, NL, matchType:0, type:Town, centroid:{latitude:53.1029, longitude:6.09856}}

2. Universities usually have the place name in their title, e.g. "University of York".
Sending the entire sentence to Yahoo returns 0 results.
But appending the name of the city to the input ("University of York York") returns:
{confidence:9, weight:1, woeId:23702141, name:University of York, Heslington, England, GB, matchType:0, type:POI, centroid:{latitude:53.9459, longitude:-1.04607}}
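
The two adjustments above can be sketched in a few lines of Python. This is only an illustration of the preprocessing, not part of the API: the function names `keep_capitalized` and `repeat_trailing_place` are made up for this example, and the tokenizer is a simple assumption that treats any run of letters as a word.

```python
import re

def keep_capitalized(text: str) -> str:
    """Keep only the words that start with an upper-case letter."""
    # Assumption: a "word" is a run of letters, optionally joined by - or '.
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text)
    return " ".join(t for t in tokens if t[0].isupper())

def repeat_trailing_place(name: str) -> str:
    """Append the last word of a name again, as in 'University of York York'."""
    last_word = name.split()[-1]
    return f"{name} {last_word}"

print(keep_capitalized("Drachten is a village in the Netherlands"))
print(repeat_trailing_place("University of York"))
```

Stripping words this way changes which words sit next to each other, so it can also create combinations that were never in the article (see the "Nederland Nederland" case further down the thread).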

There are probably more optimizations; if anyone knows of any, please share.

A quirk:

"Athene" gives no results, while "Athene Griekenland" (Dutch for "Athens Greece") is resolved correctly.

6 Replies
  • It also prefers US places over countries. For example, "Georgia is a country" returns:
    {confidence:6, weight:1, woeId:2347569, name:Georgia, US, matchType:0, type:State, centroid:{latitude:32.6783, longitude:-83.223}},
    {confidence:1, weight:1, woeId:2409711, name:Georgia, IN, US, matchType:0, type:Town, centroid:{latitude:38.7121, longitude:-86.5839}},
    {confidence:1, weight:1, woeId:2409712, name:Georgia, KS, US, matchType:0, type:Town, centroid:{latitude:37.6303, longitude:-98.009}},
    {confidence:1, weight:1, woeId:2409713, name:Georgia, LA, US, matchType:0, type:Town, centroid:{latitude:29.8403, longitude:-90.9883}},
    {confidence:1, weight:1, woeId:2409714, name:Georgia, NJ, US, matchType:0, type:Town, centroid:{latitude:40.1862, longitude:-74.2858}},
    {confidence:1, weight:1, woeId:2409715, name:Georgia, TX, US, matchType:0, type:Town, centroid:{latitude:33.7606, longitude:-95.8281}},
    {confidence:1, weight:1, woeId:2409716, name:Georgia, Hartselle, AL, US, matchType:0, type:Suburb, centroid:{latitude:34.4533, longitude:-86.9301}},
    {confidence:1, weight:1, woeId:2409718, name:Georgia, VT, US, matchType:0, type:Town, centroid:{latitude:44.7271, longitude:-73.118}},
    {confidence:2, weight:1, woeId:23424823, name:Georgia, matchType:0, type:Country, centroid:{latitude:42.3115, longitude:43.3658}}

    The country gets a confidence of 2 and the state a confidence of 6.

    Therefore I must disable autoDisambiguate for every request I make, which uses more bandwidth (roughly 4 times more on average).
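
With autoDisambiguate disabled, the choice between candidates has to be made on the client side. A minimal sketch of one way to do that, assuming the response has already been parsed into a list of dicts with the `confidence`, `type` and `woeId` fields shown above (the helper `pick_candidate` and its `preferred_type` argument are hypothetical, not part of Placemaker):

```python
def pick_candidate(candidates, preferred_type=None):
    """Return the highest-confidence candidate, optionally limited to one type."""
    if not candidates:
        return None
    if preferred_type is not None:
        typed = [c for c in candidates if c["type"] == preferred_type]
        if typed:
            # Only narrow the list when the preferred type is present.
            candidates = typed
    return max(candidates, key=lambda c: c["confidence"])

# Three of the "Georgia" candidates listed above.
georgia_candidates = [
    {"confidence": 6, "woeId": 2347569, "name": "Georgia, US", "type": "State"},
    {"confidence": 1, "woeId": 2409715, "name": "Georgia, TX, US", "type": "Town"},
    {"confidence": 2, "woeId": 23424823, "name": "Georgia", "type": "Country"},
]

print(pick_candidate(georgia_candidates)["name"])
print(pick_candidate(georgia_candidates, preferred_type="Country")["name"])
```

How to decide on the preferred type (for example, from the word "country" in the surrounding sentence) is left open here.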
  • 'Schoorl' is not recognized, but 'Schoorl, Netherlands' is. As with Athens, there is no other place with this name, so for some reason these results are being filtered out.
  • When all lower-case words are removed from the text, "Nederland Nederland" sometimes ends up in the input. This is just a repetition of the country name, but Placemaker matches it to a town called Nederland in the province of Overijssel.

    {confidence:8, weight:1, woeId:731866, name:Nederland, Overijssel, NL, matchType:0, type:Town, centroid:{latitude:52.7558, longitude:5.96283}}

    It only does this when the country name is in Dutch; it works correctly in English.
  • Another one:

    "Nieuw Zeeland" (New Zealand) is not recognized in a full text. "Zeeland" is also a province in the Netherlands, so it should be able to match one of these places. With autoDisambiguate=false and the text "Nieuw Zeeland", it finds only the country and not the province.

    I think I can find a lot more ;)
  • Sorry, I just read the last line of the Welcome message and realized I didn't post these as separate items. I will make new ones next time.
  • Thanks for your feedback; we will try to correct some of these issues.

    1. "Drachten is a village in the Netherlands"
    When we "merge" two locations we take the number of words between them into account.
    Contrast with "Drachten in the Netherlands" - here we return the merged result.
    We will, however, look into why we omit a separate "Drachten" result in your first query.

    2. "University of York" and "Athene"
    - Our statistical evidence suggests that these terms on their own are not significant enough to be a location. They only "merge" when you pass in a disambiguating location next to them. We will look into this.

    3. "Georgia"
    - The statistical evidence we gathered indicates that it is mostly used for the US state.

    4. 'Schoorl, Netherlands'
    - Yes, same reason as in 2 above. We will look into it.

    5. "Nederland Nederland"
    - Don't remove lower-case entities. You are effectively throwing away evidence, so we find a town called "Nederland" in the country "Nederland". Consider a made-up example: "New clothes sale York". You transform this to "New York", which is what we will match against.

    6. "Nieuw Zeeland"
    - I don't agree: "Nieuw Zeeland" certainly seems a better match than "Zeeland". autoDisambiguate only disambiguates between places with the same name (in this case there is only one, the country). This happens after we have recognised the location part "Nieuw Zeeland".
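
The risk described in point 5 can be checked before a stripped text is sent. The sketch below is only an illustration on the client side and says nothing about how Placemaker itself works; the function name `forced_adjacencies` is made up. It lists pairs of capitalized words that end up next to each other after stripping even though other words stood between them in the original text.

```python
import re

def forced_adjacencies(text):
    """Return capitalized word pairs that only become neighbours after stripping."""
    # Assumption: a "word" is any run of letters.
    tokens = re.findall(r"[^\W\d_]+", text)
    kept = [(i, t) for i, t in enumerate(tokens) if t[0].isupper()]
    # A gap larger than 1 means lower-case words were removed in between.
    return [(a, b) for (i, a), (j, b) in zip(kept, kept[1:]) if j - i > 1]

print(forced_adjacencies("New clothes sale York"))
print(forced_adjacencies("New York is large"))
```

A non-empty result means the stripped input contains word combinations, such as "New York" in the first call, that the article never actually contained.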

This forum is locked.
