
Are there any text-parsing functions?

Are there any functions for parsing non-XML strings that can be used to cut up and modify text content? I want to write a widget that displays information from a website written in plain HTML. For that, I need functions for replacing substrings and for splitting a given string into an array every time a specific substring occurs.

5 Replies
  • QUOTE (Brigitte @ Feb 5 2010, 10:33 AM)
    Are there any functions for parsing non-XML strings that can be used to cut up and modify text content? I want to write a widget that displays information from a website written in plain HTML. For that, I need functions for replacing substrings and for splitting a given string into an array every time a specific substring occurs.

    You mean like regular expressions? That's standard JS stuff. Or find a JS HTML parser library online. HTML isn't that easy to parse, so it might be best to find an HTML parser library, unless you're doing simple stuff.
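
For the "simple stuff" case, the standard string methods `split` and `replace` already cover both operations asked about. Here is a minimal sketch, assuming the page contains plain table rows; the helper name `extractRows` and the sample markup are made up for illustration, and real timetable HTML will likely need different patterns.

```javascript
// Hypothetical helper: turn simple <tr>/<td> markup into plain text lines.
// Only suitable for very regular HTML; this is not a general HTML parser.
function extractRows(html) {
  // Split into an array every time the substring "</tr>" occurs.
  var pieces = html.split('</tr>');
  var lines = [];
  for (var i = 0; i < pieces.length; i++) {
    var text = pieces[i]
      .replace(/<\/td>\s*<td[^>]*>/g, ' | ') // mark cell boundaries
      .replace(/<[^>]+>/g, '')               // strip any remaining tags
      .replace(/^\s+|\s+$/g, '');            // trim surrounding whitespace
    if (text.length > 0) {
      lines.push(text);
    }
  }
  return lines;
}

// Sample input (invented for this example).
var sample = '<tr><td>10:15</td><td>Bus 42</td></tr>' +
             '<tr><td>10:30</td><td>Train S1</td></tr>';
var rows = extractRows(sample);
```

With the sample above, `rows` holds one string per table row, with the cells separated by " | ".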
  • QUOTE (Steve @ Feb 5 2010, 11:10 AM)
    You mean like regular expressions? That's standard JS stuff. Or find a JS HTML parser library online. HTML isn't that easy to parse, so it might be best to find an HTML parser library, unless you're doing simple stuff.


    What I want to do is make a widget that displays train and bus departures for selected stations. The public transport timetables are not copyrighted, and the data is used by many applications and other (normal) widgets. But since the WDK does not support SQL, the only way to fetch the data is to parse the output of CGI calls that is normally displayed in a browser. Parsing should not be very difficult in this case. If I were using PHP, I would know what to do...
  • QUOTE (Brigitte @ Feb 5 2010, 11:28 AM)
    What I want to do is make a widget that displays train and bus departures for selected stations. The public transport timetables are not copyrighted, and the data is used by many applications and other (normal) widgets. But since the WDK does not support SQL, the only way to fetch the data is to parse the output of CGI calls that is normally displayed in a browser. Parsing should not be very difficult in this case. If I were using PHP, I would know what to do...

    Your best option, then, is to write a web service in PHP (or whatever) that parses the timetables, and have your widget make a request to that service. Have the service return the data as JSON. That way you keep the parsing work and processor time on the TV to a minimum.

    Also, what happens if the output of the CGI changes sometime down the road? Then you will have to change your widget and put it through QA again, which can take months. In the meantime, users are left with a widget that is not working properly.
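
On the widget side, consuming such a service could look roughly like the sketch below. The response shape (a `departures` array with `time`, `line` and `destination` fields) and the function name `formatDepartures` are assumptions made for this example, not something the thread or the WDK defines, and it assumes a native `JSON.parse` is available in the widget engine. How the request itself is made is left out.

```javascript
// Hypothetical response shape (an assumption for this sketch):
// {"station": "...", "departures": [{"time": "...", "line": "...", "destination": "..."}]}
function formatDepartures(jsonText) {
  var data;
  try {
    data = JSON.parse(jsonText);
  } catch (e) {
    // Malformed response (for example an HTML error page): show nothing.
    return [];
  }
  if (!data || !(data.departures instanceof Array)) {
    return [];
  }
  var lines = [];
  for (var i = 0; i < data.departures.length; i++) {
    var d = data.departures[i];
    if (!d) {
      continue; // skip null or missing entries
    }
    lines.push(d.time + '  ' + d.line + '  ' + d.destination);
  }
  return lines;
}

// Sample payload (invented for this example).
var payload = '{"station":"Central","departures":[' +
              '{"time":"10:15","line":"Bus 42","destination":"Airport"}]}';
var display = formatDepartures(payload);
```

Because the widget only depends on the JSON shape, a change in the CGI output would be absorbed by the service without touching the widget.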
  • Steve - great recommendation.

    Write a service in PHP and then make the widget talk with it.

    WARNING: We have had projects in the past that relied on HTML parsing with PHP, and we spent more time updating and managing the PHP parser than we did on any other part of the project. If you decide to head down this path, I would recommend caching the data on your server and updating it on a set schedule. Otherwise the site may just block your traffic if you are putting a huge load on its server.

    Good luck!
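
The caching advice can be illustrated with a small time-based cache. This is only a sketch, written in JavaScript rather than PHP to stay in one language, and it refreshes on demand at most once per interval, which is a simpler stand-in for a true scheduled job. The names `createCache`, `fetchFn`, `maxAgeMs` and `nowFn` are hypothetical.

```javascript
// Wrap an expensive fetch so it runs at most once per maxAgeMs.
// nowFn is optional and exists only so the clock can be replaced in tests.
function createCache(fetchFn, maxAgeMs, nowFn) {
  var now = nowFn || function () { return new Date().getTime(); };
  var value;
  var fetchedAt = null;
  return function get() {
    var t = now();
    // Refetch on first use, or once the cached copy is too old.
    if (fetchedAt === null || t - fetchedAt >= maxAgeMs) {
      value = fetchFn();
      fetchedAt = t;
    }
    return value;
  };
}

// Usage with a fake clock and a counter standing in for the real scrape.
var clock = 0;
var calls = 0;
var getTimetable = createCache(
  function () { calls++; return 'timetable v' + calls; },
  60000,
  function () { return clock; }
);
var first = getTimetable();   // triggers the first fetch
clock = 30000;
var second = getTimetable();  // still within the interval, served from cache
clock = 60000;
var third = getTimetable();   // interval elapsed, fetched again
```

However many widgets call the service, the timetable site sees at most one request per interval.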
  • I might also suggest taking a look at Yahoo! Pipes and YQL, especially if the data is already available on the web. It could save you an immense amount of time compared with writing and maintaining your own PHP code.
