
table max size

Hi guys, I have a request.
I have made a very simple Open Data Table. Here's the URL: http://onalimonalim.altervista.org/tesi/TABLEdblp.xml

The problem is that the table, whose URL is in the <url> tag of the Open Data Table, cannot be invoked, because it seems to be too big (almost 700 MB).
So the question is: is there a maximum size of the fetched document that must be respected?
In one part of the forum I read that it is 1.5 MB: is that true?

Someone suggested that the problem is in the filtering, because I don't have remote filtering... so I tried a remote filter by writing, for example, (0,10) after the FROM part of the query, but the problem remains! The same person says this kind of filter actually isn't a real remote filter, it's just another local filter applied for the client that makes the query, and for this reason it doesn't work. Is that true?

I know, there are many questions, but it's very important! Please help me! Thanks.

Bye
5 Replies
  • QUOTE (Samuele @ Mar 30 2010, 06:38 AM)

    So I replied yesterday to another post about a similar problem.

    The XML document you are trying to use in YQL is too large. We currently have a limit of 1.5 MB for a single document fetch, and your remote XML file is obviously much bigger than that. You cannot apply "remote" filtering because you just have a single large file (there is no way of getting the server that hosts the file to filter it down and return a smaller number of results). I'm currently exploring a couple of solutions for large static files of data, but there isn't anything in place today to solve this easily.

    There are a number of things you could do, but none of them is ideal.

    The easiest may be to generate multiple XML files for the DB and use more than one table to load the files as needed. For example, try to get all the information for a table into a single XML file. If that's too big, you could try to put less information into that file. If THAT's too big, then you could create ONE file per entity ID in the DB, and another file that is the "index" (and then join either explicitly on the command line in YQL or in another helper table that will merge the results in).

    If the DB is pretty "flat" then you could split it across multiple XML files again, and build a table that uses "paging" to load up the right file depending on what page was asked for.
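
    To make the split-file idea concrete, here is a very rough sketch of what a per-entity table definition could look like. The file layout, URL and element names below (entries/{id}.xml, entries.entry) are placeholders I'm assuming for illustration, not your actual structure:

    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
        <!-- placeholder sample query; the real key depends on how you split the data -->
        <sampleQuery>select * from {table} where id='12345'</sampleQuery>
      </meta>
      <bindings>
        <select itemPath="entries.entry" produces="XML">
          <urls>
            <!-- {id} is substituted from the WHERE clause, so YQL only ever
                 fetches one small per-entity file instead of the 700MB dump -->
            <url>http://onalimonalim.altervista.org/tesi/entries/{id}.xml</url>
          </urls>
          <inputs>
            <key id="id" type="xs:string" paramType="path" required="true"/>
          </inputs>
        </select>
      </bindings>
    </table>

    With something like that, SELECT * FROM mytable WHERE id='12345' (using whatever alias you give the table in the USE statement) only ever pulls down entries/12345.xml, which stays well under the 1.5 MB limit as long as each split file is small.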

    Jonathan
  • Thanks, but I have another question: is it possible to solve the problem using JavaScript? I know we have a plain XML file without any kind of API, but is there a method or a way, using JS, to do this without splitting our enormous XML file?
  • QUOTE (Samuele @ Mar 31 2010, 12:42 AM)

    Unfortunately not. It's the enormous XML file that's the issue here.

    Jonathan
  • I am also working on some massive data. I have a 50 MB, 1.7-million-line CSV. It is geocode data for UK postcodes from the recently opened Ordnance Survey Open Code-Point dataset (see http://maxmanders.co.uk/lab/ukgeocode/ukgeocode.xml). Rather than one large CSV, should I partition it into e.g. smaller files, and let people select from these smaller tables instead?

    i.e. instead of
    USE http://maxmanders.co.uk/lab/ukgeocode/ukgeocode.xml;
    SELECT lat, lng FROM ukgeocode WHERE postcode = 'EH5 2GJ'

    should I create smaller tables and do e.g.
    USE http://maxmanders.co.uk/lab/ukgeocode/ukgeocode.xml;
    SELECT lat,lng FROM ukgeocode.eh WHERE postcode = 'EH5 2GH'

    Any thoughts/suggestions appreciated.
  • QUOTE (maxmanders @ Apr 8 2010, 11:24 AM)

    Right, that's the approach I think is the only option at the moment. If you know what the "keys" are that will be used to access the CSV files, then you can create a table that loads up the "right" one as needed. For example, if you can get the data for each 3-letter postcode prefix under 1.5 MB, then your example would split the provided postcode, load up the CSV for "EH5" (for example), and then work through that one for the result. Another approach may be to look at a site like factual.com, which hosts large tables AND provides a nice API to get them in "pieces".
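
    As a very rough sketch (the per-prefix file naming and the column names are my assumptions, not your actual layout), such a table could use an execute block to derive the prefix from the supplied postcode and then load only the matching CSV through YQL's built-in csv table:

    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
        <sampleQuery>select * from {table} where postcode='EH5 2GJ'</sampleQuery>
      </meta>
      <bindings>
        <select itemPath="" produces="XML">
          <urls>
            <url></url>
          </urls>
          <inputs>
            <key id="postcode" type="xs:string" paramType="variable" required="true"/>
          </inputs>
          <execute><![CDATA[
            // Derive the outward code, e.g. "EH5 2GJ" -> "eh5", and build the
            // URL of the small per-prefix CSV (an assumed naming scheme).
            var prefix = postcode.split(' ')[0].toLowerCase();
            var url = 'http://maxmanders.co.uk/lab/ukgeocode/csv/' + prefix + '.csv';
            // Fetch and parse just that one file with the built-in csv table,
            // filtering it down to the requested postcode.
            var q = y.query("select * from csv where url='" + url + "'" +
                            " and columns='postcode,lat,lng'" +
                            " and postcode='" + postcode + "'");
            response.object = q.results;
          ]]></execute>
        </select>
      </bindings>
    </table>

    Then SELECT lat, lng FROM ukgeocode WHERE postcode='EH5 2GJ' only ever pulls down eh5.csv, so each individual fetch stays under the limit.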

    Jonathan
