Decentralized datatables

Hello,

IMHO, having one centralized server for data table definitions is a big bottleneck for YQL, both in terms of availability and because Yahoo will have absolute power over which tables are supported and which are not. (Yes, I know anybody can add a "use myURL" statement to a query to load their own data table, but it makes the work harder.)

I think it would be great to give users a way to find and auto-load the data table definitions they need, and I have two ideas for how to do it:
1/ DNS: Say I prepare a data table named "com.myCompany.myProduct.myTable". When anyone queries this table, the YQL server won't know its XML definition URL, so it will ask DNS for a NAPTR or TXT record of the domain "_yql.myTable.myProduct.myCompany.com", which will return the URL of the XML table definition. The record type is open to discussion, since I don't know them very well.

2/ Well-known locations, as defined in http://tools.ietf.org/html/draft-nottingham-site-meta-05: for the table mentioned above, the YQL server could optionally fetch the URL "http://myProduct.myCompany.com/.well-known/yql", which would hold the ENV file (like the current http://datatables.org/alltables.env).
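
To make the two naming rules above concrete, here is a rough Python sketch of the mapping from a table name to the DNS name and to the well-known URL. The function names are made up for illustration, and the rule "last label is the table, the rest is the reversed host" is only my reading of the example above, not anything YQL actually does.

```python
def dns_name_for_table(table: str) -> str:
    """Reverse the dotted table name and prefix it with the "_yql" label."""
    labels = table.split(".")
    return "_yql." + ".".join(reversed(labels))


def well_known_url_for_table(table: str) -> str:
    """Drop the last label (the table itself) and reverse the rest into a host name."""
    labels = table.split(".")
    host = ".".join(reversed(labels[:-1]))
    return f"http://{host}/.well-known/yql"


if __name__ == "__main__":
    name = "com.myCompany.myProduct.myTable"
    print(dns_name_for_table(name))
    print(well_known_url_for_table(name))
```

For the example table, this gives the same DNS name and URL as quoted in (1) and (2).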


Both of these ideas can be implemented together, e.g. the YQL server will first try (1) and, if that fails, try (2). This will allow admins to set up YQL tables in DNS, where I think they belong (because that is exactly what the Domain Name System is for: converting a name (the table name) to an address (the URL of the XML definition)). It will also be available to users who can't edit DNS records (e.g. on free hosting): they'll just create a file in their web root.
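
The fallback order could look roughly like the sketch below. Everything here is hypothetical: `resolve_table`, `dns_lookup` and `env_exists` are illustrative names, and the real DNS query and HTTP fetch are passed in as plain functions so the sketch doesn't depend on any particular resolver library.

```python
from typing import Callable, Optional, Tuple

# Hypothetical hooks: a DNS lookup returning a definition URL (or None),
# and an HTTP check reporting whether an ENV file exists at a URL.
DnsLookup = Callable[[str], Optional[str]]
EnvExists = Callable[[str], bool]


def resolve_table(
    table: str, dns_lookup: DnsLookup, env_exists: EnvExists
) -> Optional[Tuple[str, str]]:
    """Try DNS first (idea 1), then the well-known location (idea 2)."""
    labels = table.split(".")

    # (1) Ask DNS for a record under the reversed table name.
    dns_name = "_yql." + ".".join(reversed(labels))
    url = dns_lookup(dns_name)
    if url:
        return ("dns", url)

    # (2) Fall back to the ENV file in the web root of the owning host.
    host = ".".join(reversed(labels[:-1]))
    env_url = f"http://{host}/.well-known/yql"
    if env_exists(env_url):
        return ("well-known", env_url)

    # Neither source knows the table.
    return None


if __name__ == "__main__":
    # Simulate a host without DNS records but with a well-known ENV file.
    print(resolve_table(
        "com.myCompany.myProduct.myTable",
        dns_lookup=lambda name: None,
        env_exists=lambda url: True,
    ))
```

If both lookups fail, the function returns None and the server would report the table as unknown, as it does today.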


What do you think? Any comments or opinions are welcome.
All of these ideas are open to discussion.

1 Reply
  • In reply to juzna.cz123 (Apr 12 2010, 10:02 AM):

    So first, there is NO bottleneck. Tables can live anywhere on the internet; they can be hosted on our cloud storage (or anyone else's that's HTTP addressable). They can be grouped AND named in any way developers like using the "env" query parameter. They can be centralized or decentralized. Currently GitHub is being used to collect the ones that people want to share together. Lots of people have their own collections of tables. It really isn't much hassle to "use" a table or an "env".
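
For readers who haven't tried it, here is a small Python sketch of what "using" a table or an "env" amounts to when building a request. The helper names are made up, the example.com table URL is a placeholder, and the endpoint URL and the "q" parameter are assumptions about the public YQL service rather than anything stated in this thread; only the "use" statement and the "env" parameter come from the discussion above.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed public endpoint; not taken from this thread.
ASSUMED_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_with_use(table_url: str, alias: str) -> str:
    """Build a statement that loads a table definition by URL, then queries it."""
    return f'use "{table_url}" as {alias}; select * from {alias}'


def yql_request_url(query: str, env: Optional[str] = None,
                    endpoint: str = ASSUMED_ENDPOINT) -> str:
    """Build the request URL, optionally pointing at an env file that groups tables."""
    params = {"q": query}
    if env:
        params["env"] = env
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # A single table loaded inline with "use".
    print(yql_with_use("http://example.com/mytable.xml", "mytable"))
    # A whole collection of tables loaded through "env".
    print(yql_request_url("select * from mytable",
                          env="http://datatables.org/alltables.env"))
```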

    Secondly, I don't believe that "trading" or "late binding with auto discovery" makes much sense for YQL. While YQL does its best to abstract away the nuances and complexities of the underlying APIs, the APIs themselves are nevertheless distinct and require some insight into how they are going to be used and the shape of the data they return.

    So I think your post comes down to "discovery" again.

    Enabling an API provider to put something on their domain is something that could definitely work, like it does for robots.txt. At the moment MOST tables are created by people NOT at these companies. As this changes, some way of discovering tables from the root of a domain is an interesting idea.

    Finally, web search solved the issue of finding things anywhere on the internet pretty well.

    Jonathan