Video published 2009-10-29.
Matt Snider: Today we're going to cover YUI Storage Utility. We're going to cover what we did, why we did it, and how we did it. The URL at the bottom, https://developer.yahoo.com/yui/storage is where you can find a lot of this information about the Storage Utility, and some of what I'm going to talk about today.
So what is the Storage Utility? Well first of all, it's a place you can store a large amount of data client-side. It's a cross browser solution for HTML 5. They introduced us to the new element properties in HTML 5 that allow you to store information on the client-side, but each browser has a little different flavor of it, so we take care of that as one of the features of the Storage Utility. The bigger part is that there are a lot of browsers that do not yet support HTML 5, so we take care of that, as well, for you.
Why did we create Storage Utility? Well, the browser objects are not necessarily easy to use or easy to extend, so our utility allows very simple use of the storage objects, and they're also very easy to extend or add your own engine down the road if you so choose to. The storage object — one of the main drivers in HTML 5 was we needed a non-cookie space that we could store information. Cookies are sent on every request and that's bad, and it limits the amount of information you can store client-side if you're doing that. So that's one of the things that HTML 5 is trying to address, and that's one of the things that we address with the Storage Utility.
The two new properties introduced in HTML 5 are document.sessionStorage and document.localStorage. The difference between the two is that session storage will clear after the session ends, so when the browser or whatever else that's used closes, that's when the session storage data is removed in HTML 5, whereas local storage will last indefinitely. This just takes a look at the browsers that are supported, and that are currently supporting HTML 5. You have Internet Explorer 8, you have Firefox 3+, and Safari 4. The big takeaway here is that we needed a utility to simulate storage in browsers that do not support HTML 5 since the majority of browsers don't.
The second thing is that not all browsers are created equally, and there are issues with both Firefox 3 and IE. First to address the IE issues, by default, when IE is writing to and from the storage in HTML 5, it's going to be asynchronous. Our utility is synchronous, so under the hood we decided to force synchronicity by using transactions, which is a feature that IE introduced. Firefox 3.5 has both these issues fixed, but for some reason Firefox 3 didn't add the clear method, so we implemented that for you. The way they return values in Firefox 3 is slightly different, and we masked that for you as well. So we needed a tool to handle the cross browser variations of the HTML 5 Storage Utility.
A couple of examples of that are if you had an e-commerce site and you wanted the user to enter a zipcode to get the city state back, then you can store all those keys client-side so you didn't have to fetch that data from the server. If you were using a service like Flickr, or Delicious, there are a lot of tags — tens of thousands of tags — that can all be stored client-side for auto-complete. With email or Facebook, you have a lot of user names, your friends, just other members. If you wanted an auto-complete or a search feature, you could very quickly use client-side storage to be able to retrieve those user names.
Secondly, as Nicholas Zakas pointed out, if you were to have a situation where you're opening the same website in multiple browser tabs, and it's a private website like an email service or something and all your messages are stored client-side, you could potentially close one tab, log out, think that you've ended your session and your data is gone, but you've left this other tab open which has all your information in there. Down the road somebody could come back to that machine and all your private information is stored there. So again, this utility is really good for storing information client-side, but it should be non-private data.
For the rest of the presentation, all the Storage Utility classes are attached to YAHOO.util, and we will be omitting that for the rest of the presentation just to save space in the pseudo code. For example, YAHOO.util.Storage is just Storage, and YAHOO.util.StorageManager would just be StorageManager, etc.
This is just a quick flow chart as an overview of what's happening with Storage Utility. As a developer, you're going to call the StorageManager get function, you don't even need to pass anything to it, it'll just grab the first available Storage Engine, and then it'll return it to you. Then you can go ahead once it's ready to start reading and writing from that Storage Engine. When you do read and write from that Storage Engine, storage events are going to be fired through one event; it's the change event. We've created a storage event object to mirror HTML 5's storage event object, and that will be passed to all the callbacks.
A very simple example here is calling StorageManager.get. You're getting an instance of a Storage Engine. Down the road, you can set an item — so this is the key, and this is the value here that you want to set, and that's going to be stored into the Storage Engine. Later on down the road, you want to retrieve that item, so again, you just pass in the key.
Here's an example. I'm going to have this on my website tonight as well, if you want to take a closer look at it than what we're going to do here. This is an example that's going to store the city and state by the zipcode. If you took all of the zipcodes with the city and state information of the US, you're going to have about two megabytes of data. The example I'm going to use is only going to use the 90000+ codes, which is everything on the West Coast, and that's only about 200 KB of data. The key is going to be the zipcode, and the data is going to be the city and state, and then to retrieve it you'll just call getItem with the zipcode. In the example, the end user is going to enter a zipcode, and when they do it will retrieve the city and state for them.
How are we going to do that? First of all, we're going to get the first available Storage Engine, so we're just going to call StorageManager.get. It doesn't matter what engine I get, just that I get one. Once you get a Storage Engine instance, you have to subscribe to this custom event, ready. Some of the Storage Engines need a second to kind of set themselves up, and that's what subscribing to Storage Engine ready ensures, that all the engines have been set up. In the example, once the callback happens there's one button on the page and we attach a listener to that button in the example.
Then when the user enters his zipcode, what's happening in the event callback? Well, first of all it's going to check the storage.length. This is the instantiated Storage Engine, and it's the length of it. In all the Storage Engines there's a length property which indicates the number of keys that have been stored there. So in this example, we assume if there is a length that all the keys have already been added to the Storage Engine, so it'll just return the value and show it to the user. However, in this example, if you don't have any length to your Storage Engine, we assume that the keys aren't there, and we make a request via Ajax to go to the backend, fetch the data, and push it into the Storage Engine. So this is a good example of where you can use a Storage Engine to store Ajax data so that you don't have to keep requesting it from the server.
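The check-length-then-fetch flow described here can be sketched roughly as follows. This is not the code from the demo: the in-memory engine, the fetchZipData function, and the two sample zipcodes are hypothetical stand-ins so the sketch runs without YUI or a server.

```javascript
// Hypothetical in-memory stand-in for a storage engine; it only mimics the
// length / setItem / getItem surface described in the talk.
function createMemoryEngine() {
  const data = new Map();
  return {
    get length() { return data.size; },
    setItem(key, value) { data.set(String(key), value); },
    getItem(key) { return data.has(String(key)) ? data.get(String(key)) : null; },
  };
}

// Hypothetical stand-in for the Ajax request; a real page would ask the server.
let fetchCount = 0;
function fetchZipData() {
  fetchCount += 1;
  return { "94040": "Mountain View, CA", "97201": "Portland, OR" };
}

// Look up a zipcode, filling the engine from the backend only when it is empty.
function lookupZip(engine, zip) {
  if (engine.length === 0) {
    const rows = fetchZipData();
    for (const code of Object.keys(rows)) {
      engine.setItem(code, rows[code]);
    }
  }
  return engine.getItem(zip);
}

const zipEngine = createMemoryEngine();
const firstLookup = lookupZip(zipEngine, "94040");  // triggers the one fetch
const secondLookup = lookupZip(zipEngine, "97201"); // served from the engine
```

The second lookup never reaches the backend, which is the point of the example: a non-zero length is treated as "the keys are already here".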
When you call StorageManager.get, there are three arguments you can pass in. The first argument is engine type, and this is your preferred engine type — it's going to be the first one that's tried if you pass in this optional argument. There are the three different types of engines you can use. You want to pass in the engine name, which is a constant attached to all of the Storage Engines. The second is a location. There's the local or the session storage, and these are constants on storage manager. You can specify whether you want it to be stored temporarily or indefinitely — by default it's going to be stored in the session.
The last argument is a configuration object. There are three things you can define on the configuration object, and the first is the order. The get method is going to try all of the available engines until it finds one that works, and there's a default order for that — it's actually the order that they were registered with storage manager, so it's really the order that they appear in the code. If you don't like that order, you can pass in an array as the order property and that will be the order in which they're attempted first before using the default order. If you only want to use this first engine parameter, you can set force to true. Setting force to true means that it's only going to use that preferred engine type that you passed in, and anything else is going to fail. Lastly, the configuration can have an engine property, but currently this is only being used by the Storage Engine SWF. That's because the SWF has a lot of properties that can be customized about it, such as attributes, the ID you want to set up the SWF in, whether you want it to be compressed or not, all that can be passed through this engine property. Right now, only Storage Engine SWF is using it.
I mentioned that there's a default order in which the engines are all attempted, and this is the way they appear in the code. But the three that we have, we've kind of thought through why we want them in this order. The first one that's going to be tried if you don't pass anything is HTML 5. It's the most future looking engine — most browsers will begin to start supporting that, and down the road I think all browsers will eventually support these storage objects from HTML 5. So we want to try this first, we want to store it here first if possible. The second one we're going to attempt is Gears. The reason that Gears is going to be attempted second is that it's a little bit more robust than SWF. It's a database that we're storing information into, but it's not as widely available as SWF. The last one we're going to try is the SWF engine, and this is because it's the most widely available technology, so that makes it a great fallback.
Now we're going to take a look at each of the engines and exactly what they're doing. The first engine attempted is the HTML 5 engine, and it's going to use the built-in HTML 5 support. This gives you five megabytes of storage in Firefox and Safari, and ten megabytes of storage in IE. Since most applications should be built for the lowest common denominator, we're going to recommend that you only store up to five megabytes of data if you're using the HTML 5 storage. Another caveat: if you're using session storage and the browser crashes, only Firefox 3 is going to recover that data. All other browsers, when they go into their recovery mode and come back, are going to lose the session data that was stored. Lastly, the HTML 5 Storage Engine handles all the cross browser issues of HTML 5, the different storage properties.
Now the SWF engine. This is the last one we try, and the reason is that according to Adobe, 99 per cent of browsers have Flash not only installed but enabled. So it makes a great fallback, as most people will have this. With SWF, you can store up to 100 kilobytes of data before the user has to approve it. Once you exceed that limit, we'll show the user a dialog inside the SWF object itself where they can approve additional storage, and I believe the limit there is one megabyte of data.
Audience member 1: It's actually unlimited now.
Matt: It's unlimited now? OK.
Loading the SWF takes several seconds. It has to first fetch the SWF from the server, and then it has to take a look at the white listed URLs that can actually use that SWF. All this takes a bit of time, and it's why we have the ready event with all the engines — to make sure that the engines are all ready to be used, we have used the ready event. It's called after the constructors have fully initialized. SWF was the main reason we had to do that, because it takes a little while for it to set up. We've just emulated that across all the other engines, so they have the exact same API. Again, cookies must be used for session storage.
With that, you now have everything you need to be able to use the Storage Utility. I'm going to go into more detail now about how each one of these engines work, but that's all you need. You call StorageManager.get, you get an instance, and then you can read and write from it. That's the basics of it.
We're going to take a look at the register function of storage manager: what exactly is it doing? Register is called at the very end of the code that defines an engine. What you pass into register is the constructor, and the validation inside register is first going to figure out if the constructor that you pass in exists. It's then going to see if the constructor has an isAvailable function. The isAvailable function is used by the get method to see if the technology required to use this particular engine is available, and it'll be called each time before get is able to return an instantiated object.
Engine name is the unique name you're going to assign to this engine, and it's used for a map to very quickly fetch when calling the storage manager get. We use this engine name map so that we don't have to iterate through all the Storage Engines all the time. The map that's created is just the registered engine map, and that's a map of the engine name to the constructor method. Then we also create an array of all the engines that were successfully registered here, and this function will return true or false depending on whether it was validated and added to the registered set list.
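A minimal sketch of that registration step, assuming the validation is just the three checks mentioned (constructor exists, has isAvailable, has an engine name). The variable names, the ENGINE_NAME spelling, and the two sample engines are illustrative assumptions, not the YUI source.

```javascript
// Hypothetical manager state; names are illustrative.
const registeredEngineMap = {};   // engine name -> constructor, for fast lookup
const registeredEngineSet = [];   // constructors in registration order

function register(engineConstructor) {
  const valid =
    typeof engineConstructor === "function" &&
    typeof engineConstructor.isAvailable === "function" &&
    typeof engineConstructor.ENGINE_NAME === "string" &&
    engineConstructor.ENGINE_NAME.length > 0;
  if (!valid) {
    return false;
  }
  registeredEngineMap[engineConstructor.ENGINE_NAME] = engineConstructor;
  registeredEngineSet.push(engineConstructor);
  return true;
}

// Hypothetical engine that satisfies the checks.
function MemoryEngine() {}
MemoryEngine.ENGINE_NAME = "memory";
MemoryEngine.isAvailable = function () { return true; };

// Hypothetical engine with no isAvailable, so registration is refused.
function BrokenEngine() {}

const memoryRegistered = register(MemoryEngine);
const brokenRegistered = register(BrokenEngine);
```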
Now, once you have all your engines registered you can start using StorageManager.get. StorageManager.get is going to attempt to instantiate the engine that you provided first, and that's the engine type here. That's the first argument you would pass in. That's optional, you don't have to provide it, but if you do, we'll first try to use that engine.
If that engine is unavailable, the second thing we're going to do is see if you set that force configuration property. If that's true and your engine wasn't initialized, then it's going to fail, it's going to throw an exception. But if that's false, then we go ahead and we're going to try first to use the configured order. We're going to iterate through that order and see what engines are available to use, and we'll use that first. If you didn't define the order then we just use that registered engine set that you saw on the registration function. It'll just iterate through that, see which engines are available, and return the first one. When an engine is successfully instantiated, we return it; otherwise we're going to throw an exception. Since most people are going to have one of these technologies, or the fallback technology — 99 per cent of browsers have Flash installed — there's a 99 per cent chance that this is going to work, or probably a little bit higher than that.
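The selection logic just walked through could look something like the sketch below. It assumes conf.order is an array of engine names and that availability is a simple boolean; the makeEngine helper and the getEngine name are hypothetical, and the real get method also takes a location argument that is left out here.

```javascript
// Hypothetical engines: each has a name and an isAvailable check.
function makeEngine(name, available) {
  function Engine() { this.name = name; }
  Engine.ENGINE_NAME = name;
  Engine.isAvailable = function () { return available; };
  return Engine;
}

// Registration order doubles as the default order.
const engines = [makeEngine("html5", false), makeEngine("gears", false), makeEngine("swf", true)];
const engineMap = {};
engines.forEach(function (E) { engineMap[E.ENGINE_NAME] = E; });

// Try the preferred engine, then conf.order, then the registration order.
// Throws when nothing usable is found, or when force is set and the
// preferred engine is unavailable.
function getEngine(preferred, conf) {
  conf = conf || {};
  const tryEngine = function (E) {
    return E && E.isAvailable() ? new E() : null;
  };

  let instance = preferred ? tryEngine(engineMap[preferred]) : null;
  if (!instance && preferred && conf.force) {
    throw new Error("Preferred engine unavailable: " + preferred);
  }
  const candidates = (conf.order || [])
    .map(function (name) { return engineMap[name]; })
    .concat(engines);
  for (let i = 0; !instance && i < candidates.length; i += 1) {
    instance = tryEngine(candidates[i]);
  }
  if (!instance) {
    throw new Error("No storage engine available");
  }
  return instance;
}

const chosen = getEngine("html5"); // html5 is unavailable here, so it falls back
let forcedError = null;
try {
  getEngine("html5", { force: true });
} catch (e) {
  forcedError = e;
}
```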
The next object is the storage event object. This is modeled after the HTML 5 storage event object, so it has all the same values that that object has: the key, the old value, the new value, the URL, the window, and the storage area. Those first few probably explain themselves. The storage area is going to be a pointer back to the Storage Engine that was used to generate this event. The storage event will be passed to all of the subscribers of the change event of the Storage Engine. So once you have an instantiated engine, you can just use the subscribe function, pass in a callback, and the callback function will be passed the storage event. Lastly, the only thing we've added to storage event that wasn't there originally in the spec was the type. This lets you know if the change event was adding data to the engine, or if you were updating existing data in the engine, or if you were removing data from the engine. This is so if you, as a developer, need to respond to certain types of events, you have that metadata available to you.
Audience member 2: Does this work across windows, [inaudible]?
Matt: We haven't tested it in that environment yet. This is going to be a pointer to the current window that it was executed in, so there could be conflicts if you're passing it between different windows right now.
Then to take a look at the interface, the storage interface. It uses the event provider model, and what this means is that you get the subscribe function, you get the unsubscribe function, you get all the goodness of the event model of YUI 2, and it's going to be applied directly to all the instantiated Storage Engines for you. These are the public functions that are spec-ed out in HTML 5, and all of them exist with all the Storage Engines. Storage is what handles it.
We've added one new function, and that's the getName function. That's going to return that constant engine_name that was defined when the constructor was created. This just allows you to be able to tell what Storage Engine you have. Lastly, the storage has a bunch of protected functions as well. These are actually the functions that are going to be implemented by each different engine, and the reason we do that is that the public functions can handle the generic work, like setting up storage events and evaluating whether the arguments passed in are valid. Then since you know you have valid keys and valid content, you can go ahead and pass that in to the protected function, which is going to do the reading and writing from whatever technology we're using with that engine.
There's one other thing we did that's not in the spec, and that's the createValue and getValue. We decided down the road, as we were developing this project, that we wanted you to be able to pass in non-string values. So we wanted you to be able to pass in numbers, we wanted you to be able to pass in Boolean values, and when you called in the get method later, we wanted those values to return in the same primitive type that they were passed in. So before delegating to the protected implementation functions, the setItem is going to call createValue, which is going to add some meta information about the type of information that was passed to it. And before the getItem returns, it's going to call getValue on the value that the engine returned, and convert it to whatever type was originally provided.
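One way such a round trip could work is to prefix the stored string with the primitive's type. The sketch below only illustrates that idea: the "type|value" encoding and the separator are assumptions, and the actual format the utility uses is not described in the talk.

```javascript
// Hypothetical encoding: a type tag, a separator, then the string form.
const SEPARATOR = "|";

// Wrap a primitive with its type so it can be stored as a plain string.
function createValue(value) {
  const type = typeof value;
  if (type !== "string" && type !== "number" && type !== "boolean") {
    throw new TypeError("Only strings, numbers, and booleans are supported");
  }
  return type + SEPARATOR + String(value);
}

// Split on the first separator only, so string values may contain it.
function getValue(stored) {
  const index = stored.indexOf(SEPARATOR);
  const type = stored.slice(0, index);
  const raw = stored.slice(index + 1);
  if (type === "number") { return Number(raw); }
  if (type === "boolean") { return raw === "true"; }
  return raw;
}

const storedNumber = createValue(42);
const roundTripped = [42, true, false, "a|b"].map(function (value) {
  return getValue(createValue(value));
});
```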
This is kind of complicated, so just to reiterate: if the developer called setItem, what's going to happen is it's going to parse the parameters, it's going to handle any errors, it's going to set up the storage event that's going to go to the callback. If all that works out correctly, it's then going to call this protected method, _setItem, and we've done this for all the methods. The _setItem has to be implemented by the engine, and that's how the model is — there's the public method and then there's the protected method. All the protected methods need to be implemented by each different engine, and that way we can have a common API exposed and we can delegate to each engine for the actual reading and writing from their technology.
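The public/protected split can be sketched as a small base class. Everything here is a hypothetical stand-in: BaseStorage, MemoryStorage, and the "add"/"update" type strings are illustrative names, and the real utility uses YUI 2's event provider rather than a plain listener array.

```javascript
// Hypothetical base class: public setItem does the generic work, then
// delegates to the protected _setItem that each engine must supply.
class BaseStorage {
  constructor() {
    this.listeners = [];
  }
  subscribe(callback) {
    this.listeners.push(callback);
  }
  setItem(key, value) {
    // Generic argument checking lives in the public method.
    if (typeof key !== "string" || key.length === 0) {
      throw new TypeError("key must be a non-empty string");
    }
    const oldValue = this._getItem(key);
    const event = {
      type: oldValue === null ? "add" : "update", // assumed type names
      key: key,
      oldValue: oldValue,
      newValue: value,
      storageArea: this,
    };
    this._setItem(key, value); // engine-specific write
    this.listeners.forEach(function (callback) { callback(event); });
  }
  getItem(key) {
    return this._getItem(key);
  }
  _setItem() { throw new Error("_setItem must be implemented by the engine"); }
  _getItem() { throw new Error("_getItem must be implemented by the engine"); }
}

// Hypothetical engine whose driver is a plain Map.
class MemoryStorage extends BaseStorage {
  constructor() {
    super();
    this.driver = new Map();
  }
  _setItem(key, value) { this.driver.set(key, value); }
  _getItem(key) { return this.driver.has(key) ? this.driver.get(key) : null; }
}

const storage = new MemoryStorage();
const seenTypes = [];
storage.subscribe(function (event) { seenTypes.push(event.type); });
storage.setItem("theme", "dark");  // first write adds the key
storage.setItem("theme", "light"); // second write updates it
```

A new engine in this sketch only overrides the underscore-prefixed methods; validation and event firing stay in one place.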
There are a couple of events that are important. The custom event ready is the name of the event that we'll be using to tell you that the Storage Engine is ready. After you've instantiated Storage Engine, subscribe to this with a callback, and once that callback fires you can go ahead and use that engine. The change event here, this is what happens anytime you read or write from the engine. This is when you're calling setItem, or if you're calling removeItem. The setItem can either add or update an existing key, and the removeItem will just remove a key and its value from your engine.
Looking at each of the Storage Engines, what do they have that's unique? Well, they all have the protected methods that have to be implemented. They're all going to have an underlying technology. For the terminology's sake we've called this technology the driver, so with the Gears engine the driver is Gears, and with the SWF engine the driver is the SWF. All of them are going to fire the ready event when their constructors finish. Each of them has to define the engine_name constant and the isAvailable function. All of them are going to have a length attribute, and this is going to be the number of keys that are currently stored in this engine. The last thing they need to do is call StorageManager.register so that it's available to the get function of storage manager.
Looking under the hood of HTML 5, this is the unique name we've assigned to HTML 5 for the engine. The isAvailable function is used to see if we can use HTML 5. All it's going to do is evaluate if document.sessionStorage exists, and we assume that if document.sessionStorage exists that you can use the HTML 5 engine. With HTML 5, each instantiated object stores the driver on the object, and this is because there are two different singletons: the localStorage and the sessionStorage object, so each driver needs a pointer to that object. Any time there's a set or get or any of the methods, they actually just defer to the HTML 5 method, with the caveat of cross browser issues. Firefox 3 doesn't have a clear method, so we go ahead and implement that. We make IE 8 behave synchronously so that it behaves like all the other browsers.
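As an illustration of the availability check and of filling in a missing clear method, here is a rough sketch. It is not the engine's code: the host parameter stands in for whatever browser object exposes sessionStorage, and the fake storage object exists only to exercise the fallback path outside a browser.

```javascript
// Hypothetical availability check: `host` stands in for the browser object
// that would expose sessionStorage.
function isHtml5Available(host) {
  return Boolean(host && host.sessionStorage);
}

// Emulate clear() for a storage object that lacks it, using only
// length, key(index), and removeItem(key).
function clearStorage(storage) {
  if (typeof storage.clear === "function") {
    storage.clear();
    return;
  }
  // Remove from the end so the remaining indexes stay valid.
  for (let i = storage.length - 1; i >= 0; i -= 1) {
    storage.removeItem(storage.key(i));
  }
}

// Hypothetical fake storage without clear(), to exercise the fallback.
function createFakeStorage(initial) {
  const data = new Map(Object.entries(initial));
  return {
    get length() { return data.size; },
    key(index) { return Array.from(data.keys())[index]; },
    getItem(key) { return data.has(key) ? data.get(key) : null; },
    removeItem(key) { data.delete(key); },
  };
}

const fake = createFakeStorage({ a: "1", b: "2", c: "3" });
clearStorage(fake);
const availableWith = isHtml5Available({ sessionStorage: fake });
const availableWithout = isHtml5Available({});
```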
The Gears engine we have just named Gears. Its isAvailable function is a little bit more complicated. The first thing we need to do is see if the Gears namespace exists, and this happens if you include the Gears init file, and that file finds the Gears API. The second thing we need to do is write to the SQLite database. The reason for that is that it will force the user, if they haven't already, to see that dialog to either approve or deny your application's access to the Gears database. If they choose to deny access, it's going to throw an exception inside this method, and we can return false. Otherwise we return true and we assume that it is available. A few other things about Gears: it's going to create a database in SQLite called yui.database, and it's going to create a table called YUIStorageEngine. If you guys are using Gears for any of your projects, don't use these names. Lastly, with any database you need to close your connections to it. What we're doing with the Gears engine is that when the window unloads we go ahead and close any of the database connections that we've opened with Gears to release the resources.
Now, how does the Gears constructor work? First of all, it's going to initialize a driver if it doesn't exist. What that means is it creates the database, or accesses the database, and sets up the table, or just makes it available. The second thing here is really important, and it's another reason why you don't want to store secure data: there's no really effective way to know when the session has ended. You can tell if the browser window is closing, but you can't really tell if they have another window open. With the data stored in Gears and SWF in the session, we don't have a really good way to know that the session has ended until they come back again later and that session cookie doesn't exist any more. Even though you've stored information in the session, it's actually going to remain stored on the machine until that user comes back the next time. When they come back, we evaluate the cookie, we see that it doesn't exist, and before the engine has returned to you, we're going to clear out all those previous session values. But those session values, like the HTML 5, they're not truly sessions, they're actually stored until the next time this same user comes back and this engine is used.
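The cookie-based session cleanup can be sketched like this. The cookie name, the key prefix, and the Set/Map stand-ins for the browser's cookies and the driver's persistent data are all assumptions made so the sketch runs on its own; the real engines talk to Gears or the SWF instead.

```javascript
// Hypothetical stand-ins: `cookies` plays the role of the browser's session
// cookies, `disk` the data the driver keeps between visits.
const SESSION_COOKIE = "storage_session"; // assumed cookie name
const SESSION_PREFIX = "session:";        // assumed key prefix for session data

function openEngine(cookies, disk) {
  if (!cookies.has(SESSION_COOKIE)) {
    // The previous session is over: purge its leftovers before the engine
    // is handed back, then mark the new session.
    Array.from(disk.keys()).forEach(function (key) {
      if (key.indexOf(SESSION_PREFIX) === 0) {
        disk.delete(key);
      }
    });
    cookies.add(SESSION_COOKIE);
  }
  return {
    setItem(key, value) { disk.set(SESSION_PREFIX + key, value); },
    getItem(key) {
      const full = SESSION_PREFIX + key;
      return disk.has(full) ? disk.get(full) : null;
    },
  };
}

const disk = new Map([["local:theme", "dark"]]);
let cookies = new Set();

const firstVisit = openEngine(cookies, disk);
firstVisit.setItem("draft", "hello");
const sameSession = openEngine(cookies, disk).getItem("draft");

cookies = new Set(); // browser closed: session cookie gone, disk data remains
const leftoverBeforeReturn = disk.has("session:draft");
const nextSession = openEngine(cookies, disk).getItem("draft");
```

Note that leftoverBeforeReturn is true in this sketch: the session data sits on the machine until the engine is opened again, which is exactly why the talk warns against storing private data.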
The last technology is the StorageEngineSWF. What this is showing is just that there's this shared object, and all SWFs have the ability to read and write from a shared object. You just have to namespace it. We've called it SWF, and the isAvailable function for SWF is pretty simple. First of all, we make sure that YAHOO.widget.SWF, the SWF utility, exists. The reason we do this is that we're going to use this to instantiate the SWF object and be able to get the SWF back to read and write from. The other thing is that we need to make sure that at least Flash version 6 is available, because that's when Flash introduced this shared object which we're going to be using to read and write data. It uses the shared object technology of Flash.
As I was saying, the driver here is going to be an instantiated SWF widget. There are a lot of parameters that can be customized here: do you want to compress? What ID do you want the container to be? Do you want additional attributes applied to the SWF object? All this stuff can actually be passed using the engine property of the configuration object that was passed into the get method. If you expect that you're going to be using a SWF, or when SWF is used you want particular properties to be used, you can pass those through the StorageManager get function. The last thing is, we compress all the data by default, and the reason we do that is it just gives you more space to be able to store into the SWF object than the allocated 100 kilobytes of data, or I guess now it's unlimited data.
I was going to show you an example of how to write a Storage Engine using cookies. They're a pretty simple technology to illustrate, but I'm running out of time so I'm going to skip it. But this is available on my website, MattSnider.com.
Here's a good example. You see this engine is implementing the protected clear method, the getItem method, the key method, the removeItem method. Anytime you're creating your own engine, you just need to implement the six protected methods.
Where is Storage Utility going? What's the future of it? One of the things that I think is going to happen down the road is we'll probably end up combining the SWFStore utility that was written by Alaric Cole and my Storage Engine SWF. There are a lot of similarities in the code between the two, and a lot of redundancy that doesn't need to be there. One of my grievances with what I wrote is that the way the ready event works is a little hand-wavy right now, and kind of hack-ish. I'd like to improve how that works down the road, and make it work more the way I expect it to work. So, you can expect some improvements with the ready event. In the original implementation, any class inherited from StorageEngineKeyed didn't really make good use of that inheritance. I've since fixed that and checked it into GitHub, but it's not in the release candidate version of Storage Utility. But there have been some improvements made in the way that we inherit from StorageEngineKeyed.
One possible improvement I think we'll do is either have a new set of engines, or augment the existing engines to be able to support asynchronous callbacks. One, this alleviates even the need to have the ready event, and two, it allows you to use technologies that are, by their nature, asynchronous. A good example of that is BrowserPlus. In Yahoo!'s BrowserPlus technology every call is asynchronous, and because the utility is all synchronous, we can't use it. Secondly, if you wanted to use Storage Engine as a front for all your Ajax requests where you're just fetching and retrieving data from the backend, you could potentially write a driver based on Ajax and use all the goodness of this engine, but you can't, because it's asynchronous.
Lastly, we really hope to have your contributions. With the Gallery and the availability of Storage Utility, in the future we hope that all of you will find ways to improve what we've already done. There's the yuilibrary.com contribute URL, and you can go there and figure out how to contribute. It's actually really easy.
That's my presentation. Do you guys have any questions?
Audience member 3: My question is: you mentioned some of the different engines have different storage limitations, but is there a way of determining how much space we have available?
Matt: The question was, is there a way to determine with a given engine how much space is available. The short answer is: not really. With some of them, like the HTML 5, you can guess. You can look at how much data you've stored and how much data should be available, and determine that. But with an engine like the SWF we don't really know, until you write to it, that it's out of space. You could do the same with Gears, so we could expose APIs with the Gears and probably HTML 5 that gave you an estimate of how much space was available, but we can't do that with the SWF right now. Because of that, we've chosen not to enable that. They just throw an exception when you can't write to it any more.
Audience member 4: The SWF, in particular, it's probably a security thing.
Matt: Is that why they do it?
Audience member 4: To not let you know how much is available, so they probably won't change that.
Matt: OK. Are there any other questions?
Audience member 5: What are you guys using the Storage Utility at Mint for?
Matt: The question was, what are we using the Storage Utility at Mint for, and the answer is currently nothing.
I just recently wrote this back in August. Mint was recently acquired as well, and we've been moving so fast that I haven't had time to implement it. What I think we're going to do with it is, there's one place where we do this zipcode lookup, and I think we'll end up putting that in the Storage Utility. The other thing is, every user has a set of categories: there are shared categories among all users, and then there are unique categories for users. I would like to put that in the Storage Utility to improve our auto-complete, and improve some of the other features. But it's all down the road and hasn't been implemented yet.