Pushing Beyond Gzipping

In his chapter "Going Beyond Gzipping" in the book Even Faster Web Sites, and in his presentation at Velocity 2009, Tony Gentilcore describes an odd behavior: roughly 15% of visitors do not receive compressed responses even though their user agents support compression. All modern browsers (since circa 1998) support Gzip compression, and surely 15% of users are not running browsers more than 10 years old. The culprits are proxies and security software that mangle or strip the Accept-Encoding HTTP header:

Accept-EncodXng: gzip, deflate
X-cept-Encoding: gzip, deflate
XXXXXXXXXXXXXXX: XXXXXXXXXXXXX
---------------: -------------
~~~~~~~~~~~~~~~: ~~~~~~~~~~~~~

Workarounds

Gentilcore emphasizes the importance of serving compressed content to these users and proposes several approaches to work around the issue:

  1. Appeal to the vendors of software that mangles/strips the Accept-Encoding header, although it will take some time until all users upgrade their versions.
  2. Design to minimize uncompressed size, using techniques such as event delegation, relative URLs, minified HTML/JS/CSS, and avoiding inline styles. The real savings for popular web pages average 11.6%, far less than the 72.1% achieved when Gzip is enabled.
  3. Educate users by informing them that they are experiencing slowness because compression is not enabled, and pointing them to the list of software that causes the issue and how to disable or upgrade it. This might help, although users behind company proxies can do nothing but complain to the administrator.
  4. Detect Gzip support directly, a more intrusive client-side approach: a test in a hidden iframe fetches JavaScript content that is compressed regardless of the request headers and sets a cookie indicating that the client supports compression. Subsequent requests are then checked for this cookie.
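
On the server, the cookie check from approach 4 might look roughly like the sketch below. It assumes mod_setenvif and mod_headers are loaded; the cookie name gzip_ok and the variable name FORCE_GZIP are hypothetical and do not come from Gentilcore's chapter.

```apache
# "gzip_ok=1" is a hypothetical cookie written by the hidden-iframe test
# once the client has shown it can decode a compressed response.
SetEnvIf Cookie "(^|;\s*)gzip_ok=1(;|$)" FORCE_GZIP

# Only for those clients, present a usable Accept-Encoding header to the
# compression module, whatever happened to the original header in transit.
RequestHeader set Accept-Encoding "gzip,deflate" env=FORCE_GZIP
```

Requests without the cookie are left untouched, so clients that never passed the test keep receiving uncompressed responses.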

Recent Research

At the Velocity 2010 conference, Andy Martone presented Forcing Gzip Compression, more detailed research on client-side compression detection for Google Web Search, in which he shares the success of this approach in improving the user experience by serving compressed content even when the Accept-Encoding header is missing.

He also notes that client-side detection is helpful but can produce false positives, which are bad for the user experience. However, because the result is stored in a session cookie, it is reset once the browser is restarted.

A Server-Side Approach

Since the mangled patterns produced by proxies and security software are well known (see the complete table in Even Faster Web Sites, chapter 9, page 124), it is possible to detect these patterns on the server before serving an uncompressed response and to force gzipping in these cases.

Apache has modules that allow customizing request/response headers (mod_headers) and setting environment variables based on characteristics of the request (mod_setenvif).

Forcing the request header to support compression is straightforward:

RequestHeader set Accept-Encoding "gzip,deflate"

But it's dangerous to assume that all users have compression enabled. In this example, the set argument replaces any existing Accept-Encoding header.

To detect the mangled patterns, mod_setenvif can be used to perform a regular expression match and set an environment variable indicating that a mangled Accept-Encoding header is present.

SetEnvIf ^Accept-EncodXng$ "^gzip, deflate$" HAVE_Accept-Encoding

In this example, the mangled pattern “Accept-EncodXng: gzip, deflate”, if present, sets an environment variable named HAVE_Accept-Encoding, which mod_headers can later use to set an Accept-Encoding header that properly advertises compression support:

RequestHeader set Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding

A more generic solution for mangled pattern matching and enabling compression follows:

SetEnvIfNoCase ^(Accept-EncodXng|X-cept-Encoding|X{15}|~{15}|-{15})$ ^((gzip|deflate)\s*,?\s*(gzip|deflate)?|X{4,13}|~{4,13}|-{4,13})$ HAVE_Accept-Encoding

RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding

The first line sets the environment variable HAVE_Accept-Encoding if any of the known mangled patterns is present in the request headers. Note that the SetEnvIfNoCase directive is used so that the regular expressions ignore case. Broken down, these regular expressions match the following:

Accept-EncodXng  ---> Accept-EncodXng
X-cept-Encoding  ---> X-cept-Encoding
X{15}            ---> XXXXXXXXXXXXXXX
~{15}            ---> ~~~~~~~~~~~~~~~
-{15}            ---> ---------------

(gzip|deflate)\s*,?\s*(gzip|deflate)? ---> gzip
                                      ---> deflate
                                      ---> gzip,deflate
                                      ---> deflate,gzip
X{4,13}                               ---> XXXX to XXXXXXXXXXXXX
~{4,13}                               ---> ~~~~ to ~~~~~~~~~~~~~
-{4,13}                               ---> ---- to -------------

Since there's no way to predict exactly how proxies and security software will mangle the header value, {4,13} matches 4 to 13 occurrences of X, ~ or - standing in for the possible values of Accept-Encoding, from “gzip” (length 4) to “gzip, deflate” (length 13).

The second line simply appends “gzip,deflate” to the Accept-Encoding header. Here append is used instead of set to preserve any existing value if Accept-Encoding is also present.
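
For context, here is a sketch of how these two directives might sit next to the module that actually compresses responses. The IfModule guards, the use of mod_deflate, and the list of MIME types are assumptions added for illustration, not part of the original recipe. In Apache 2.4, AddOutputFilterByType is provided by mod_filter, which would also need to be loaded.

```apache
# Repair mangled Accept-Encoding headers only when both helper modules exist.
<IfModule mod_setenvif.c>
  <IfModule mod_headers.c>
    SetEnvIfNoCase ^(Accept-EncodXng|X-cept-Encoding|X{15}|~{15}|-{15})$ ^((gzip|deflate)\s*,?\s*(gzip|deflate)?|X{4,13}|~{4,13}|-{4,13})$ HAVE_Accept-Encoding
    RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding
  </IfModule>
</IfModule>

# mod_deflate then sees the repaired header and compresses these types.
# The MIME type list is only a sample.
<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>
```

With the guards in place, a server missing either helper module simply skips the repair instead of failing to start.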

What About Stripped Accept-Encoding Headers?

There's no straightforward way to identify, on the server-side, if the client supports compression without the Accept-Encoding header. In this case, the client-side approach described above would take care of it.

Those brave enough to risk false positives can dig deeper and use the BrowserMatch[NoCase] directive to identify browsers known to support compression:

BrowserMatchNoCase (MSIE|Firefox|Chrome|Safari|Opera) HAVE_Accept-Encoding

RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding

For simplicity's sake, the list of user agents above is just a sample and shouldn't be used in production. Extend this list/regular expression to match all modern browsers that support compression: IE 6+, Firefox 1.5+, Safari 2+, Opera 7+, and Chrome. For a more accurate list of user agents that support compression, consult Browserscope (network tab, compression supported column). At the time of this writing, only a few agents, such as the mobile agents Blackberry and Samsung SGH, as well as Playstation, Links, and others, have no compression support.

Although modern browsers do support compression, a user might have disabled it voluntarily. Use at your own risk!
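
One way to reduce that risk a little is to force compression only when no Accept-Encoding header arrived at all, and to leave alone any client that sent one, whatever its value. The sketch below assumes mod_setenvif processes its directives in configuration order and relies on the !variable syntax of SetEnvIf to unset a variable. It is an illustration, not a tested production rule.

```apache
# Sample browser list only; see the caveats above before using it.
BrowserMatchNoCase (MSIE|Firefox|Chrome|Safari|Opera) HAVE_Accept-Encoding

# If the client sent any non-empty Accept-Encoding header, respect it
# and drop the flag that the browser match just set.
SetEnvIf Accept-Encoding ".+" !HAVE_Accept-Encoding

RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding
```

This does not help when disabling compression removes the header entirely, since that case is indistinguishable from a header stripped by a proxy.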

Conclusion

All the approaches above are valid and arguably effective, though both server- and client-side detection carry at least a small risk of false positives. Before applying any or all of these workarounds, first measure the percentage of users who are missing compression and check the server logs to make sure they are affected by the stripped/mangled Accept-Encoding header issue. Depending on the site's audience and traffic, it might not be necessary to worry about it.
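
As a starting point for that measurement, the headers in question can be written to a dedicated log with mod_log_config. The format nickname ae_audit and the log path are hypothetical, and only two of the mangled header names are logged here as a sample.

```apache
# %{Name}i logs the named request header, or "-" when it is absent.
LogFormat "%h %t \"%r\" %>s \"%{Accept-Encoding}i\" \"%{Accept-EncodXng}i\" \"%{X-cept-Encoding}i\" \"%{User-Agent}i\"" ae_audit
CustomLog logs/accept_encoding_audit.log ae_audit
```

The data is best collected before any of the RequestHeader rules are enabled, since those rules may change the Accept-Encoding value that ends up in the log.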

This article first appeared as the December 6th post in the Performance advent calendar.