Date-Based Rewrites for Static CDN

Content Delivery Networks (CDNs) have become very popular in the past several years. They offer an easy way to save bandwidth and bring content physically closer to end users. CDNs offer a variety of services, though pricing and features are usually tailored to larger content providers. As a smaller provider myself, with only an ADSL line to host my personal websites, and as a SysAdmin who prefers to host his own content, I decided to mirror my static content and redirect traffic as needed. The following describes a way to keep all of my content local while mirroring the static content for faster delivery.

There are downsides to this approach: if you mirror only your static content (images and the like), users will still have to hit your web server for dynamic, often database-dependent, content. You can move your database to a cloud-based service, but this means losing some control over your content. The following solution works best for websites with a lot of static content, such as videos and images. If your website has little to no static content, aside from maybe a few CSS and JavaScript files, the savings in bandwidth might not be worth the effort.

The first thing you’ll need is some disk space somewhere, preferably with an HTTP server configured with a virtual hostname you’ve chosen. I use DreamHost, which offers virtual machines with unlimited disk space and bandwidth for about $8.95/month. And if you use the “quickcdn” promo code, you’ll get an additional $20.00 off the 1- or 2-year package cost. ;-)

Here are the general steps to follow for setting up a CDN with dynamic rewrites:

1) Set up a web server (with enough disk space) from a low-cost provider like DreamHost.

2) Create a new hostname in your domain like static.hostname.com. If you don’t already have a domain name, DreamHost can host and register your domain as well.

3) Mirror your static content using rsync or another mirroring tool. Here’s an rsync-websites.sh script I wrote that you can use as an example.

Download the rsync-websites.sh script.
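The rsync-websites.sh script itself is not reproduced here. As a rough stand-in for step 3, the sketch below builds an rsync command that mirrors only static file types. The extension list, paths, and remote host are hypothetical placeholders, not the author's values.

```python
import subprocess

# Hypothetical defaults: adjust the extensions to match your own static content.
STATIC_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "css", "js")


def build_rsync_command(source_dir, destination, extensions=STATIC_EXTENSIONS):
    """Return an rsync argument list that mirrors only static file types."""
    # -a preserves modification times; --delete removes mirrored files that
    # no longer exist locally; --include=*/ lets rsync descend into every
    # directory, and --prune-empty-dirs drops directories with no matches.
    command = ["rsync", "-a", "--delete", "--prune-empty-dirs", "--include=*/"]
    for ext in extensions:
        command.append(f"--include=*.{ext}")
    # Everything that was not explicitly included above is skipped.
    command.append("--exclude=*")
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(destination)
    return command


def mirror(source_dir, destination):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(source_dir, destination), check=True)
```

A call such as `mirror("/var/www/mysite", "user@static.hostname.com:static.hostname.com/")` (both arguments are made-up examples) could then be run from cron. Note that the patterns are case-sensitive, so `*.jpg` will not match `PHOTO.JPG`.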

4) Create a script to check file dates and return the modification time to Apache. I use the following and keep it under /usr/local/bin/msecs.pl.
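The msecs.pl script is not reproduced here. The following is a minimal Python sketch of the same idea, assuming the map key Apache sends is a full filesystem path and that the value returned is the modification time in epoch seconds. It follows the protocol for an external RewriteMap program: one key per line on standard input, one answer per line on standard output, and the literal string NULL when a lookup fails.

```python
import os
import sys


def lookup(path):
    """Return the file's modification time in epoch seconds, or "NULL"."""
    try:
        return str(int(os.stat(path).st_mtime))
    except OSError:
        # mod_rewrite treats the literal string NULL as a failed lookup.
        return "NULL"


def serve(requests, responses):
    """Answer one lookup per input line until the input stream closes."""
    # Apache starts the program once and keeps it running for all requests.
    for line in requests:
        responses.write(lookup(line.rstrip("\n")) + "\n")
        # Output must not be buffered, or Apache waits forever for the answer.
        responses.flush()


# When deployed as the map program, finish the file with:
#     serve(sys.stdin, sys.stdout)
```

Because every request passes through this one loop, it should do as little work as possible per lookup; a single `stat` call keeps it cheap.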

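5) Lastly, add the following to your website’s Apache configuration, either as an include file or an .htaccess file. Note that the RewriteMap directive itself is only allowed in the server or virtual host configuration; rules in an .htaccess file can reference the map, but cannot define it. You may want to review the REQUEST_FILENAME list to add or remove file types.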
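The rewrite rules themselves are not reproduced here, so the exact conditions are unknown. To illustrate what a date-based rewrite decision can look like, here is a Python sketch of one plausible policy. The assumption, which is mine and not confirmed by the rules, is that a request is redirected to the mirror only when the file type is on the static list and the file has not changed since the last mirror run. The extension list and the `last_sync` parameter are hypothetical.

```python
import posixpath

# Hypothetical list of file types that are mirrored to the static host.
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".css", ".js"}
CDN_HOST = "static.hostname.com"


def cdn_redirect(uri, mtime, last_sync):
    """Return the CDN URL for uri, or None to serve the request locally.

    mtime is the file's modification time in epoch seconds (None if the
    lookup failed); last_sync is when the mirror was last refreshed.
    """
    extension = posixpath.splitext(uri)[1].lower()
    if extension not in STATIC_EXTENSIONS:
        return None  # dynamic content always stays on the origin server
    if mtime is None or mtime > last_sync:
        return None  # missing, or changed since the last mirror run
    return f"http://{CDN_HOST}{uri}"
```

Under this policy a freshly edited image is served from the origin until the next mirror run, so visitors never receive a stale copy from the mirror.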

There are a few different ways to do RewriteMaps. I chose the easiest and most flexible for me: a dynamic script that always gives instant results. If you modify an image, your visitors will get the new image immediately. The downside of this approach is that Apache runs only a single instance of that script (per virtual host), and all requests must pass through it. You could create several rewrite maps, paired with different request filenames, to spread the load across multiple scripts, but a better alternative at that point might be a DBM Hash File or SQL Query rewrite map instead.
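As a rough illustration of the precomputed alternative, the sketch below walks a document root and writes a plain-text map of file path to modification time, one "key value" pair per line. This is a text map rather than a DBM file; Apache's httxt2dbm utility can convert such a file to DBM. The extension list and function names are hypothetical.

```python
import os


def build_map_lines(document_root,
                    extensions=(".jpg", ".png", ".gif", ".css", ".js")):
    """Yield "key value" lines mapping each static file's path to its mtime."""
    for directory, _subdirs, filenames in os.walk(document_root):
        for filename in sorted(filenames):
            if not filename.lower().endswith(extensions):
                continue
            path = os.path.join(directory, filename)
            if any(ch.isspace() for ch in path):
                continue  # whitespace separates key and value in a text map
            yield f"{path} {int(os.stat(path).st_mtime)}"


def write_map(document_root, map_path):
    """Write the map to a temporary file, then rename it into place."""
    temporary_path = map_path + ".tmp"
    with open(temporary_path, "w") as handle:
        for line in build_map_lines(document_root):
            handle.write(line + "\n")
    # os.replace is atomic, so Apache never reads a half-written map.
    os.replace(temporary_path, map_path)
```

The trade-off is freshness: a modified file is not picked up until the map is regenerated, whereas the dynamic script sees the change on the next request.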

Update 2012-11-22: When I off-loaded the static content to another web server, I also configured the Apache module mod_expires (both on my own web server and on the remote web server) to lessen the number of requests. This is another two-edged sword: by allowing content to be cached for days, weeks, or even months, any change to that content might not reach the client until the old copy has expired from the cache. Here is the Apache httpd configuration that I use:
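The configuration itself is not reproduced here. To make the caching trade-off concrete, this small Python sketch computes the two response headers that mod_expires sets, Expires and Cache-Control max-age, for a given cache lifetime. The function name and the day-based parameter are illustrative assumptions.

```python
from email.utils import formatdate


def expiry_headers(now, days):
    """Return caching headers for a response cacheable for `days` days.

    now is the time of the request in epoch seconds.
    """
    max_age = days * 24 * 60 * 60
    return {
        "Cache-Control": f"max-age={max_age}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

With a 30-day lifetime, a browser that fetched an image today may not ask for it again for up to 2,592,000 seconds, which is exactly why a changed file can stay invisible to returning visitors.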

Update 2012-12-01: I’ve modified the rewrite rules to use multiple CDN hostnames to improve performance by parallelizing content downloads. Since browsers limit the number of simultaneous downloads from a single hostname (the HTTP/1.1 specification originally suggested two), using multiple hostnames is a quick and efficient way to improve download speeds (see linked article for additional details).
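The modified rules are not reproduced here. One common way to spread requests across several hostnames, sketched below in Python, is to hash the request path and use the result to pick a host. The hostnames are hypothetical, and whether the actual rules select a host this way is an assumption.

```python
import zlib

# Hypothetical hostnames; each must serve the same mirrored content.
CDN_HOSTS = (
    "static1.hostname.com",
    "static2.hostname.com",
    "static3.hostname.com",
)


def cdn_host_for(uri, hosts=CDN_HOSTS):
    """Pick a hostname deterministically from the request path."""
    # CRC32 is stable across runs and machines, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    return hosts[zlib.crc32(uri.encode("utf-8")) % len(hosts)]
```

The choice should be deterministic rather than random: if the same image were handed out under different hostnames on different page views, browsers would cache it once per hostname and download it again each time.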