Date-Based Rewrites for Static CDN

Content Delivery Networks (CDNs) have become very popular in the past several years. They offer an easy way to save bandwidth and bring content physically closer to end users. CDNs offer a variety of services, though pricing and features are usually tailored to larger content providers. As a smaller provider with only an ADSL line to host my personal websites, and as a SysAdmin who prefers to host his own content, I decided to mirror my static content myself and redirect traffic as needed. The following describes a way to keep all of my content local while mirroring the static content for faster delivery.

There are downsides to this approach. If you mirror only your static content (images, etc.), users will still have to hit your web server for dynamic, often database-dependent, content. You can move your database to a cloud-based service, but this means losing some control over your content. This solution works best for websites with a lot of static content, like videos and images. If your website has little to no static content, aside from maybe a few CSS and JavaScript files, the savings in bandwidth might not be worth the effort.

The first thing you’ll need is some disk space somewhere, preferably with an HTTP server configured with a virtual hostname you’ve chosen.

Here are the general steps to follow for setting up a CDN with dynamic rewrites:

1) Set up a web server (with enough disk space) from a low-cost provider.

2) Create a new hostname in your domain like static.hostname.com.

3) Mirror your static content using rsync or another mirroring tool. Here’s an rsync-websites.sh script I wrote that you can use as an example.

#!/bin/bash
#
# /usr/local/bin/rsync-websites.sh
#
# Mirror static website content to another server using rsync and SSH.
#
# Copyright 2012 - Jean-Sebastien Morisset - https://surniaulula.com/
# 
# This script is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 3 of the License, or (at your option) any later
# version.
#
# This script is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details
# at http://www.gnu.org/licenses/.

www_dir="/export/www"
rsync_opts="--recursive --times --omit-dir-times --links --delete"
static_srv="static.YOUR-DOMAIN-NAME.com"

function RsyncStatic () {
	[ -n "$visual" ] && echo -e "\nrsync $2...\n"
	#
	# Exclude all cache folders, except those that contain images, like
	# gallery/cache/ and blogroll-links-favicons/cache/.
	#
	# Exclude all PHP scripts, except for those under bwp-minify/min/,
	# which may provide minified files on the remote (PHP-enabled) server
	# by also using the CDN Linker plugin. Protect files under
	# bwp-minify/cache/ from being deleted on the remote server since they
	# are needed by the PHP scripts under bwp-minify/min/.
	#
	rsync --rsh="ssh -x" --perms --delete-excluded \
		--filter='P wp-content/plugins/bwp-minify/cache/**' \
		--filter='+ wp-content/plugins/bwp-minify/min/**' \
		--filter='+ wp-content/plugins/blogroll-links-favicons/cache/**' \
		--filter='+ wp-content/gallery/cache/**' \
		--filter='- **/cache/**' \
		--filter='- *-modified' \
		--filter='- *-new' \
		--filter='- *-orig' \
		--filter='- *.php' \
		--filter='- *.po' \
		--filter='- *.mo' \
		--filter='- *.pot' \
		--filter='- *.sql' \
		--filter='- *.swp' \
		--filter='- screenshot-*.jpg' \
		--filter='- screenshot-*.png' \
		$rsync_opts "$1" "$2"
}

# read command line opts
while :
do
	for arg in "$@"
	do
		case $arg in
			-h|--help)
				echo "purpose: rsync content to remote websites."
				echo " syntax: $0 [--help|--visual|--test]"
				exit 0
				;;
			-t|--test)
				rsync_opts="$rsync_opts --dry-run"
				shift 1
				;;
			-v|--visual)
				visual="1"
				rsync_opts="$rsync_opts --verbose"
				shift 1
				;;
			-*)
				echo "error: unrecognized command line argument."
				exit 1
				;;
			*)	args[$(( i++ ))]="$1"
				shift 1
				;;
		esac
		continue 2
	done
	break
done

# reset $1, $2, etc. with left-over parameters
set -- "${args[@]}"

# sync static server
for sitedir in $www_dir/*
do
	for subdir in wordpress
	do
		if [ -d "$sitedir/$subdir/" ]
		then
			sitename="${sitedir##*/}"
			RsyncStatic "$sitedir/$subdir/" \
				"$static_srv:www/$sitename/$subdir/"
		fi
	done
done
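
The order of the filter rules matters, since rsync applies the first rule that matches a path. Here is a minimal sketch for previewing that behaviour on a scratch directory, using only three of the rules from the script above. The function name and the fake directory tree are made up for illustration, and nothing is copied because of --dry-run.

```shell
#!/bin/sh
# Sketch: preview which files a subset of the filter rules would transfer.
# preview_filters and the scratch tree below are hypothetical examples.

preview_filters() {
	# Nothing to preview without rsync installed.
	command -v rsync >/dev/null 2>&1 || {
		echo "rsync is not installed" >&2
		return 0
	}

	src=$(mktemp -d) || return 1
	dst=$(mktemp -d) || return 1

	# Build a small fake WordPress tree.
	mkdir -p "$src/wp-content/gallery/cache" \
		"$src/wp-content/plugins/foo/cache"
	: > "$src/index.php"
	: > "$src/style.css"
	: > "$src/wp-content/gallery/cache/thumb.jpg"
	: > "$src/wp-content/plugins/foo/cache/data.txt"

	# The "+" rule comes before the general cache exclusion, so the
	# gallery cache is kept while other cache folders are skipped.
	rsync --recursive --dry-run --out-format='%n' \
		--filter='+ wp-content/gallery/cache/**' \
		--filter='- **/cache/**' \
		--filter='- *.php' \
		"$src/" "$dst/"

	rm -rf "$src" "$dst"
}

preview_filters
```

The listing should include style.css and the gallery thumbnail, but not index.php or the plugin's cache file. The script's own --test option gives the same kind of preview against the real remote server.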


4) Create a script to check file dates and return the file's age (the number of seconds since it was last modified) to Apache. I use the following and keep it under /usr/local/bin/msecs.pl.

#!/usr/bin/perl -Tw
#
# /usr/local/bin/msecs.pl
# Print the number of seconds since a file was last modified.
# by Jean-Sebastien Morisset (https://surniaulula.com/)

use strict;

$| = 1;

while (<>) {
	chomp;
	if (! /^\//) { print "0\n"; next; }	# force fully qualified paths
	my $systime = time();
	my @fstat = stat($_);
	if (! @fstat) { print "0\n"; next; }	# missing or unreadable file
	print $systime - $fstat[9], "\n";
}

exit 0;
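
To sanity-check the numbers from the command line, the same calculation can be sketched in shell. This is only an illustration, not a replacement for the Perl script. It assumes GNU stat and GNU touch, and the function name file_age_secs is made up for this example.

```shell
#!/bin/sh
# Sketch: print the age of a file in seconds.
# Assumes GNU stat ("stat -c %Y" prints the mtime as epoch seconds).

file_age_secs() {
	case $1 in
		/*) ;;                    # fully qualified path, carry on
		*) echo 0; return 0 ;;    # relative path: report 0, as msecs.pl does
	esac
	# Report 0 when the file is missing or unreadable.
	mtime=$(stat -c %Y -- "$1" 2>/dev/null) || { echo 0; return 0; }
	now=$(date +%s)
	echo $(( now - mtime ))
}

# Demo on a scratch file back-dated by one hour (GNU touch syntax).
tmp=$(mktemp)
touch -d '1 hour ago' "$tmp"
file_age_secs "$tmp"
rm -f "$tmp"
```

The demo should print a value of roughly 3600. Since msecs.pl reads paths from standard input, it can be checked the same way by piping a full path into it, for example `echo /path/to/file | /usr/local/bin/msecs.pl`.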

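5) And lastly, add the following to your website’s Apache virtual host configuration, for example as an include file. The rules won't work unchanged from an .htaccess file, since the RewriteMap directive is only allowed in the server or virtual host configuration. You may want to review the REQUEST_FILENAME list to add or remove filetypes.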


# rewrite-static.conf
# Redirect static content older than 1800 seconds to CDN.
# by Jean-Sebastien Morisset (https://surniaulula.com/)

# sets %{SERVER_NAME} to config value instead of %{HTTP_HOST} value
UseCanonicalName On

RewriteEngine On

# Returns the number of seconds since the file was last modified.
RewriteMap msecs prg:/usr/local/bin/msecs.pl

# Serve the files ourselves if we're using https.
RewriteCond %{HTTPS} =off

# Don't redirect if the static hostname loops back.
RewriteCond %{HTTP_HOST} !^static\.

# Include only those static file extensions that we want to off-load.
RewriteCond %{REQUEST_FILENAME} ^/.*\.(html|xml|txt|zip|gz|tgz|swf|mov|wmv|wav|mp3|pdf|svg|woff|jpg|jpeg|png|gif|ico|css|js)$

# Make sure the requested file exists (disk access).
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f

# Pass msecs the complete filepath, and continue if last modified > 1800 seconds (disk access).
RewriteCond ${msecs:%{DOCUMENT_ROOT}%{REQUEST_FILENAME}} >1800

# Add "static" to our hostname and redirect.
RewriteRule ^/(.*)$ http://static.%{SERVER_NAME}/$1 [redirect=permanent,last]

There are a few different ways to implement a RewriteMap. I chose the easiest and most flexible for me: a dynamic script that always gives up-to-date results. If you modify an image, your visitors will get the new image immediately. The downside of this approach is that Apache runs only a single instance of that script (per virtual host), and all requests must pass through it. You could create several rewrite maps, paired with different request filenames, to spread the load across multiple scripts, but a better alternative at that point might be to use a DBM Hash File or SQL Query rewrite map instead.
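
As a rough sketch of the static-map alternative, a periodic job could write a text map of every file old enough to be off-loaded. This is not something I run myself. It assumes GNU find, and the function name build_static_map and the "static" map value are made up for illustration.

```shell
#!/bin/sh
# Sketch: list files older than N minutes as "key value" map lines.
# Assumes GNU find (-printf). Paths containing spaces are skipped,
# since a text map uses whitespace to separate key and value.

build_static_map() {
	docroot=$1
	min_age_mins=${2:-30}
	find "$docroot" -type f -mmin "+$min_age_mins" \
		! -path '* *' -printf '/%P static\n' | sort
}

# Demo on a scratch tree: one back-dated file, one fresh file.
root=$(mktemp -d)
mkdir -p "$root/images"
: > "$root/images/old.png"
: > "$root/images/new.png"
touch -d '2 hours ago' "$root/images/old.png"
build_static_map "$root" 30
rm -rf "$root"
```

Only the back-dated file should be listed. The output could be saved to a file and loaded as a txt: rewrite map, or converted to a DBM file with Apache's httxt2dbm utility. The trade-off is freshness: a modified file keeps being redirected until the map is next rebuilt.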

Update 2012-11-22: When I off-loaded the static content to another web server, I also configured the Apache module mod_expires (both on my own web server and on the remote web server) to reduce the number of requests. This is another double-edged sword: by allowing content to be cached for days, weeks, or even months, any change to that content might not reach the client until it has expired from the cache. Here is the Apache httpd configuration that I use (the Header directives also require mod_headers):


# Turn on expires
ExpiresActive On

# Set default to 0
ExpiresDefault A0

# Expire from cache after 1 year
<filesMatch "\.(doc|flv|pdf|avi|mov|ppt|ttf|mp3|wmv|wav)$">
    ExpiresDefault A31536000
    Header append Cache-Control "public"
</filesMatch>

# Expire from cache after 1 week
<filesMatch "\.(gif|gz|ico|jpg|jpeg|js|png|swf|tar|tgz|zip)$">
    ExpiresDefault A604800
    Header append Cache-Control "public"
</filesMatch>

# Expire from cache after 2 hours
<filesMatch "\.(css|txt|html|xml)$">
    ExpiresDefault A7200
    Header append Cache-Control "proxy-revalidate"
</filesMatch>

# No caching for dynamic files
<filesMatch "\.(cgi|php|pl|sh|shtml)$">
    ExpiresActive Off
    Header set Cache-Control "private, no-cache, no-store, proxy-revalidate, no-transform"
    Header set Pragma "no-cache"
</filesMatch>
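
The ExpiresDefault values are given in seconds, with the "A" prefix meaning "counted from the time of access". A small sketch for working out those numbers follows. Both helper names are made up for this example.

```shell
#!/bin/sh
# Sketch: turn a number of days or hours into an "A<seconds>" value.
# expires_days and expires_hours are hypothetical helper names.

expires_days() {
	echo "A$(( $1 * 24 * 60 * 60 ))"
}

expires_hours() {
	echo "A$(( $1 * 60 * 60 ))"
}

expires_days 7      # prints A604800 (one week)
expires_days 365    # prints A31536000 (one non-leap year)
expires_hours 2     # prints A7200 (two hours)
```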

Update 2012-12-01: I’ve modified the rewrite rules to use multiple CDN hostnames, which improves performance by downloading content in parallel. Browsers limit the number of simultaneous downloads from the same hostname (the HTTP/1.1 specification originally suggested two, and most browsers now allow around six), so using multiple hostnames is a quick and efficient way to improve download speeds (see linked article for additional details).


# rewrite-static-rnd.conf
# Redirect static content older than 1800 seconds to multiple CDNs.
# by Jean-Sebastien Morisset (https://surniaulula.com/)

# sets %{SERVER_NAME} to config value instead of %{HTTP_HOST} value
UseCanonicalName On

RewriteEngine On

# Returns the number of seconds since the file was last modified.
RewriteMap msecs prg:/usr/local/bin/msecs.pl

# List of CDN hostnames to use for the redirect. Random text rewritemaps use
# the format "lookup-key return-value1|return-value2|etc". The lookup-key
# should be the webserver SERVER_NAME name, prefixed with "static.". For
# example:
#
# static.MY-DOMAIN.com	cdn1.static.MY-DOMAIN.com|cdn2.static.MY-DOMAIN.com|cdn3.static.MY-DOMAIN.com
#
RewriteMap hosts rnd:/etc/httpd/maps/hosts.txt

# Serve the files ourselves if we're using https.
RewriteCond %{HTTPS} =off

# Don't redirect if the static hostname(s) loops back.
RewriteCond %{HTTP_HOST} !^(cdn[0-9]\.)?static\.

# Include only those static file extensions that we want to off-load.
RewriteCond %{REQUEST_FILENAME} ^/.*\.(html|xml|txt|zip|gz|tgz|swf|mov|wmv|wav|mp3|pdf|svg|woff|jpg|jpeg|png|gif|ico|css|js)$

# Make sure the requested file exists (disk access).
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f

# Pass msecs the complete filepath, and continue if last modified > 1800 seconds (disk access).
RewriteCond ${msecs:%{DOCUMENT_ROOT}%{REQUEST_FILENAME}} >1800

# Add "static" to our hostname and lookup the CDN hostname to redirect.
RewriteRule ^/(.*)$ http://${hosts:static.%{SERVER_NAME}}/$1 [redirect=permanent,last]
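
The hosts.txt line can be typed by hand, but here is a small sketch that generates it in the format described in the comments above. The helper name make_hosts_line is made up, and the cdnN.static. naming simply follows the example in the configuration.

```shell
#!/bin/sh
# Sketch: print one line for a "rnd:" rewrite map, in the form
# "lookup-key<TAB>value1|value2|...".
# make_hosts_line is a hypothetical helper name.

make_hosts_line() {
	domain=$1
	count=$2
	values=""
	i=1
	while [ "$i" -le "$count" ]; do
		# Join the hostnames with "|", without a leading separator.
		values="${values:+$values|}cdn$i.static.$domain"
		i=$(( i + 1 ))
	done
	printf 'static.%s\t%s\n' "$domain" "$values"
}

make_hosts_line MY-DOMAIN.com 3
```

Keep the count at nine or fewer with this naming, because the loop-back check in the rewrite rules only matches a single digit after "cdn".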
