I recently added some disk caching for MySQL queries, WordPress objects, PHP opcode, and PHP web pages on my server. There are several different caching techniques and applications available, and memcached seems like one of the more popular ones. Right or wrong, it appears to be the default go-to for many developers these days.
Since I’m a SysAdmin by profession (with maybe a penchant for scripting and integration), I tend to have a more “systems” oriented approach — which led me to first consider, and then choose disk caching over memcached. In this post, I’ll outline the reasons I chose disk caching, and why in most circumstances it might be superior to memcached.
Disk Cache is Faster
There’s very little debate over this — local disk caching is faster than using a local memcached, and much faster than using a remote memcached server. Although this is common knowledge, benchmarks showing the difference between the two are not common. I did find an article by Peter Zaitsev from 2006 in which he uses a PHP script to benchmark a variety of caching techniques.
| Cache Type | Cache Gets/sec |
|---|---|
| Array Cache | 365000 |
| APC Cache | 98000 |
| File Cache | 27000 |
| Memcached Cache (TCP/IP) | 12200 |
| MySQL Query Cache (TCP/IP) | 9900 |
| MySQL Query Cache (Unix Socket) | 13500 |
| Selecting from table (TCP/IP) | 5100 |
| Selecting from table (Unix Socket) | 7400 |

Benchmarks from Cache Performance Comparison by Peter Zaitsev.
Peter Zaitsev’s tests use a best-case scenario: the server had enough memory for disk caching, and memcached was installed on the localhost. The results show that disk caching offers more than twice the performance of memcached (27,000 versus 12,200 gets/sec).
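To get a rough feel for this kind of measurement on your own server, something like the following Python sketch could be used. It is not Peter Zaitsev's PHP script, it only compares an in-process dictionary (the "array cache" case) with a single cached file, and the function names and the 1 KB payload are assumptions made for illustration. Absolute numbers will differ from the table above.

```python
import os
import tempfile
import time


def gets_per_second(get, key, seconds=0.2):
    """Call get(key) repeatedly for roughly `seconds` and return the rate."""
    count = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        get(key)
        count += 1
    return count / (time.perf_counter() - start)


def make_file_get(cache_dir):
    """Return a getter that opens and reads one cache file per call."""
    def file_get(key):
        with open(os.path.join(cache_dir, key), "rb") as fh:
            return fh.read()
    return file_get


def run_benchmark(payload=b"x" * 1024):
    """Compare an in-process dict with a file cache (hypothetical setup)."""
    with tempfile.TemporaryDirectory() as cache_dir:
        with open(os.path.join(cache_dir, "entry"), "wb") as fh:
            fh.write(payload)
        array_cache = {"entry": payload}
        return {
            "array": gets_per_second(array_cache.__getitem__, "entry"),
            "file": gets_per_second(make_file_get(cache_dir), "entry"),
        }


if __name__ == "__main__":
    for name, rate in run_benchmark().items():
        print(f"{name:6s} {rate:12.0f} gets/sec")
```

Because the file was just written, it is almost certainly served from the OS disk cache, so this measures the same best case as the original benchmark.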
Disk Cache is Variable
Memcached is assigned a specific amount of memory. Disk cache, on the other hand, is managed by the OS and uses as much free memory as is available on the server. This is both good and bad, depending on how much memory is available and how well behaved the applications on that server are. For example, you might give memcached 2 GB of memory while the server has 8 GB of free memory. The OS will use that 8 GB to cache disk access (reads and writes), not just for the files in your cache directory, but for all frequently accessed files.

As Peter Zaitsev’s results show, disk caching offers more than twice the performance of memcached, but only as long as there is enough free memory to cache the files and there are no disk I/O bottlenecks when creating them. If misbehaving applications (or users) start to use all the available memory and/or disk I/O, disk caching performance will suffer, while memcached performance will remain unchanged.

Using disk cache also makes the server more resilient to “out of memory” conditions since, unlike the memory assigned to memcached, the memory used by disk caching is not reserved: the OS gives it back when applications need it. Performance might suffer, but the server might stay up longer with the extra memory that would otherwise have been assigned to memcached.
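On Linux, the amount of memory the OS is currently using for cached file data is reported in /proc/meminfo. The sketch below shows one way to read it, assuming a Linux server. The helper names are hypothetical, and the `Cached` and `MemTotal` field names are the ones Linux reports there.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def page_cache_share(meminfo):
    """Fraction of total memory currently holding cached file data."""
    return meminfo["Cached"] / meminfo["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live values (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Calling `page_cache_share(read_meminfo())` at different times of day gives a rough picture of how much room the disk cache actually gets on a given server.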
Disk Cache is Persistent
If you reboot the server, or stop and start applications, disk-based caching will remain unchanged (though it must be read back into memory). The memcached cache is persistent only so long as memcached is not restarted.
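As an illustration of why a disk cache survives restarts, here is a minimal file-based cache in Python. It is only a sketch: the `FileCache` class, its one-file-per-key layout, and the hashed file names are assumptions for this example, not how any particular WordPress or PHP caching plugin stores its data.

```python
import hashlib
import os
import tempfile


class FileCache:
    """Tiny disk cache: one file per key inside cache_dir (hypothetical layout)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def set(self, key, value):
        # Write to a temp file, then rename, so readers never see a partial entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as fh:
            fh.write(value)
        os.replace(tmp_path, self._path(key))

    def get(self, key):
        # A missing file simply means a cache miss.
        try:
            with open(self._path(key), "rb") as fh:
                return fh.read()
        except FileNotFoundError:
            return None
```

A second `FileCache` object created on the same directory, which is what an application does after a restart, finds the entries written by the first one. Nothing has to be rebuilt except the OS's in-memory copy of the files.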
Disk Cache is Colder
If you reboot the server, the OS will need time to rebuild the disk cache in memory; it needs to “warm up”. The OS must read or write a file for it to be cached in memory, and if the file hasn’t been read in a while, it may no longer be available in the OS disk cache (depending on the amount of available memory and other disk activity on the system). Information saved to memcached, on the other hand, is immediately available from memory. The applications using memcached also have direct control over which information stays longer in the cache and which information expires sooner.
Update 2012-12-06: An example of a script that reports the size of WordPress cache folders, the number of files they contain, reads each file to prime the OS disk cache, and optionally flushes the OS disk cache, is available in the post titled WordPress OS Disk Cache Report, Prime and Flush.
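The report and prime steps can be sketched in a few lines of Python. This is not the script from that post, only an illustration of the idea under the assumption that the cache is a directory tree of regular files. The function name is hypothetical, and the flush step is left out because it needs root privileges.

```python
import os


def report_and_prime(cache_dir, chunk_size=1024 * 1024):
    """Read every file under cache_dir; return (file_count, total_bytes)."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            size = 0
            try:
                with open(path, "rb") as fh:
                    # Reading the data is what pulls it into the OS disk cache.
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        size += len(chunk)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            file_count += 1
            total_bytes += size
    return file_count, total_bytes
```

Running something like this after a reboot shortens the warm-up period, provided there is enough free memory to hold the files.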
Disk Cache is Local
Memcached can be installed on one or more centralized servers, and offer caching services to other servers over the network (preferably on the same subnet and switch to minimize latency). Disk cache is local, and is only beneficial to local applications. If you have several front-end servers, for example, each one will have to build its own disk cache over time. By using a centralized memcached service, all front-end servers can read and write to the same cache.
If your cached information needs 2 GB of memory (for example), then each server would need at least 2 GB of free memory to use a disk cache. If you use a centralized memcached server instead, then memcached will need 2 GB of memory, but the front-end servers using it will not. So, if you have 10 front-end servers using a centralized memcached of 2 GB, then you’re using 18 GB less memory with memcached than with OS disk caching (10 × 2 GB = 20 GB for disk cache vs. a single 2 GB centralized memcached).
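The same arithmetic as a small Python helper, in case you want to plug in your own numbers. The function name is made up for this example, and it assumes every front-end server ends up caching the full data set.

```python
def memory_saved_gb(front_ends, cache_gb):
    """Memory saved by one shared memcached versus a disk cache per server."""
    disk_cache_total = front_ends * cache_gb  # every server caches its own copy
    memcached_total = cache_gb                # one central copy
    return disk_cache_total - memcached_total


print(memory_saved_gb(front_ends=10, cache_gb=2))  # prints 18
```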
When using a centralized memcached server, network latency becomes very important: it affects the performance of memcached the way I/O latency and low memory affect disk caching. Any network latency is added to every single transaction to and from the memcached server, so any savings here will directly improve the performance of the memcached cache.
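A back-of-the-envelope model shows how quickly latency adds up. The sketch below assumes a single client issuing gets one at a time, and treats the network round trip as a fixed cost added to each get. The function name and any round-trip values you pass in are hypothetical, not measurements.

```python
def gets_per_second_with_latency(local_gets_per_sec, round_trip_ms):
    """Upper bound on sequential gets/sec once a network round trip is added."""
    local_seconds_per_get = 1.0 / local_gets_per_sec
    total_seconds_per_get = local_seconds_per_get + round_trip_ms / 1000.0
    return 1.0 / total_seconds_per_get


if __name__ == "__main__":
    # 12200 is the localhost memcached figure from the benchmark table.
    for rtt_ms in (0.0, 0.1, 0.5, 1.0):
        rate = gets_per_second_with_latency(12200, rtt_ms)
        print(f"{rtt_ms:4.1f} ms round trip: {rate:8.0f} gets/sec")
```

Under this model, even a half-millisecond round trip cuts the localhost figure of 12,200 gets/sec to well under 2,000, which is why keeping memcached on the same subnet and switch matters.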
Disk Cache is Simple
Disk caching needs no special tools or commands to maintain or “flush” the cache. Disk caching uses regular files, which can be easily created, copied, moved, or removed. The applications using disk cache can maintain the files themselves, or a simple script / cronjob can be run to remove older entries.
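As an example of such a cleanup script, the following Python sketch removes cache files that have not been modified within a given age. The function name and the age threshold are assumptions for illustration. A real cache may prefer to expire on access time or on application-specific rules.

```python
import os
import time


def remove_old_entries(cache_dir, max_age_seconds, now=None):
    """Delete files under cache_dir not modified within max_age_seconds."""
    now = time.time() if now is None else now
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    removed += 1
            except OSError:
                continue  # another process already removed the entry
    return removed
```

Run from cron with, say, `max_age_seconds=86400`, this would drop entries older than a day. Point it only at a directory that contains nothing but cache files.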