The Importance of Caching WordPress

Update: The last page has been updated with some W3 Total Cache rules if you would like to use W3 Total Cache with file-based caching (disk enhanced). Performance-wise, it should be about the same as the WP-Super-Cache results you see here, but maybe a bit better since you’ll also get the benefits of database/object caching behind it.

IMPORTANT: This article is NOT a WP-Super-Cache vs W3-Total-Cache comparison or standoff. This article simply illustrates the benefits of using a caching plugin with your WordPress installation and how it can save you not only hits, but resources on your server. Besides, it has some pretty-looking graphs for you to look at that don’t really mean much…

WordPress, in all its glory… is a damn sloppy hog.

I use WordPress, many of my clients use WordPress, and a lot of you reading this probably use WordPress as well. But for all its simplicity of installation and its vast library of plugins, themes and other add-ons, it is quite simply a resource hog. Even a simple one-post blog will easily rack up PHP memory usage and still be quite slow at handling multiple visitors.

If you are hosting clients using WordPress, you cannot tell me that none of them has ever approached you about increasing their memory limit, or about their site becoming incredibly slow after being hit with a moderate amount of traffic.

But despite all that, I like WordPress. It’s easy to use, update and modify; even I want to be lazy at times.

The Setups

For the purpose of this article, I cloned my blog to three different subdomains on three separate VPSes with the same base configuration: a Debian 6.0 (Squeeze) based installation with 512MB of memory assigned, running Nginx 0.9.4, PHP-FPM 5.3.5, and MySQL 5.1.

All three had the following plugins installed:

  • Google XML Sitemaps
    Automatically generates XML sitemaps and reports new postings to search engines like Google.
  • Mollom
    An alternative to the Akismet plugin; catches spam comments.
  • WP-Syntax
    Plugin that uses GeSHi to provide syntax highlighting. Something I use quite a bit on this site for code examples.
  • WP-Mail-SMTP
    Plugin to use an SMTP server instead of PHP’s built-in mail() function.

As you can see, nothing too fancy, pretty basic.

No Caching

This setup is basically only the above plugins, with no additional caching option turned on other than what comes with WordPress (which seems to be none…).

W3 Total Cache w/ memcached

This setup has W3 Total Cache installed, with all of the caching options (Page, Minify, Database, Object) set to use a local memcached daemon with 32MB of memory allocated (all of the objects seemed to need no more than 25MB). Some of the settings suggested by the Browser Cache module are already handled by nginx, such as mime types and a location block that sets an expiration header on static content.
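For reference, that expiration header is nothing more than a location block along these lines (the extensions and the 30-day value are only illustrative, adjust them to your own content):

# far-future expiry on static assets so repeat visitors hit their browser cache
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
    expires 30d;
    add_header Cache-Control public;
}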

Memcached was installed from the normal Debian repository, and the memcache extension for PHP was installed via PECL.
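On Debian Squeeze that works out to roughly the following (the php5-dev and php-pear packages are my assumption for what pecl needs to build the extension on that release):

apt-get install memcached           # the memcached daemon itself
apt-get install php5-dev php-pear   # build tools pecl needs
pecl install memcache               # the PHP memcache extension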

This type of setup is ideal for those who really don’t want to hassle with the nginx configuration beyond the usual optimization you apply to static content and the one-line try_files solution to WordPress’ permalink feature. It also makes it very easy to clear the cache when you make a change.
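That one-liner is simply nginx falling back to index.php whenever a permalink doesn’t match an actual file on disk; a minimal sketch:

location / {
    # serve the file or directory if it exists, otherwise hand the request to WordPress
    try_files $uri $uri/ /index.php;
}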

WP Super Cache

WP Super Cache doesn’t cache things like the database, nor does it perform minifying operations. I simply set this one up with disk caching of the files to a static folder. Nginx was configured to look in these folders and serve any static cache that was found. I’ve tested this both with gzip precompression enabled on nginx (it serves a .gz file if one exists), and with nginx handling the gzip compression on the fly.

Essentially with this setup, most visitors rarely actually make it to the PHP backend (as the php pages themselves are cached as html output onto the disk).
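A rough sketch of what that nginx lookup can look like, assuming the plugin’s default supercache directory and leaving out the usual checks that skip the cache for logged-in users and query strings (gzip_static needs nginx built with the gzip static module):

# path WP Super Cache writes its static copies to (plugin default)
set $cache_file /wp-content/cache/supercache/$http_host/$request_uri/index.html;

location / {
    gzip_static on;                               # serve index.html.gz if it exists
    try_files $cache_file $uri $uri/ /index.php;  # cached copy first, then normal files, then PHP
}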

This setup is ideal if you need to spare PHP from running as much as possible. Logging into the backend will of course still cause that little spike of usage. And with PHP rarely being accessed, you’ll likely need to set up a cron job for scheduled postings and the other maintenance handled by wp-cron.php.
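If you go that route, a crontab entry that simply requests wp-cron.php every few minutes is usually enough; something along these lines (the domain and the 5-minute interval are placeholders):

# poke WordPress' scheduler every 5 minutes, since regular visitors no longer reach PHP
*/5 * * * * wget -q -O /dev/null http://example.com/wp-cron.php?doing_wp_cron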

PS: W3 Total Cache can also use disk caching on top of the additional features it offers, but I figured it’d be nice to show how much memcached can help even with every user visit being sent to the PHP backend.

The Method of Testing

Testing was done very simply with an application called Siege. Each scenario was tested separately, so that the environmental conditions were as close as possible (it would be kind of silly to try to run a test on all three at the same time when they were on the same physical server, despite being in separate VPS containers).

Each scenario was hit with a surge of connections, ranging from 5 concurrent browsers to 150 concurrent browsers. Each Siege test was run for 5 minutes, with a pause of 4 minutes in between. Meaning, starting from 5 concurrent browsers, it would run for 5 minutes hitting the destination as much as possible, then pause for 4 minutes before trying again with 10, then 20, then 30, then 40 concurrent browsers and so forth.

An example of the command would be:

siege -c# -t5M -b http://destination

Where # is the number of concurrent browsers to use, and -b sets it to benchmark mode, allowing no delays between connection attempts. This is not very realistic in terms of actual internet traffic, which tends to have some delay, with connections occurring at random intervals rather than all-out slashdotting/digging (which is still technically random, just more intense). The idea here is to illustrate just how much load WordPress could theoretically take under a large amount of traffic.
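For the curious, the whole sweep is easy to script as a loop; a sketch along these lines (the article only names the first few concurrency steps, so the later values here are assumptions):

# 5-minute benchmark run at each concurrency level, with a 4-minute pause in between
for c in 5 10 20 30 40 50 75 100 150; do
    siege -c$c -t5M -b http://destination
    sleep 240
done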

The “attacking” server was another VPS, so some of these numbers would rarely be possible in the real world. Most users on shared or VPS hosting don’t normally have access to anything above a 10mbit uplink, let alone the next-to-nothing latency you get between two VPSes on the same box, so essentially we’re breaking it down to where the webserver, PHP and the database server become the main bottleneck.

Also keep in mind that the test does not attempt to download anything other than the front page HTML, meaning additional requests for images, CSS and JS were not made. Just the download of the front page, which is far less than what a normal browser would download off a typical blog.

On the following pages are all the pretty graphs and numbers between each scenario.

5 comments

  1. kbeezie says:

    In the memcached configuration, the obvious difference is that PHP is being accessed on every single request to the site, which has the additional overhead of PHP checking the request, then retrieving the keyed item from memcached and serving it. In the file-based configuration (which both W3 Total Cache and WP Super Cache can be configured with), Nginx bypasses PHP altogether and serves static content directly from the disk.

    As to why: first off, I don’t get that much traffic, and even if I did, the current configuration can still handle quite a bit. Also, with this configuration I don’t have to modify nginx with any excessive rewrite rules to check for a disk-based cache, and publishing new content is easily refreshed with the memcache setup. I used to use WP-Super-Cache with a preloaded cache almost exclusively, and it was indeed fast under very high load (which I almost never get), but it only caches files, it doesn’t improve performance for logged-in users, and I have to make sure to clear the disk cache whenever I make a change to the design or site.

  2. mastafu says:

    Great article.

    However, I have the following problem.

    My WP setup is like this.

    Currently I am on shared hosting with WP + W3 Total Cache and during peak hours my site is very slow. That is mainly because I have huge traffic from Google.

    My website catches plenty of keywords with AskApache Google 404 and Redirection.

    What happens is that traffic from Google goes to /search/what-ever-keywords, dynamically created every time. And that is killing my system.
    The problem is I have no idea how to help my poor server and cache that kind of traffic.

    Would you have any advice for that ?
    Regards,
    Peter

  3. kbeezie says:

    That’s a rather good question, especially considering you can’t easily cache random searches. Looking into it, it seems to also be a common way of overloading a WordPress site.

    The Nginx webserver does provide one feature that may help, called the Limit Request Module (http://wiki.nginx.org/HttpLimitReqModule)

    Essentially you could have a location block like so (the limit_req_zone line goes somewhere in the http block):

    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    location /search { limit_req zone=one burst=5; rewrite ^ /index.php; }

    Essentially what happens is that the location /search is limited to a rate of 1 request per second per visitor IP address. A burst of 5 means they can only exceed this rate 5 times before they are hit with a 503 error response. Google, for example, sees 503 as kind of a de facto “back off” response.

    The rewrite is there since on WordPress there shouldn’t ever be an actual folder named search, and all search requests go to /index.php anyway.

  4. mastafu says:

    kbeezie, thank you for your reply.
    I think that this is not the issue here. What happens right now is that between 7pm and 9pm I am being strongly hit by Google … like 20-50 req/s.
    So I would probably need 8 cores or something … which is super expensive … plus only required for part of the day.

    I need to look into your limiting module. What would be great is that if someone searches too much, he would be redirected to the main page, or a specified page where he would see a warning rather than a 503 error. That is a bit too drastic, I think.

    What do you think, is that possible ?

  5. kbeezie says:

    By the way, I learned that the rewrite line will actually act before the limiting has a chance to. So you have to use try_files $uri /index.php; instead, which allows the limiting module to act before trying for files.
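
    In other words, the corrected block from the earlier comment would look roughly like this:

    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    location /search { limit_req zone=one burst=5; try_files $uri /index.php; }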

    As far as Google goes, 503 is the de facto standard “back off” signal to the Google servers. You can however create a Google Webmaster Tools account ( https://www.google.com/webmasters/tools/ ), add your site, verify it, then set your crawl rate manually rather than letting Google do so automatically. This way you have some control in preventing Google from crawling your site too quickly.

    More hardware isn’t always the key to improving your site. As far as an 8-core goes, even a 4-core (or 4 cores + hyper-threading) would be fine and not all that expensive, unless you go with prices from places like Rackspace and such (though that is expensive if you’re only used to VPS pricing).