Cache Invalidation with Flags

Cache invalidation is hard, proven time and time again by the “clear cache” and “delete all caches” buttons in various caching plugins and hosting control panels. While some of the concepts in this post apply to other types of caching, I’ll stick to page caching for simplicity, and of course WordPress.

How page caching works

I’m not going to go into much detail here. I did a live stream a few months ago where I wrote a page caching plugin from scratch for WordPress. If you’re interested in the nuts and bolts, go check that out. Otherwise, here’s a very simple version of what happens in WordPress’ advanced-cache.php, which loads very, very early:

$cache_key = get_key(); // from URL, query string, cookies, etc.

$entry = get_cache( $cache_key ); // fetch a cached item
if ( $entry ) {
    serve_cache( $entry ); // serve the cached entry
    exit();
}

function ob_callback( $content ) {
    global $cache_key; // the key generated above
    set_cache( $cache_key, $content ); // save to cache
    return $content;
}
ob_start( 'ob_callback' );

Again, this is overly simplified and quite abstract, but the essentials are there:

  1. Serve the cached entry if it exists
  2. Save the contents to cache (via output buffering) to serve from cache later

These are the two key ingredients for page caching in WordPress, and probably in a million other CMSes and frameworks out there. So where does invalidation come in?

Cache invalidation

If you noticed in the code snippet above, there’s no mention of a cache TTL (time-to-live), so at the very basic level, a cache entry that has been set will never be deleted.

In an ideal world, this is the best option. If you publish a post in WordPress, and you’re never going to update it, why would you ever want it to expire? You’ll want to invalidate the cache entry only if something changes about the post:

  • The post title or content has been updated
  • A comment has been posted
  • A category or tag has been added, removed, updated
  • A menu item has been added, updated, removed
  • The site footer has changed (menu item, legal notice, copyright year)
  • A post title from the “recent posts” widget has been changed
  • A new post was published and should be visible in the “recent posts” widget
  • The post author changed their e-mail address, first/last name, bio
  • And many other scenarios

And that’s just off the top of my head! If you start really thinking about all the possible actions that should lead to cache invalidation of this single post, I bet you could at least double the number of items in this list. Maybe triple.

Most page caching plugins have the first two or three items in the list solved. They use actions such as clean_post_cache to bind their invalidation routines. WordPress core calls this action when something about the post is updated, so it makes perfect sense.
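Something along these lines, where purge_url() is a hypothetical stand-in for whatever purge routine a given plugin ships with, not any specific plugin’s code:

add_action( 'clean_post_cache', function( $post_id ) {
    // purge_url() is a placeholder; every plugin names and implements this differently.
    purge_url( get_permalink( $post_id ) );
    purge_url( home_url( '/' ) );
    purge_url( home_url( '/feed/' ) );
} );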

Clearing the home page cache and the RSS feed cache also kind of makes sense. There is indeed a high chance that the post is visible on the home page and in the RSS feed, if it’s recent enough, so I’d say it’s a pretty safe bet.

But what about the rest of the conditions? What about category archives, author archives, tag archives, date archives? What about search? Or “related posts” widgets or blocks? I guess that’s where timed expiration (TTL) comes in as a “catch-all”, and for anything urgent, there’s the “flush cache” buttons that we all love.
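For reference, the TTL catch-all usually boils down to an age check at serve time. A minimal sketch, assuming the cache entry is stored as an array with the content and a created timestamp (the simplified snippet earlier stores just the content):

$max_age = 3600; // treat anything older than an hour as stale

if ( $entry && ( time() - $entry['created'] ) < $max_age ) {
    serve_cache( $entry['content'] );
    exit();
}
// Otherwise fall through, regenerate the page and re-cache it.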

I think we can be a bit more granular and accurate about invalidation, but before jumping into that, let’s touch on cache cardinality and efficiency.

Cache cardinality and efficiency

Let’s start with the simpler one – efficiency.

This is essentially the cache hit rate. If 99 out of 100 requests to a site are served from cache, then the cache efficiency is 99%. Higher is better, but a good number or range depends a lot on the type of site and the traffic it receives. For a fairly static corporate or news website, 90%+ would be good to aim for. A busy e-commerce site is trickier to cache efficiently, and the moment a visitor adds something to their cart, the page cache usually goes out the window (for that user). Same with membership sites.
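In other words, it’s just cache hits divided by total requests:

$hits     = 99;
$requests = 100;

$efficiency = $hits / $requests * 100; // 99% cache hit rate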

Cardinality is slightly related.

It’s the total number of possible options that can be cached, usually per URL. In an ideal world, cardinality is always 1. However, if your homepage has some “offer for visitors from Italy” which relies on GeoIP headers, then the cardinality for that page doubles to 2.

If the page is available in English and also French, then that also doubles the cardinality, now at 4. Oh you’d like to run an A/B test with two different headings? Guess what – double again. Now 8:

  • Original page, English, variant A
  • Original page, English, variant B
  • Original page, French, variant A
  • Original page, French, variant B
  • Page with offer for visitors from Italy, English, variant A
  • Page with offer for visitors from Italy, English, variant B
  • Page with offer for visitors from Italy, French, variant A
  • Page with offer for visitors from Italy, French, variant B

Every time an option is added to the page, the cache cardinality will at the very least double. Is that a bad thing? No, but it’s something you should be aware of when adding these options, especially if your cache store is limited and expensive, such as RAM. Also, low cardinality usually means better cache efficiency.
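Put differently, cardinality is the product of all the independent options, using the numbers from the example above:

$geo      = [ 'default', 'italy-offer' ];
$language = [ 'en', 'fr' ];
$ab_test  = [ 'a', 'b' ];

$cardinality = count( $geo ) * count( $language ) * count( $ab_test ); // 2 * 2 * 2 = 8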

Cardinality = cache key

Usually cardinality will directly correlate to the cache key being used for a request, but not always.

WP Super Cache, for example, has a mode where the cache key is the URL itself, so it essentially forces cardinality to 1, meaning you can’t really have different page variants on the same URL. Super (pun intended) efficient for sure, but it can be inconvenient and even insecure in some cases: password-protected post content may leak into the same publicly-accessible URL, because it will have the same cache key.

Super Cache has other modes as well, where the cache key is a more traditional hash, generated from the request method, URI, cookies, etc. Any change in those will create a different cache key, even if the page URL is the same. This is the behavior adopted by most page caching solutions.
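A minimal sketch of such a key, assuming we vary on the request method, URI and cookies; this is an illustration, not any particular plugin’s implementation:

$key_parts = [
    'method'  => $_SERVER['REQUEST_METHOD'] ?? 'GET',
    'uri'     => $_SERVER['REQUEST_URI'] ?? '/',
    'cookies' => $_COOKIE,
];

$cache_key = md5( serialize( $key_parts ) ); // any change in these produces a new key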

Some plugins will even allow you to tap into the cache key generation process and alter it to add more variants. Automattic’s Batcache, for example, has a special vary_cache_on_function() helper, which allows users to vary the cache key on pretty much anything they want. WP Super Cache has a wp_cache_key action you can tap into as well.
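If I remember correctly, Batcache’s helper takes a string of PHP code whose return value is mixed into the cache key, called from an mu-plugin roughly like this (check the Batcache readme for the exact usage):

if ( function_exists( 'vary_cache_on_function' ) ) {
    // Vary the cache on whether the visitor looks like a mobile device.
    vary_cache_on_function(
        'return (bool) preg_match( "/Mobile/", $_SERVER["HTTP_USER_AGENT"] );'
    );
}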

So with multiple cache keys for each URL, how do you actually invalidate the entire URL?

Invalidating cache by URL

Well, this is tricky, and solutions can sometimes get creative.

You could try to generate all the possible cache key variants for a URL, then walk through and delete/invalidate each one. That assumes you know exactly which variants you need to generate. What if you don’t, or what if it’s out of your control?

You could store the normalized URL and link it to the cache entries. In a relational database, for example, you’d have a new URL column for all the cache entries. Then, whenever you need to invalidate a URL, you can quickly find all the related entries by that column. Most caching implementations will likely avoid relational databases, though, and opt for faster key-value stores instead.
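In a key-value store you could get a similar effect by keeping an index of cache keys per normalized URL. A rough sketch, reusing the placeholder get_cache()/set_cache() helpers from earlier plus a hypothetical delete_cache():

function remember_key_for_url( $url, $cache_key ) {
    $index = get_cache( 'url-index:' . $url ) ?: [];
    $index[ $cache_key ] = true;
    set_cache( 'url-index:' . $url, $index );
}

function invalidate_url( $url ) {
    $index = get_cache( 'url-index:' . $url ) ?: [];
    foreach ( array_keys( $index ) as $cache_key ) {
        delete_cache( $cache_key ); // drop every known variant of this URL
    }
    set_cache( 'url-index:' . $url, [] );
}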

Another option is a URL counter, which is pretty smart. I first saw this implementation in Batcache.

A special counter, representing the stored version of the page, is saved together with the cache entry. The current version of the page is stored separately, and every time the URL needs to be invalidated, the current version counter for that URL is incremented. When the page cache entry is retrieved, we can tell it’s outdated by comparing the stored version to the current version.

This sounds confusing, I know. Let me try to rephrase it with some code instead.

Saving to cache:

function ob_callback( $content ) {
    $data = [
        'content' => $content,
        'version' => get_current_version()
    ];
    set_cache( $key, $data );
    return $content;
}

Serving from cache:

if ( $entry ) {
    if ( $entry['version'] >= get_current_version() ) {
        serve_cache( $entry['content'] );
        die();
    }
}

Invalidating:

$current = get_current_version();
set_current_version( $current + 1 );
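Here’s a minimal sketch of what those two helpers could look like, keyed by the normalized URL of the current request. It reuses the placeholder get_cache()/set_cache() helpers and a hypothetical normalize_url(); an illustration, not Batcache’s actual code:

function get_current_version() {
    $url = normalize_url( $_SERVER['REQUEST_URI'] ?? '/' ); // hypothetical normalizer
    return get_cache( 'version:' . $url ) ?: 0;
}

function set_current_version( $version ) {
    $url = normalize_url( $_SERVER['REQUEST_URI'] ?? '/' );
    set_cache( 'version:' . $url, $version );
}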

In other words, get_current_version() and set_current_version() look up and store the version number for the current normalized URL. Here’s Batcache’s implementation of invalidating by URL: when you update a post with Batcache, it runs:

batcache_clear_url( $home );
batcache_clear_url( $home . 'feed/' );
batcache_clear_url( get_permalink( $post_id ) );

As I mentioned earlier, this covers only a few of the cases that should trigger invalidation. We could perhaps try to go a step further:

  • Iterate over the post tags and categories, clear their archives
  • Get the post author, clear the author archive
  • Check the post publish date and clear some date archives

Which is honestly just guesswork at this point, because we don’t even know whether the post appears on the first page of that tag archive or not. Do we really want to query all posts in that category to figure out which page this post appears on, just to invalidate the correct URL? I don’t think so.

SiteGround, for example, does a lot more guesswork, and not only for post updates but for many other events in WordPress, since unlike Batcache, they do have the luxury of being able to flush all caches. So when a menu item is updated, for instance, SiteGround will clear the entire page cache. The same happens when a plugin is activated, deactivated or updated. Not very efficient for sure, but usually better than serving stale pages.

Fun with flags

This very problem prompted me to find a better approach. Instead of attaching a normalized URL to a cache entry, why don’t we attach an arbitrary “tag”? And since WordPress already has a “tag” entity, to avoid confusion, we’re going to call these flags.

A flag is an arbitrary string, which can be attached to a cache entry, and invalidated when needed.

For example, the “home” flag can be attached to the home page, all eight of its variants. The “post:13” flag can be attached to the post with the ID of 13, again all variants. The “feed” flag can be attached to various RSS feeds, and so on. So whenever a post is updated, much like in the Batcache code above, we would do:

add_action( 'clean_post_cache', function( $post_id ) {
    expire( 'home' );
    expire( 'feed' );
    expire( 'post:' . $post_id );
}, 10, 1 );

The flagging will then happen during the request itself, for example on template_redirect:

add_action( 'template_redirect', function() {
    if ( is_home() ) {
        flag( 'home' );
    } elseif ( is_singular() ) {
        flag( 'post:' . get_the_ID() );
    } elseif ( is_feed() ) {
        flag( 'feed' );
    }
} );

At first sight there’s really no difference from clearing by URL, other than maybe looking slightly cleaner. But this opens up a lot of flexibility, most importantly the ability to share a single flag between multiple URLs.

Take the “post:13” flag for example. The permalink may not be the only URL that contains the post with the ID of 13. The category archive page might contain that very post. Or an author archive on page 9. Or a “recent posts” widget on a completely unrelated page.

Actually the home page and the RSS feed may contain the post with the ID of 13.

Now, instead of guessing what to invalidate when post 13 is updated, why don’t we just invalidate, uhm… post 13:

add_action( 'clean_post_cache', function( $post_id ) {
    expire( 'post:' . $post_id );
}, 10, 1 );

The tricky part would be to accurately flag the correct pages. We don’t even need the conditional functions at this point, since we don’t really care what kind of page it is. The only thing we’re interested in is “is there a post 13 anywhere on this page?” and the closest filter I found to match that is one inside WP_Query, called the_posts:

add_filter( 'the_posts', function( $posts ) {
    foreach ( wp_list_pluck( $posts, 'ID' ) as $post_id ) {
        flag( 'post:' . $post_id );
    }
    return $posts;
} );

This works on the home page, term archives, author archives, search results, paged requests, recent posts widgets, a shortcode used to render related posts, the Gutenberg query loop block, post series, and any REST API request for posts too. Anything that uses WP_Query to fetch posts will accurately flag each item.

Oh, and menus are posts in WordPress as well, so updating a menu should invalidate every page where that menu is being used: header, footer, widget, you name it.

Sure, you’ll probably find some edge cases where it might miss a flag, when get_post() is called directly for example, but I’m sure that with a little trial and error, those can be accurately flagged as well.

Implementation

The exact implementation of the flags concept can vary. I’ll share mine.

Just like with Batcache’s URL versioning, you’ll need a place to store all the flags and their versions, as well as a place to add the flags to individual cache entries. Also, instead of version numbers I’ll be using timestamps.

Let’s start with the flag() function:

function flag( $flag = null ) {
    static $flags;

    if ( ! isset( $flags ) ) {
        $flags = [];
    }

    if ( $flag ) {
        $flags[] = $flag;
    }

    return $flags;
}

A static variable will hold all the flags for the current request, and by passing null we can retrieve all the currently set flags. In the output buffering callback, we’re going to save these flags, as well as the created timestamp, like so:

function ob_callback( $contents ) {
    // various checks, key, etc.
    $cache = [
        'contents' => $contents,
        'flags' => flag(), // get current flags
        'created' => time(),
    ];

    set_cache( $key, $cache );
    return $contents;
}

Our expire() function is very similar to the flagging one:

function expire( $flag = null ) {
    static $expire;

    if ( ! isset( $expire ) ) {
        $expire = [];
    }

    if ( $flag ) {
        $expire[] = $flag;
    }

    return $expire;
}

Now we need a place to save these expired flags. For performance reasons, we’ll do it only once during shutdown, and for demonstration purposes we’ll write the expired flags in JSON format on disk:

add_action( 'shutdown', function() {
    $expire = expire();
    $flags = json_decode( file_get_contents(
        '/path/to/flags.json' ), true ) ?: [];

    foreach ( $expire as $flag ) {
        $flags[ $flag ] = time();
    }

    file_put_contents( '/path/to/flags.json',
        json_encode( $flags ) );
} );

Each flag entry is essentially a timestamp of the last time it was marked as expired. Cache entries created before that time are stale, and any entries created after that time are still good. Here’s what serving looks like in PHP:

function serve() {
    $cache = get_cache( $key );
    if ( ! $cache ) {
        return;
    }

    $flags = json_decode( file_get_contents(
        '/path/to/flags.json' ), true ) ?: [];

    foreach ( $flags as $flag => $timestamp ) {
        if ( in_array( $flag, $cache['flags'] )
            && $timestamp > $cache['created'] ) {
            return; // expired
        }
    }

    // serve cached entry ...
}

This is a simplified implementation, which is missing a great deal of detail, but the goal here is to demonstrate the overall approach. There’s a full working example in this GitHub repository which I created when building a caching plugin from scratch during a live stream. It’s a plugin which uses the filesystem to store cached entries, their metadata, as well as flags.

Conclusion

This is not a new concept for cache invalidation. The naming can differ, but the idea has been around for a while.

Varnish supports PURGE by URL out of the box, and Hashtwo for purging by arbitrary secondary hashes. Fastly supports adding and purging items by a Surrogate-Key header. Cloudflare also supports setting and purging by “Cache-Tags”, though unfortunately it’s for Enterprise customers only.
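If your CDN or proxy supports one of these, the flag() helper from earlier could feed it directly. Fastly, for example, reads a space-separated Surrogate-Key response header, which you could emit late in the request while output buffering is still holding the page (a sketch under that assumption):

add_action( 'wp_footer', function() {
    // Assumes output buffering is still active, so headers haven't been sent yet.
    if ( ! headers_sent() && flag() ) {
        header( 'Surrogate-Key: ' . implode( ' ', flag() ) );
    }
} );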

Nginx, unfortunately, only supports purging by URL, and only as part of the commercial Nginx Plus subscription, though there seems to be a free open-source module available; I’m not sure if it still works.

I realize this is also not very useful for end users directly. But just like object caching in WordPress, I think page caching should be as transparent as possible: users shouldn’t have to decide or guess which URLs to invalidate and when. Users shouldn’t have to think about TTL values. Users shouldn’t be hammering that “flush cache” button.

The “flush all caches on certain events” approach is certainly viable, and so is a low TTL value, to make sure content doesn’t stay stale for too long. But flushing the entire cache when a menu that’s only visible in a widget on a single page changes is quite the opposite of efficient.

Either way, I think we should be a little smarter about caches and use features like flags (or surrogate keys, tags, etc.), which allow for longer TTL values, granular and accurate invalidation, and increased cache efficiency.

As a WordPress developer, what’s your current approach to page cache invalidation? Have you ever used the flags/tags features in Varnish, Cloudflare, etc.? Do you just purge by URL? Does your host take care of it completely and you never have to deal with it? Share your feedback in the comments and don’t forget to subscribe.
