Last Updated: February 25, 2016 · matthiasmullie

Cache! A-ah. Saviour of the Universe


Caching is awesome. Instead of repeating an expensive operation, just reuse the
result from last time! But using cache properly can be tricky, and there are a
couple of things I'd rather not have to deal with:

Stampede protection

Cache is essential if you have a high-traffic application. But unless there's a
steady stream of predictable traffic, you risk having no data in cache at the
time you need it most: when traffic spikes.

When all of a sudden, there are a lot of simultaneous requests for data that is
not in cache, the expensive operation that generates that data will be executed
a lot of times, all at once. Given enough traffic, the application server likely
won't be able to handle that and will crash.

It should be possible for the very first request to signal the others that it's
already computing the expensive code. It should tell those others not to do the
same, but just sit tight and wait for the result. Not every process should have
to compute the same result that the first one is already working on...
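To make that concrete, here's roughly what such a signal could look like, using
a Memcached-style add() as the lock. The ".lock" suffix, the timeouts and the
retry loop below are made up for the sake of illustration:

    // Rough sketch of stampede protection around a cache miss.
    // $cache is assumed to be a Memcached-like client.
    function getOrCompute($cache, $key, callable $compute, $expire = 300) {
        $value = $cache->get($key);
        if ($value !== false) {
            return $value;
        }

        // add() only succeeds for one process: that one runs the expensive code
        if ($cache->add($key . '.lock', 1, 10)) {
            $value = $compute();
            $cache->set($key, $value, $expire);
            $cache->delete($key . '.lock');
            return $value;
        }

        // everyone else just sits tight & polls until the result shows up
        // (a real implementation needs a retry limit, in case the first
        // process dies before it gets to store the result)
        do {
            usleep(100000); // 100ms
            $value = $cache->get($key);
        } while ($value === false);

        return $value;
    }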

Repeat requests

Given a sufficiently large application, chances are you're going to fetch the
same value multiple times, because you just happen to need it in multiple places
throughout your app. Since your key-value store is likely on another server,
you're probably losing a valuable millisecond for every repeat request.

Once a result is fetched from cache, it could just be kept in memory. When you
later attempt to fetch the same key in the same request, it could then just be
served from memory. Or when you store a new value & need it again later, that
too could be duplicated in memory. Just make sure to evict data from memory
before it fills up completely and causes your app to crash.
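A bare-bones version of that local layer could be a little decorator that
mirrors everything it sees in a plain PHP array. The class name, size limit &
eviction strategy below are made up for the sake of the example:

    // Minimal sketch of a local "in-request" layer in front of a remote cache.
    // $backend is assumed to be a Memcached-like client.
    class LocalBuffer {
        protected $backend;
        protected $local = array();
        protected $limit;

        public function __construct($backend, $limit = 1000) {
            $this->backend = $backend;
            $this->limit = $limit;
        }

        public function get($key) {
            if (array_key_exists($key, $this->local)) {
                // repeat request: no network round trip
                return $this->local[$key];
            }
            $value = $this->backend->get($key);
            $this->remember($key, $value);
            return $value;
        }

        public function set($key, $value, $expire = 0) {
            $this->remember($key, $value);
            return $this->backend->set($key, $value, $expire);
        }

        protected function remember($key, $value) {
            // crude eviction: drop the oldest entry once the buffer is full
            if (count($this->local) >= $this->limit) {
                array_shift($this->local);
            }
            $this->local[$key] = $value;
        }
    }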

Transactions

A lot of the data stored to cache are denormalizations of data that's also in
the primary data store (e.g. database) or can be derived from it. Updating that
data may touch multiple tables, or happen all throughout the code. To ensure
data integrity in the database, I can use transactions, but where does that
leave my cache?

Similar to buffered (in-memory) repeat cache requests, we should be able to
"buffer" writes and store them all at once, once we're ready to do so. Instead
of immediately storing data to cache, it could be delayed until I want to commit
it. Meanwhile we could also write it to memory so that value can be reused
already, even though it's not yet persisted to real cache. That would even allow
for some optimizations, like not storing data to a key we're going to delete
again in that same request. Or combining multiple separate set operations into
one setMulti.
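Here's a stripped-down sketch of just the set-and-commit part of that idea,
assuming a Memcached-style backend with setMulti (the class & method names are
illustrative only):

    // Sketch of deferring writes until commit & collapsing them into one setMulti.
    class DeferredWrites {
        protected $backend;
        protected $deferred = array(); // also doubles as the in-memory copy

        public function __construct($backend) {
            $this->backend = $backend;
        }

        public function set($key, $value) {
            // not persisted yet, but already readable within this request
            $this->deferred[$key] = $value;
        }

        public function get($key) {
            if (array_key_exists($key, $this->deferred)) {
                return $this->deferred[$key];
            }
            return $this->backend->get($key);
        }

        public function commit() {
            // all pending writes go out in a single setMulti
            $this->backend->setMulti($this->deferred);
            $this->deferred = array();
        }
    }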

But atomicity is the tricky part here. If one of the delayed operations fails,
the rest should not go through, and what has already happened should be restored
to what it was. This could be accomplished by ordering the operations in a
thoughtful manner: first execute the "conditional" operations (e.g. replace
can only happen if a value already existed in cache), then do the ones where no
failure is expected regardless of what may have happened to cache before we
committed (e.g. delete will always succeed).

And to ensure we can actually recover if one of the conditional operations
fails, we should first retrieve their current value. We can then cas
(check-and-set) them back should they go wrong! The only problem here is
restoring the expiration time, which most key-value stores won't let you fetch.
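In code, that recovery could look something like this, assuming
get($key, &$token) & cas($token, $key, $value) style signatures; the keys &
values are made up:

    // Before the batch runs, fetch current value & cas token of keys touched
    // by conditional operations, so they can be written back on failure.
    $newConfig = array('theme' => 'dark');
    $previous = $cache->get('config', $token);

    $first = $cache->replace('config', $newConfig); // conditional op: succeeded
    $second = $cache->add('config.version', 2);     // oops: key already existed

    if ($second === false) {
        // roll the earlier replace back to what it was; only the original
        // expiration time is lost, since the store won't let us read it back
        $cache->cas($token, 'config', $previous);
    }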

Compatibility

Ideally, there would be just 1 API for a variety of cache backends. Open source
projects will want to support multiple platforms. Or maybe you're running a
cluster of Redis servers in production, but you'd like to use a memory-based
cache to run your tests?

The PHP FIG has recently settled on psr/cache 1.0,
an interface to implement if you'd like to be able to use a wide variety of
cache backends. It doesn't offer much beyond the basic get & set, but
it'll do for most cases!
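In practice, that basic get & set boils down to something like this, where
$pool can be any CacheItemPoolInterface implementation & loadUserFromDatabase()
is just a placeholder for your own expensive code:

    // The psr/cache (PSR-6) workflow in a nutshell.
    $item = $pool->getItem('user.1');
    if (!$item->isHit()) {
        $item->set(loadUserFromDatabase(1));
        $item->expiresAfter(3600);
        $pool->save($item);
    }
    $user = $item->get();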

I'm sure many psr/cache implementations will emerge, but there are a couple
already. There's one that I've been working on: Scrapbook.
It comes with adapters for Memcached, Redis, Couchbase, APC, MySQL, PostgreSQL,
SQLite, filesystem (via league/flysystem), and an in-memory cache (very useful
for testing). And it has all of the aforementioned goodies: stampede protection,
buffered cache & (nested) transactions. All of it behind a simple Memcached-like
API (for maximum features) or psr/cache (for maximum compatibility).
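Getting started looks roughly like this (adapter & pool class names as I recall
them from Scrapbook's documentation at the time of writing; double-check its
README for the details):

    // wrap an existing Memcached client in a Scrapbook adapter
    $client = new \Memcached();
    $client->addServer('localhost', 11211);
    $cache = new \MatthiasMullie\Scrapbook\Adapters\Memcached($client);

    // the simple Memcached-like API, for maximum features:
    $cache->set('key', 'value', 3600);
    $value = $cache->get('key');

    // or wrapped in a psr/cache pool, for maximum compatibility:
    $pool = new \MatthiasMullie\Scrapbook\Psr6\Pool($cache);
    $item = $pool->getItem('key');

    // and for tests, just swap in the in-memory adapter:
    $cache = new \MatthiasMullie\Scrapbook\Adapters\MemoryStore();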