WEB Advent 2010 / Share and Enjoy



“Share and Enjoy” is the motto of a certain division of the largely successful Sirius Cybernetics Corporation. I’m here to complain about it.

Data is precious. It is expensive to produce, time consuming to fetch, and eventually discarded. It makes sense to share it and save all the hassle that goes into herding it out of the remote servers it lives in. There can be, however, too much of a good thing.

In short, sharing is awesome, but you should care where you put your data.

API rage

The best kind of sharing, of course, is the type you don’t even know exists. Transparent caches work better than the regular user API kind, because the abuse, when it happens, is systematic and universal. Sooner or later, you’ll need an API, but leaving the guts of the cache open to use will constantly have you saying, “You’re doing it wrong.”

Good caching APIs are hard to design. Making them foolproof is nearly impossible, because you end up punishing the users who access them properly. As an API designer, the hardest lesson to learn is to never make promises you don’t want to keep.

I’ve learned that lesson with APC.

The return value trap

One hand taketh what the other hand giveth. When either one’s empty, it’s up to the documentation to explain why we ended up with a NULL where data was supposed to appear. Since you want a cache to be fast rather than completely honest, it could be anything from “I’m busy” to “I haven’t got what you asked for” and everything in between.

Everything revolves around the clarity of documentation here. In the above situation, most people using the cache would assume the latter. The problem is that what the cache might have actually meant was “I got bored of looking… uh, I mean ETIMEOUT.”
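To make the ambiguity concrete, here is a minimal sketch of a fetch that says why it came back empty. It is written in Python for brevity and is not APC's API: the names TaggedCache, Status, and Result, and the 10 ms default timeout, are all assumptions made for illustration.

```python
import enum
import threading
from dataclasses import dataclass
from typing import Any, Optional


class Status(enum.Enum):
    HIT = "hit"              # value found and returned
    NOT_FOUND = "not_found"  # key is definitely absent
    BUSY = "busy"            # gave up waiting for the cache lock


@dataclass
class Result:
    status: Status
    value: Optional[Any] = None


class TaggedCache:
    """Hypothetical in-process cache; illustrative only, not APC."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()

    def store(self, key, value) -> None:
        with self._lock:
            self._data[key] = value

    def fetch(self, key, timeout: float = 0.01) -> Result:
        # Fail quickly instead of stalling the request on a busy cache.
        if not self._lock.acquire(timeout=timeout):
            return Result(Status.BUSY)
        try:
            if key in self._data:
                return Result(Status.HIT, self._data[key])
            return Result(Status.NOT_FOUND)
        finally:
            self._lock.release()
```

With a result shaped like this, a stored NULL, a missing key, and a timeout are three different answers, so the caller can decide whether to regenerate the data or simply try again later.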

The story is the same when putting data into the cache. A successful return value means so little when the next queued request is about to discard the data. The delays that ensure that data was stored in its proper condition are not unlike waiting at the post office until your letter is delivered. In reality, sharing data should not be the responsibility of the one who has stored it.

Smoke signals

For a language like PHP, there is no built-in system to communicate between requests. APC’s mechanisms to share data among multiple requests provide an easy shortcut to set flags and signal events between them. Unlike the good old-fashioned “best effort” type cache, these new methods are meant for immediate visibility and reliability across the entire system. It is for these reasons that the system goes about doing everything slowly and steadily.

I die a little every time I see apc_add() used as a mechanism to ensure exclusivity, especially if the key includes _lock.
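One way this pattern can go wrong, assuming a cache that is free to drop entries when it fills up, is that the "lock" key gets evicted while its owner is still working. The sketch below uses a hypothetical BestEffortCache in Python, not APC itself; its add method only mimics the store-if-absent behaviour that makes apc_add() look like a lock.

```python
class BestEffortCache:
    """Hypothetical tiny cache that evicts its oldest key when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = {}  # dicts preserve insertion order

    def add(self, key, value) -> bool:
        """Store only if the key is absent (add-if-missing semantics)."""
        if key in self._data:
            return False
        self.store(key, value)
        return True

    def store(self, key, value) -> None:
        if key not in self._data and len(self._data) >= self.capacity:
            oldest = next(iter(self._data))
            del self._data[oldest]  # eviction knows nothing about locks
        self._data[key] = value


def demo() -> tuple:
    cache = BestEffortCache(capacity=2)
    first = cache.add("report_lock", "worker-1")   # worker 1 takes the "lock"
    cache.store("page:1", "html")
    cache.store("page:2", "html")                  # evicts report_lock
    second = cache.add("report_lock", "worker-2")  # worker 2 succeeds too
    return first, second
```

Here demo() returns (True, True): both workers believe they hold the lock, because a cache entry is only a hint that something was stored recently, not a guarantee of exclusivity.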

Take me to the cleaners

Most caches are fast if you don’t count the slow parts.

Because APC is used for signaling, cleaning up is the slowest part; the entire cache needs to be nuked for most of the code to work properly. But, just like any other portable PHP extension, this has to be done inside a manual invocation of a function. There is no implicit way to run an independent cleanup job in the background. Sooner or later, a request is going to get hit by a cleanup routine.

The standard issue with cleanup during a request is that the request can be aborted by a user — PHP can kill a partially-complete request if ignore_user_abort is not set. When this happens, the entire cache can become deadlocked with its memory ending up in an inconsistent state.

Fortunately, most caches rarely need a cleanup.

Fill ’er up

As you might have guessed, caches do overflow. People will cache whatever gives them a performance boost — as they should — but not all scenarios are created alike.

Scalability turns this into a strange and dangerously-abused concept. Throwing user data into a shared space is an excellent strategy if you’re building a system with a single server. You need it now, you need it later, and of course, you’ll need it repeatedly. Throw in six hundred servers and have your users rotate among them, however, and this type of caching turns into a complete waste of time. The same cache that turned your system up to eleven in QA is suddenly becoming a little CPU-eating monster — and not the cute kind.
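A back-of-the-envelope calculation shows how quickly the benefit disappears. The sketch below assumes each server keeps its own cache, nothing is ever evicted, and every request is routed to a uniformly random server; the function name and the numbers are illustrative, not measurements.

```python
def expected_hit_rate(servers: int, requests_per_user: int) -> float:
    """Average cache hit rate for one user's data under random routing.

    A request is a hit only if an earlier request from the same user
    already landed on the same server and populated its local cache.
    """
    stay_away = 1.0 - 1.0 / servers  # chance one request goes elsewhere
    hits = 0.0
    for earlier in range(requests_per_user):
        # Probability that at least one of the `earlier` previous
        # requests visited the server handling the current one.
        hits += 1.0 - stay_away ** earlier
    return hits / requests_per_user


single = expected_hit_rate(servers=1, requests_per_user=10)
farm = expected_hit_rate(servers=600, requests_per_user=10)
```

Under these assumptions a single server serves nine of a user's ten requests from cache, while a six-hundred-server farm serves fewer than one in a hundred, yet every server still pays the cost of storing the data.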

Conclusion

Fundamentally, all of the flaws of APC that have come to light in recent times have been due to the API and the existing code that would break if I actually fixed something. Backward compatibility is the price you pay for being popular. The requirements for a simple data cache are far neater and cleaner than those of an opcode cache like APC.

I wrote hidef to solve this problem when the cached data rarely changes. This use case is nearly ideal for something like a localization table or a configuration database that needs hierarchical lookups at minimum cost. This is a valid alternative in some scenarios, but not all.

Now it is time to write something completely different, in an all-too-familiar way. I’ve got my ideas in a row. Cache misses will have to be tagged by cause, reads will be timed to fail quickly, writes will be best-effort, and the actual data will live on disk, without having to bookkeep bare data structures. The idea is to take all that’s missing from APC and build something new.

Until that’s done, though, you’re stuck with “Share with Care.”
