@radekmie

On Quick and Dirty Caches

By Radosław Miernik · Published on · Comment on Reddit

Intro

While mega corporations aim for caches that are consistent 99.99999999% of the time, here I am, slapping together another quick and dirty one just to solve the problem at hand. “But cache invalidation is a huge problem!” Well, it definitely is. That’s why we should always know what we are working with and what the constraints are.

When it comes to software performance, we have to know our goals as well as the possible solutions (including available technologies). As an example, if we could make something 10% faster within an hour or 30% faster in a week, I’d definitely go with the former. If needed, we’ll do the latter. If.

Simple enough

Low-hanging fruit is everyone’s favorite, right? By definition, it gives you almost instant results and is – again, by definition – not too risky to reach for. How often do we reach for a semi-random 3rd party package or service to implement a proof of concept? I’d say it’s not a problem. But how often do we grow out of it and never have time to deal with it? Well…

The same argument goes for performance-related things, but here it’s usually an in-house solution. Have you ever solved a performance problem by adding a 3rd party package or service? Sure, we can replace one we already use with something else (SWC, anyone?) or introduce a distributed cache like Redis, but I’d argue that _.memoize and friends are far more common.

Crunching some data? Good ol’ _.memoize will do. Rendering something more complex? Maybe memo in React and v-memo in Vue will help you. Batching would be nice as well? Then dataloader it is. Need some cache eviction? Try lru-cache or its “time-based use-recency-unaware cousin”, @isaacs/ttlcache.
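
For reference, a minimal sketch of the first and last of those in TypeScript; the lru-cache named export assumes a recent major version, and the numbers are made up:

```typescript
import _ from "lodash";
import { LRUCache } from "lru-cache"; // named export in recent major versions

// Unbounded memoization of a pure, single-argument function.
const slugify = _.memoize((title: string) =>
  title.toLowerCase().trim().replace(/\s+/g, "-"),
);

slugify("On Quick and Dirty Caches"); // "on-quick-and-dirty-caches"

// A bounded, time-aware cache for when eviction actually matters.
const userCache = new LRUCache<string, { name: string }>({
  max: 500,        // keep at most 500 entries...
  ttl: 60 * 1_000, // ...each for at most a minute
});

userCache.set("42", { name: "Ada" });
userCache.get("42"); // { name: "Ada" }, until it expires or gets evicted
```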

All of the above solutions are great for one very particular use case each, and that’s amazing. What’s even better is that most people are far more aware of their shortcomings than they are of the shortcomings of more advanced solutions, like Redis. That doesn’t mean these tools don’t have any footguns – we’ll get to that.

Good enough

Aren’t the above just a band-aid slapped on top of a steaming pile of… bigger problems? Yeah, usually. But that’s not something we should be ashamed of! It’s true that most of the time, we’re working in a highly constrained environment, without enough time to polish (or even finish) everything we have started.

Quick and dirty caches are just like tech debt – it’s not objectively bad to add more, as long as you know why you are doing so. Trade-offs are common, and we have to learn to choose. I’d argue that this is exactly what differentiates an experienced developer: the ability to reason about an imperfect solution.

[Figure: the _.memoize bell curve meme]

Footguns

As mentioned, there are some non-trivial questions here. Whether it’s memoizing functions with multiple arguments, deciding when to memo, or running into an out-of-memory error, these are “just tools” and can be misused as well.
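
The multi-argument one bites surprisingly often: by default, _.memoize keys its cache on the first argument only, and that cache never evicts anything (which is also where the out-of-memory errors tend to come from):

```typescript
import _ from "lodash";

const add = (a: number, b: number) => a + b;

// By default, only the first argument is used as the cache key...
const memoized = _.memoize(add);
memoized(1, 2); // 3
memoized(1, 5); // 3 again -- stale, because the key `1` was already cached

// ...so a resolver covering every argument is needed.
const fixed = _.memoize(add, (a, b) => `${a}:${b}`);
fixed(1, 2); // 3
fixed(1, 5); // 6
```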

At the same time, using the aforementioned Redis (or a similar service) is far more complex. Will it scale better? Yeah, most likely it will. But does it involve an infrastructure change, making the deployment slightly more complex and expensive? It does, unfortunately.

“Sure, we can use Redis to cache that! And what should we do when it is down?” Try this question in your next interview and see how the majority of people have never even thought about it. Because you did, right?
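
One reasonable answer is to treat the cache as strictly optional: if it is unreachable, skip it and pay the full cost. A minimal sketch with node-redis and a made-up loadUser function:

```typescript
import { createClient } from "redis";

type User = { id: string; name: string };

// Made-up expensive call we’d like to cache.
async function loadUser(id: string): Promise<User> {
  return { id, name: "Ada" };
}

const redis = createClient({ url: process.env.REDIS_URL });
redis.on("error", (error) => console.error("Redis error:", error));
await redis.connect();

async function getUser(id: string): Promise<User> {
  try {
    const cached = await redis.get(`user:${id}`);
    if (cached) return JSON.parse(cached) as User;
  } catch {
    // The cache is down -- fall through and hit the source directly.
  }

  const user = await loadUser(id);
  try {
    // In real code, make sure these calls fail fast instead of hanging.
    await redis.set(`user:${id}`, JSON.stringify(user), { EX: 60 });
  } catch {
    // Failing to populate the cache is not worth failing the request.
  }

  return user;
}
```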

Closing thoughts

It all started last week when we deployed yet another quick and dirty lru-cache. The pull request was six lines long and reduced the average execution time by ~30%. It took about two hours to implement, test, and release. Even looking purely at the return on investment, it will pay for itself within a week.
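
I won’t reproduce the actual diff here, but a change of that shape usually looks more or less like this (the names and parameters below are made up):

```typescript
import { LRUCache } from "lru-cache";

type Report = { id: string; rows: number[] };

// Made-up stand-in for whatever was slow before the change.
async function buildReport(id: string): Promise<Report> {
  return { id, rows: [] };
}

// Made-up parameters: at most 1,000 entries, each cached for five minutes.
const reportCache = new LRUCache<string, Report>({ max: 1_000, ttl: 5 * 60 * 1_000 });

async function buildReportCached(id: string): Promise<Report> {
  const cached = reportCache.get(id);
  if (cached !== undefined) return cached;

  const report = await buildReport(id);
  reportCache.set(id, report);
  return report;
}
```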

And, of course, we could do better! We could adjust the cache parameters (size, expiration time), use multiple caches (e.g., instead of caching one large function, cache two smaller ones independently), or…

Not. It’s fine. We’ll improve it again later.