@radekmie
Arc
and Rc
in RustBy Radosław Miernik · Published on · Comment on Reddit
I wrote my first non-trivial program in Rust back in late 2019. Since then, I rewrote most of my PhD’s code in Rust (it’s not public yet), ported the entire Legends of Code and Magic engine, and even published a somewhat popular changestream-to-redis
. It still feels fresh, but I’d say I’m fluent in it.
The more I code in Rust, the more I look into optimizations. Partially because I finally feel confident with my code, partially because I have even more skin in the game (I’m getting paid to code in Rust now), and partially because I just love the feeling of making things go faster. (Even if it’s not needed.)
Recently, I wanted to optimize one of the many AST passes in my PhD’s code, annotating arbitrary expressions with types. The idea is fairly simple: analyze subexpressions recursively, infer the expression type, and add the annotation. But that involves a lot of cloning…
The expressions are inherently recursive. As such, they will require some indirection to make Rust happy. Currently, they’re structured as follows:
There are at least a few alternatives to Arc
, but…
Expression
s are heavily reused, so Box
would be a waste (.clone()
is not as cheap as with Rc
and Arc
).Expression
needs to be Send
(i.e., it can be sent to another thread); Rc
is !Send
, so it’s also out.Expression
s, e.g., constant folding. While we are deep in the tree, we’d need to have some “external storage” for the original values so they could be borrowed. It’s doable, but the complexity would be significantly higher. So, borrows are also out. …so Arc
it is.
Let’s try to implement the add_casts
function. It should add a Cast
on top of each Expression
using the type it got from infer
. It’d look like this:
// The below code is just a simplification -- it may not compile!
It’s a direct implementation of what we described, right? Note that it takes &self
and returns Self
: that means it always clones the entire Expression
. But what if it’s not needed?
The more AST transformations there are, the more I have to think about the order they are applied in. Instead of spending hours experimenting what’d be the ideal one, I decided to do all of them in a fixed-point loop.
As a result, a lot of transformations are not doing anything (as expected). For example, if add_casts
already added all Cast
s, then the next loop iteration will not have anything to do – that’s fine.
But as we already said, it’ll clone all of the subexpressions anyway. When profiled, roughly 30% of the time is spent in memcpy
. It’s a lot, considering that all of them are thrown out immediately.
Maybe we could reuse these Arc
s somehow?
It turns out we can, and it’s really simple: meet Arc::make_mut
. It takes an Arc
and returns a mutable reference to the inner value. Of course, it works only if it’s not reused (i.e., the strong count is 1), so it has to clone it if needed.
There’s a complexity cost, though: we have to have a &mut Arc
, i.e., all of our transformations have to take either self
or &mut self
. Let’s see what the code could look like:
// The below code is just a simplification -- it may not compile!
The code is not that different, but you can see where it’s going. Now, instead of constructing new Expression
s at each step, we mutate them directly, even though they’re wrapped in Arc
s. (Lazy cloning happens in the background.)
The second implementation allocates only when needed – that’s awesome! If we look at the numbers, it’s almost twice as fast. I believe the actual gains will differ based on how many (and how large) clones they were, but it seems like it’ll always help. At a cost of mutability, of course.
This was a short one, but maybe it’ll help some of you just diving into Rust.