@radekmie

On Reifying Nested Closures in Rust

By Radosław Miernik · Published on · Comment on Reddit

Intro

As the DDP Router is now open source, let’s discuss one relatively simple optimization I came up with while working on it. For those who have never heard of it: it’s an almost drop-in replacement for Meteor’s reactivity, implemented as a proxy layer. If it’s still unclear, don’t worry – it won’t be important today.

The core of the DDP Router is a reimplementation of Meteor’s Minimongo, an in-memory JavaScript MongoDB emulator. At its heart, the Matcher class does the heavy lifting and handles all the querying logic, i.e., checking whether a document matches a query.

In JavaScript, it’s implemented as a cascade of nested closures, i.e., functions that take some information about the query and return a function that, given a part of the document, performs the actual match (example).
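To build some intuition for this pattern before we dive in, here’s a minimal sketch of it (the names are made up for illustration, not Minimongo’s): a “compile” step captures the query information, and the returned closure does the actual matching.

```javascript
// Hypothetical sketch of the nested-closure pattern: `compileGt`
// captures the bound from the query and returns the actual matcher.
function compileGt(bound) {
  return value => value > bound;
}

const gt3 = compileGt(3); // From `{ $gt: 3 }`.
gt3(4); // true
gt3(2); // false
```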

Now, in Rust, we know such a design involves a lot of indirection and hence is a (potential) performance problem. That’s why, instead of passing around an awful lot of closures (e.g., Box<dyn Fn(...) -> ...>), we’ll reify them into actual structs.

JavaScript code deep dive

For starters, let’s see the Matcher class in action. I decided to go with some trivial queries, in case you’re not familiar with the MongoDB query language.

// a > 3 && b === 5
const matcher = new Matcher({ a: { $gt: 3 }, b: 5 });

matcher.documentMatches({ a: 1, b: 3 }); // false
matcher.documentMatches({ a: 1, b: 5 }); // false
matcher.documentMatches({ a: 4, b: 3 }); // true
matcher.documentMatches({ a: 4, b: 5 }); // true

Now let’s go down the rabbit hole of what new Matcher(...) actually does (I intentionally skipped the unrelated sections as well as two fast paths):

class Matcher {
  constructor(selector, isUpdate) {
    // [...]
    this._docMatcher = this._compileSelector(selector);
    // [...]
  }

  _compileSelector(selector) {
    // [...]
    return compileDocumentSelector(selector, this, { isRoot: true });
  }
}

As expected, the actual logic lies elsewhere (validation and edge cases omitted):

function compileDocumentSelector(docSelector, matcher, options) {
  const docMatchers = Object.keys(docSelector).map(key => {
    const subSelector = docSelector[key];

    if (key.substr(0, 1) === '$') {
      // [...]
      return LOGICAL_OPERATORS[key](
        subSelector,
        matcher,
        options.inElemMatch
      );
    }

    // [...]
    const lookUpByIndex = makeLookupFunction(key);
    const valueMatcher = compileValueSelector(
      subSelector,
      matcher,
      options.isRoot
    );

    return doc => valueMatcher(lookUpByIndex(doc));
  });

  return andDocumentMatchers(docMatchers);
}

In other words, for every key in the selector, translate it into either a logical operator or a value matcher, and combine them using andDocumentMatchers. The sample query has only two fields, and as neither of them is a logical operator, let’s talk about makeLookupFunction first. It’s too big to cite here, so let’s see it in action:

const lookupA = makeLookupFunction('a');
lookupA({});       // [{value: undefined}]
lookupA({a: 1});   // [{value: 1}]
lookupA({a: [1]}); // [{value: [1]}]

Simple, right? It returns a list of object values at a given path. A list, because the path can target a property nested in an array of objects (see below). And instead of a “naked” value, it’s an object, as there may be some metadata about the value itself (I won’t go into detail about it today).

const lookupAX = makeLookupFunction('a.x');

// Plain value.
lookupAX({a: 5}); // [{value: undefined}]

// Object.
lookupAX({a: {x: 1}});   // [{value: 1}]
lookupAX({a: {x: [1]}}); // [{value: [1]}]

// Array of objects.
lookupAX({a: [{x: 1}, {x: [2]}, {y: 3}]});
// [{value: 1, arrayIndices: [0]},
//  {value: [2], arrayIndices: [1]},
//  {value: undefined, arrayIndices: [2]}]

Now, let’s get back to compileValueSelector (regex case omitted):

function compileValueSelector(valueSelector, matcher, isRoot) {
  // [...]
  if (isOperatorObject(valueSelector)) {
    return operatorBranchedMatcher(valueSelector, matcher, isRoot);
  }

  return convertElementMatcherToBranchedMatcher(
    equalityElementMatcher(valueSelector)
  );
}

The isOperatorObject is basically the same check as the one for logical operators, i.e., whether the key(s) start with a $. It also performs additional validation, e.g., that the keys are either all operators or all nested fields (they cannot be mixed).
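In spirit, the check boils down to something like this (a simplified sketch, not Minimongo’s exact code – in particular, it skips the mixed-keys validation):

```javascript
// Simplified sketch: a selector value is an "operator object" when
// it's a plain object and all of its keys start with a `$`.
function isOperatorObject(valueSelector) {
  return (
    typeof valueSelector === 'object' &&
    valueSelector !== null &&
    !Array.isArray(valueSelector) &&
    Object.keys(valueSelector).every(key => key.startsWith('$'))
  );
}

isOperatorObject({ $gt: 3 }); // true
isOperatorObject({ x: 1 });   // false
isOperatorObject(5);          // false
```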

Let’s see the equalityElementMatcher now (validation omitted + rename):

function equalityElementMatcher(elementSelector) {
  // [...]
  return value => isEqualInTermsOfMongoDB(elementSelector, value);
}

Yeah, it’s that simple! The { b: 5 } part of our query only has to check whether 5 is among the values we got from the lookup. I say among, since that’s what convertElementMatcherToBranchedMatcher is doing (too long to cite).
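A heavily simplified sketch of it (ignoring the metadata and the array-expansion handling) could look like this:

```javascript
// Simplified sketch: lift an element matcher (checks one value) into
// a branched matcher (checks a list of looked-up `{value}` branches).
function convertElementMatcherToBranchedMatcher(elementMatcher) {
  return branches => branches.some(branch => elementMatcher(branch.value));
}

const matchB = convertElementMatcherToBranchedMatcher(value => value === 5);
matchB([{ value: 5 }]);               // true
matchB([{ value: 3 }, { value: 5 }]); // true
matchB([{ value: 3 }]);               // false
```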

Similarly to compileDocumentSelector, the operatorBranchedMatcher goes through all the operators and merges them back using andBranchedMatchers. Actually, both andBranchedMatchers and andDocumentMatchers are aliases of one function! And all it does is check whether all matchers agree that there’s a match (plus some metadata bookkeeping).
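Stripped of the metadata bookkeeping, that shared function is essentially a one-liner (again a sketch with a made-up name, not the original code):

```javascript
// Simplified sketch of the shared "and" combinator: a document (or a
// list of branches) matches only if every sub-matcher says so.
function andMatchers(subMatchers) {
  return arg => subMatchers.every(matcher => matcher(arg));
}

const matchesBoth = andMatchers([x => x > 3, x => x < 10]);
matchesBoth(5);  // true
matchesBoth(11); // false
```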

We can finally visualize our matcher from before. If we inlined all the functions we went through (and some of their arguments), as well as skipped the metadata and validation, it’d look more or less like this:

// const matcher = new Matcher({ a: { $gt: 3 }, b: 5 });
const lookupA = makeLookupFunction('a');
const matchA = value => isGtInTermsOfMongoDB(3, value);

const lookupB = makeLookupFunction('b');
const matchB = value => isEqualInTermsOfMongoDB(5, value);

const documentMatchers = [
  doc => lookupA(doc).some(x => matchA(x.value)),
  doc => lookupB(doc).some(x => matchB(x.value)),
];

return doc => documentMatchers.every(matcher => matcher(doc));

It took us a lot of digging, but the result makes sense, right? For every field, look it up and see if it matches. With that mental model in place, let’s try to get rid of all these functions and come up with something else.

Reifying closures

If you have ever read about Haskell, you have probably also heard about point-free programming, where instead of passing arguments around explicitly, we’d rather compose functions. If you need some intuition: map(x => fn(x)) becomes map(fn) (η-conversion). We could do something like that with the above code:

// const matcher = new Matcher({ a: { $gt: 3 }, b: 5 });
const lookupA = new Lookup('a');
const matchA = new GtMatcher(3);

const lookupB = new Lookup('b');
const matchB = new EqualityMatcher(5);

const documentMatchers = [
  new DocumentMatcher(lookupA, matchA),
  new DocumentMatcher(lookupB, matchB),
];

return new AllDocumentsMatcher(documentMatchers);

What happened? We replaced closures with instances. Sure, underneath we’ll have the same functions, but the point is that there’s exactly one instance of AllDocumentsMatcher.match (and it does the .every() call).

There’s also a very nice side effect: we can now inspect this structure easily! That’s not possible with closures – if we .toString() a function, we get its source, but without its context (i.e., without the captured local variables).
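To see the difference, compare what we can recover from a closure versus an instance (GtMatcher here is a hypothetical implementation of the class used in the previous snippet):

```javascript
const bound = 3;
const closure = value => value > bound;
// The source is there, but the captured `bound` is not:
closure.toString(); // 'value => value > bound'

class GtMatcher {
  constructor(bound) { this.bound = bound; }
  match(value) { return value > this.bound; }
}
// The instance, on the other hand, exposes its state directly:
JSON.stringify(new GtMatcher(3)); // '{"bound":3}'
```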

Furthermore, we can do other things with such a representation: normalization (e.g., flattening nested $ands), validation (e.g., checking whether a query refers to some restricted field), or serialization (i.e., getting back the original format). Now, all of these could be done with closures too, but that would require storing the original query and parsing it repeatedly, each time with a different goal.
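As an example, here’s a hypothetical normalization pass over the class-based representation from before: flattening nested AllDocumentsMatchers (i.e., nested $ands) into a single level.

```javascript
// Hypothetical sketch: `AllDocumentsMatcher` as drafted above, plus a
// `flatten` pass that inlines nested "and"s into one level.
class AllDocumentsMatcher {
  constructor(matchers) {
    this.matchers = matchers;
  }

  match(doc) {
    return this.matchers.every(matcher => matcher.match(doc));
  }

  flatten() {
    return new AllDocumentsMatcher(
      this.matchers.flatMap(matcher =>
        matcher instanceof AllDocumentsMatcher
          ? matcher.flatten().matchers
          : [matcher]
      )
    );
  }
}
```

With closures, the only way to implement such a pass would be to keep the original query around and recompile it.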

Rust code overview

In the first version, I didn’t implement any “compilation” – every query was walked over and over again, once for every check, in the is_matching_document function.

The actual Matcher got implemented in 1c30dda, and it looked exactly like the structures I drafted in the examples above. The “central” DocumentMatcher has four variants, one for every kind of closure in JavaScript:

pub enum DocumentMatcher {
  // `LOGICAL_OPERATORS.$and`.
  All(Vec<Self>),
  // `LOGICAL_OPERATORS.$or`.
  Any(Vec<Self>),
  // `LOGICAL_OPERATORS.$nor`, though it can handle any negation.
  Invert(Box<Self>),
  // `compileValueSelector`.
  Lookup {
    lookup: Lookup,
    matcher: BranchedMatcher,
  },
}

The BranchedMatcher looks almost identical, though it has a variant with the ElementMatcher, which is the only place where we actually look at the values and not their structure. And Lookup ended up as a small helper struct, so as not to mix actual matching with navigating through the objects.

There’s no point in going through the compilation, as the process is identical, with one small difference: instead of returning closures, we return the above structs. Here’s Rust’s counterpart of compileDocumentSelector:

impl DocumentMatcher {
  fn compile_inner(
    selector: &Document,
    is_in_elem_match: bool,
    is_root: bool,
  ) -> Result<Self, Error> {
    Ok(Self::all(
      selector
        .iter()
        .filter(|(key, _)| key.as_str() != "$comment") // Ignore it.
        .map(|(key, sub_selector)| {
          if key.starts_with('$') {
            Self::compile_logical_operator(
              key,
              sub_selector,
              is_in_elem_match,
            )
          } else {
            Ok(Self::Lookup {
              lookup: Lookup::new(key.to_owned(), false),
              matcher: BranchedMatcher::compile_value_selector(
                sub_selector,
                is_root,
              )?,
            })
          }
        })
        .collect::<Result<_, _>>()?,
    ))
  }
}

The entire process was rather mechanical – going through every function, call by call, changing closures into structs. A little tedious, but the benefit of having a complete, bug-for-bug compatible implementation was definitely worth it. I actually went one step further and decided to copy the entire test suite almost verbatim (compare the ones in JavaScript with the ones in Rust).

Closing thoughts

While I focused on Rust here, this pattern can be applied to any language where structs/classes are easier to work with than closures. Maybe it could be automated, at least partially…? Let me know if you give it a try!

As for the performance difference, we could say this change would benefit the JavaScript version too, but there it’s usually less of a priority, thanks to JIT compilation and runtimes being optimized for handling a crazy number of closures.

As for the title… Who knows, maybe I just coined a new term?