J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Speeding Up Server-side Operations

Sometimes it is easier to make server-side operations run a bit faster. In the following post I’ll show a few low-level optimizations that can be applied to user-defined JavaScript code that is executed inside the ArangoDB server.

Scope

Some data-access operations can be sped up by using the appropriate indexes, but that’s not what I am going to show here. Instead, I want to demo a few easy optimizations that don’t require any changes to the data. Only JavaScript code needs to be minimally adjusted to use them.

Note that I am not talking about code that is executed in the ArangoShell here. I will only be looking at code that is executed inside the arangod server instance. The natural places for using custom JavaScript code in the ArangoDB server are for example:

Of course it does not make much sense to optimize operations that are not called very often. The code changes I show will only be useful for server-side operations that are called very often, for example, from within loops or from batch processing actions.

Before starting to change any code, please make sure that the code is executed often and that it accounts for a significant part of the total execution time.

Baseline

Imagine the following custom JavaScript code running somewhere inside ArangoDB:

baseline
1
2
3
for (var i = 0; i < 100000; ++i) {
  db.test.save({ value: 1 });
}

This code inserts 100,000 documents into a collection test. Each document has one attribute only. These numbers are arbitrary, but good enough for a demo.

What can we do to improve the runtime of the above code?

Non-optimizations

The for statement itself is not worth optimizing. It won’t matter much if we used pre-increment or post-increment for the loop induction variable i or if we turned the for loop into a while loop. Any changes here might only save us a few nanoseconds in total, but are likely to make the code more unreadable.

Let’s not do that!

Avoiding accessors

Clearly, we should be looking at the save operation.

db.test.save() looks like a function call, and we learned that function are expensive. In this case, we cannot avoid the function call to save(), but we can avoid another hidden function call. Yes, db.test actually calls a function, though it does not look like it does.

The db object has auto-magic member attributes. The db object will have a member attribute for existing collection. The member will automatically vanish when a collection gets dropped, and the member will rename itself when collections are renamed.

This magic is made possible by late-binding attributes and using accessor functions for attribute accesses on the db object: whenever the attributes of the db object are queried, an accessor function (property query) is called internally to compile them. Accessing a specific attribute of the db object will also call an accessor function (property get). This is exactly what happens in our case when we access db.test.

If this was too complicated, it may become more obvious if we modified the original code to this:

using attribute lookup
1
2
3
for (var i = 0; i < 100000; ++i) {
  db['test'].save({ value: 1 });
}

Now it should be obvious that accessing test requires an attribute lookup on the db object, and behind the scenes the same will happen if we had written db.test instead.

Let’s avoid the repeated call to the accessor function inside the loop! This can easily be achieved by assigning db.test to a variable once and forever outside of the loop. This technique is called loop-invariant code motion, and it can be applied in a lot of other situations, too:

loop-invariant code motion
1
2
3
4
var collection = db.test;
for (var i = 0; i < 100000; ++i) {
  collection.save({ value: 1 });
}

(on a side note: you cannot assign db.test.save to a variable and call it as a function)

Enjoying the silence

The save operation is chatty. Every time it is called, it will return some meta data from the just-inserted document, e.g.:

save result
1
2
3
4
5
{
  "_id" : "test/11443981931568",
  "_rev" : "11443981931568",
  "_key" : "11443981931568"
}

In our case, we’re not interested in these returned values, and we don’t capture them in a variable. The save function doesn’t know this and will happily assemble its result array. The array consists of three string values (six when also counting attribute names). Setting up the result definitely requires costly memory allocations and string copying.

We can avoid all this by passing an options parameter into save, and setting its silent attribute to true:

silence option
1
2
3
for (var i = 0; i < 100000; ++i) {
  db.test.save({ value: 1 }, { silent: true });
}

Now save() will only return a boolean value, which is much quicker.

Transactions

Yet another alternative is use to wrap the operations in the loop into a transaction. Transaction themselves won’t buy us much feature-wise, so why use them? The reason is simple: if we do not use a transaction ourselves, each save operation will implicitly be executed in a transaction of its own. For a loop with 100,000 operations, that will be 100K transactions!

So when we put all the operations into a single, now explicit transaction, we can save the overhead of 99,999 transaction begin and commit operations. Here’s how to do it:

transaction
1
2
3
4
5
6
7
8
9
10
db._executeTransaction({
  collections: {
    write: "test"
  },
  action: function () {
    for (var i = 0; i < 100000; ++i) {
      db.test.save({ value: 1 });
    }
  }
});

Results

How far have we got with these minimal code adjustments?

I have put together a script that can be run in arangod. The script will run each version of the loop 10 times and time the execution. The minimum, maximum and average execution times are printed (in seconds, less is better). Note that the absolute times do not matter much here. Please have a look at the percentage column, which shows the execution time of each variant in comparison to the baseline.

Here’s an excerpt of the script’s output:

results
1
2
3
4
5
6
7
test name      |   total (s) |     min (s) |     max (s) |    avg2 (s) |       %
--------------------------------------------------------------------------------
baseline       |     13.0940 |      1.2907 |      1.3357 |      1.3084 |  100.00
loop-invariant |     10.6888 |      1.0506 |      1.1042 |      1.0667 |   81.53
silence        |     11.7186 |      1.1512 |      1.2247 |      1.1678 |   89.25
transaction    |     10.1521 |      0.9987 |      1.0346 |      1.0149 |   77.56
combined       |      7.8545 |      0.7768 |      0.7977 |      0.7850 |   59.99

As can be seen, moving the loop-invariant accessor function call outside of the loop provided an almost 20% speedup (from 1.30 to 1.06 s). Using the silence option did also provide some, but not the same speedup. Using transactions reduced the execution time, and by putting all this together, a reduction of about 40 % was achieved.

Your mileage may vary. Please feel free to adjust the test script and run your own tests.