Sometimes it is easier to make server-side operations run a bit faster. In the following post I’ll show a few low-level optimizations that can be applied to user-defined JavaScript code that is executed inside the ArangoDB server.
Scope
Some data-access operations can be sped up by using the appropriate indexes, but that’s not what I am going to show here. Instead, I want to demo a few easy optimizations that don’t require any changes to the data. Only JavaScript code needs to be minimally adjusted to use them.
Note that I am not talking about code that is executed in the ArangoShell here. I will only be looking at code that is executed inside the arangod server instance. The natural places for using custom JavaScript code in the ArangoDB server are for example:
- Foxx controllers
- transactions
- tasks
- traversals
Of course it does not make much sense to optimize operations that are not called very often. The code changes I show will only be useful for server-side operations that are called very often, for example, from within loops or from batch processing actions.
Before starting to change any code, please make sure that the code is executed often and that it accounts for a significant part of the total execution time.
Baseline
Imagine the following custom JavaScript code running somewhere inside ArangoDB:
1 2 3 |
|
This code inserts 100,000 documents into a collection test
. Each document has
one attribute only. These numbers are arbitrary, but good enough for a demo.
What can we do to improve the runtime of the above code?
Non-optimizations
The for statement itself is not worth optimizing. It won’t matter much if we used
pre-increment or post-increment for the loop induction variable i
or if we
turned the for loop into a while loop. Any changes here might only save us a
few nanoseconds in total, but are likely to make the code more unreadable.
Let’s not do that!
Avoiding accessors
Clearly, we should be looking at the save
operation.
db.test.save()
looks like a function call, and we learned that function are
expensive. In this case, we cannot avoid the function call to save()
, but we
can avoid another hidden function call. Yes, db.test
actually calls a function,
though it does not look like it does.
The db
object has auto-magic member attributes. The db
object will have a
member attribute for existing collection. The member will automatically vanish when
a collection gets dropped, and the member will rename itself when collections are
renamed.
This magic is made possible by late-binding attributes and using accessor functions
for attribute accesses on the db
object: whenever the attributes of the db
object
are queried, an accessor function (property query) is called internally to compile
them. Accessing a specific attribute of the db
object will also call an accessor
function (property get). This is exactly what happens in our case when we access
db.test
.
If this was too complicated, it may become more obvious if we modified the original code to this:
1 2 3 |
|
Now it should be obvious that accessing test
requires an attribute lookup on the
db
object, and behind the scenes the same will happen if we had written db.test
instead.
Let’s avoid the repeated call to the accessor function inside the loop! This can
easily be achieved by assigning db.test
to a variable once and forever outside
of the loop. This technique is called loop-invariant code motion, and it can be
applied in a lot of other situations, too:
1 2 3 4 |
|
(on a side note: you cannot assign db.test.save
to a variable and call it as a
function)
Enjoying the silence
The save
operation is chatty. Every time it is called, it will return some meta
data from the just-inserted document, e.g.:
1 2 3 4 5 |
|
In our case, we’re not interested in these returned values, and we don’t
capture them in a variable.
The save
function doesn’t know this and will happily assemble its
result array. The array consists of three string values (six when also counting
attribute names). Setting up the result definitely requires costly
memory allocations and string copying.
We can avoid all this by passing an options parameter into save
, and setting
its silent
attribute to true
:
1 2 3 |
|
Now save()
will only return a boolean value, which is much quicker.
Transactions
Yet another alternative is use to wrap the operations in the loop into a
transaction. Transaction themselves won’t buy us much feature-wise, so why use
them? The reason is simple: if we do not use a transaction ourselves, each
save
operation will implicitly be executed in a transaction of its own.
For a loop with 100,000 operations, that will be 100K transactions!
So when we put all the operations into a single, now explicit transaction, we can save the overhead of 99,999 transaction begin and commit operations. Here’s how to do it:
1 2 3 4 5 6 7 8 9 10 |
|
Results
How far have we got with these minimal code adjustments?
I have put together a script that can be run in arangod. The script will run each version of the loop 10 times and time the execution. The minimum, maximum and average execution times are printed (in seconds, less is better). Note that the absolute times do not matter much here. Please have a look at the percentage column, which shows the execution time of each variant in comparison to the baseline.
Here’s an excerpt of the script’s output:
1 2 3 4 5 6 7 |
|
As can be seen, moving the loop-invariant accessor function call outside of the loop provided an almost 20% speedup (from 1.30 to 1.06 s). Using the silence option did also provide some, but not the same speedup. Using transactions reduced the execution time, and by putting all this together, a reduction of about 40 % was achieved.
Your mileage may vary. Please feel free to adjust the test script and run your own tests.