Of course, it does not make much sense to optimize operations that are not called very often. The code changes I show will only be useful for server-side operations that are called very often, for example, from within loops or from batch processing actions.
Before starting to change any code, please make sure that the code is executed often and that it accounts for a significant part of the total execution time.
The baseline code is a plain for loop that calls db.test.save() once per iteration and thereby inserts 100,000 documents into a collection named test. Each document has only one attribute. These numbers are arbitrary, but good enough for a demo.
What can we do to improve the runtime of this loop?
The for statement itself is not worth optimizing. It won't matter much whether we use pre-increment or post-increment for the loop induction variable i, or whether we turn the for loop into a while loop. Any such change might save us a few nanoseconds in total, but would likely make the code less readable. Let's not do that!
Clearly, we should be looking at the loop body instead. db.test.save() looks like a function call, and we have learned that function calls are expensive. In this case, we cannot avoid the call to save(), but we can avoid another, hidden function call. Yes, db.test actually calls a function, even though it does not look like it.
The db object has auto-magic member attributes: it has a member attribute for each existing collection. The member automatically vanishes when a collection is dropped, and it renames itself when the collection is renamed.
This magic is made possible by late-binding attributes and by using accessor functions for attribute accesses on the db object. Whenever the attributes of the db object are queried, an accessor function (property query) is called internally to compile them. Accessing a specific attribute of the db object also calls an accessor function (property get). This is exactly what happens in our case when we access db.test inside the loop.
If this sounds too complicated, imagine the original code modified so that the collection is explicitly looked up by name on the db object. It then becomes obvious that accessing test requires an attribute lookup on the db object, and behind the scenes exactly the same happens when we write db.test.
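ArangoDB implements these attributes with accessor functions inside the server. The sketch below is not that implementation. It is a plain-JavaScript analog that uses a Proxy over a hypothetical in-memory `collections` map. The `ownKeys` trap plays the role of the property query, and the `get` trap plays the role of the property get. As a result, both `db.test` and `db["test"]` run a function, and renaming a collection changes the visible attributes without anyone touching `db`.

```javascript
// Hypothetical in-memory registry standing in for the server's collections.
// This is a plain-JavaScript analog, not ArangoDB's implementation.
const collections = new Map([["test", { name: "test" }]]);
let getCalls = 0;

const db = new Proxy({}, {
  // Analog of the "property query": attribute names are computed on demand.
  ownKeys() {
    return Array.from(collections.keys());
  },
  getOwnPropertyDescriptor(target, name) {
    return collections.has(name)
      ? { value: collections.get(name), enumerable: true, configurable: true }
      : undefined;
  },
  // Analog of the "property get": runs on every db.<name> or db["<name>"].
  get(target, name) {
    getCalls++;
    return collections.get(name);
  },
});

const viaDot = db.test;         // calls the get trap
const viaBracket = db["test"];  // same lookup, written explicitly
console.log(viaDot === viaBracket, getCalls); // true 2

// Renaming a collection updates the attributes without touching db itself.
const coll = collections.get("test");
coll.name = "renamed";
collections.set("renamed", coll);
collections.delete("test");
console.log(Object.keys(db), db.test); // [ 'renamed' ] undefined
```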
Let's avoid the repeated call to the accessor function inside the loop! This is easily achieved by assigning db.test to a variable once, outside of the loop. This technique is called loop-invariant code motion, and it can be applied in many other situations, too.
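The following sketch counts how many accessor calls hoisting saves. It uses a hypothetical stand-in `db` whose `test` attribute is a JavaScript getter, which mimics the hidden accessor function. Its `save()` only counts documents, and the attribute name `value` is made up. The sketch does not measure speed. It only counts how often the accessor runs before and after moving `db.test` out of the loop.

```javascript
// Hypothetical stand-in: `test` is a getter, so every `db.test` runs a function.
// Not ArangoDB code; save() just counts documents.
function makeDb() {
  const stats = { accessorCalls: 0, saved: 0 };
  const collection = {
    save(doc) {
      stats.saved++;
      return true;
    },
  };
  const db = {
    get test() {
      stats.accessorCalls++;
      return collection;
    },
  };
  return { db, stats };
}

const N = 100000;

// Baseline: db.test is evaluated on every iteration.
const baseline = makeDb();
for (let i = 0; i < N; ++i) {
  baseline.db.test.save({ value: i });
}

// Loop-invariant code motion: look up the collection once, outside the loop.
const hoisted = makeDb();
const collection = hoisted.db.test;
for (let i = 0; i < N; ++i) {
  collection.save({ value: i });
}

console.log(baseline.stats.accessorCalls); // 100000
console.log(hoisted.stats.accessorCalls);  // 1
```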
(On a side note: you cannot assign db.test.save to a variable and call it as a standalone function, because save() needs to be called on the collection object.)
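The side note follows a general JavaScript rule: a method detached from its object loses its `this` binding. The sketch below uses a hypothetical `Collection` class whose `save()` relies on `this`, and it shows `Function.prototype.bind` as the usual plain-JavaScript remedy. I have not verified that binding behaves the same way for ArangoDB's own collection objects, so treat that part as an assumption.

```javascript
// Hypothetical collection whose save() depends on `this` (not ArangoDB's class).
class Collection {
  constructor(name) {
    this.name = name;
    this.docs = [];
  }
  save(doc) {
    this.docs.push(doc); // needs `this` to be the collection
    return true;
  }
}

const coll = new Collection("test");

// Detached call: class bodies are strict, so `this` is undefined inside save().
const detached = coll.save;
let failed = false;
try {
  detached({ value: 1 });
} catch (e) {
  failed = e instanceof TypeError;
}

// Binding the method to its object keeps `this` intact.
const boundSave = coll.save.bind(coll);
boundSave({ value: 2 });

console.log(failed, coll.docs.length); // true 1
```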
Enjoying the silence
The save operation is chatty. Every time it is called, it returns some metadata about the just-inserted document, namely its _id, _key and _rev values.
In our case, we are not interested in these returned values, and we don't even capture them in a variable. But the save function doesn't know this and will happily assemble its result object. The result consists of three string values (six when also counting the attribute names). Setting it up requires memory allocations and string copying, which are costly.
We can avoid all this by passing an options parameter into save() and setting its silent attribute to true. With that, save() only returns a boolean value, which is much quicker.
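The sketch below shows the two return paths side by side. It uses a hypothetical `save()` that mimics the described behavior. By default it builds an object with `_id`, `_key` and `_rev` strings. With `{ silent: true }` it skips that work and returns `true`. The counter-based keys and revisions are invented for illustration, and real ArangoDB values look different.

```javascript
// Hypothetical stand-in for a collection's save(); not ArangoDB's implementation.
// Keys and revisions are simple counters here, unlike real ArangoDB values.
function makeCollection(name) {
  let nextKey = 1;
  const docs = new Map();
  return {
    save(doc, options = {}) {
      const key = String(nextKey++);
      docs.set(key, { ...doc, _key: key });
      if (options.silent) {
        return true; // no result object to build
      }
      // Default: assemble a result with three string values.
      return { _id: `${name}/${key}`, _key: key, _rev: key };
    },
    count() {
      return docs.size;
    },
  };
}

const testCollection = makeCollection("test");
console.log(testCollection.save({ value: 1 }));                   // { _id: 'test/1', _key: '1', _rev: '1' }
console.log(testCollection.save({ value: 2 }, { silent: true })); // true
```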
Yet another alternative is to wrap the operations in the loop into a transaction. Transactions themselves won't buy us much feature-wise here, so why use them? The reason is simple: if we do not use a transaction ourselves, each save operation is implicitly executed in a transaction of its own. For a loop with 100,000 operations, that makes 100,000 transactions!
So when we put all the operations into a single, explicit transaction, we save the overhead of 99,999 transaction begin and commit operations. To do so, the loop goes into the action function passed to db._executeTransaction(), with the test collection declared for writing.
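The transaction arithmetic can be checked with a hypothetical store that only counts begin and commit operations. A `save()` outside a transaction opens and commits its own implicit one. Inside an explicit transaction it simply joins the running one. This models the bookkeeping only, not ArangoDB's transaction machinery or `db._executeTransaction()`.

```javascript
// Hypothetical store that only counts transaction begins/commits (not ArangoDB).
function makeStore() {
  const stats = { begins: 0, commits: 0, docs: 0 };
  let inTransaction = false;

  function runInTransaction(action) {
    stats.begins++;
    inTransaction = true;
    try {
      const result = action();
      stats.commits++; // count a commit only if the action succeeded
      return result;
    } finally {
      inTransaction = false;
    }
  }

  function save(doc) {
    if (!inTransaction) {
      // No explicit transaction: wrap this single operation in its own.
      return runInTransaction(() => save(doc));
    }
    stats.docs++;
    return true;
  }

  return { stats, save, runInTransaction };
}

const N = 100000;

// One implicit transaction per save().
const implicit = makeStore();
for (let i = 0; i < N; ++i) {
  implicit.save({ value: i });
}

// One explicit transaction around the whole loop.
const explicit = makeStore();
explicit.runInTransaction(() => {
  for (let i = 0; i < N; ++i) {
    explicit.save({ value: i });
  }
});

console.log(implicit.stats.commits - explicit.stats.commits); // 99999
```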
How far have we got with these minimal code adjustments?
I have put together a script that can be run in arangod. The script will run each version of the loop 10 times and time the execution. The minimum, maximum and average execution times are printed (in seconds, less is better). Note that the absolute times do not matter much here. Please have a look at the percentage column, which shows the execution time of each variant in comparison to the baseline.
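The measurement scheme can be sketched in Node.js as follows. The script runs each variant a fixed number of times, records wall-clock durations in seconds, and reports min, max, average, and the average as a percentage of the baseline average. The two workloads are hypothetical placeholders for the loop variants, and `process.hrtime.bigint()` is Node-specific. This is not the script used for the numbers in this post.

```javascript
// Run fn `runs` times and return each run's wall-clock duration in seconds.
function timeRuns(fn, runs) {
  const durations = [];
  for (let r = 0; r < runs; ++r) {
    const start = process.hrtime.bigint();
    fn();
    const end = process.hrtime.bigint();
    durations.push(Number(end - start) / 1e9);
  }
  return durations;
}

function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// min/max/avg of one variant, plus its average as a percentage of the baseline's.
function summarize(durations, baselineAvg) {
  const avg = average(durations);
  return {
    min: Math.min(...durations),
    max: Math.max(...durations),
    avg,
    percent: (avg / baselineAvg) * 100,
  };
}

// Hypothetical workloads; they only stand in for the real loop variants.
const variants = {
  baseline: () => {
    const docs = [];
    for (let i = 0; i < 100000; ++i) docs.push({ value: i });
    return docs.length;
  },
  preallocated: () => {
    const docs = new Array(100000);
    for (let i = 0; i < 100000; ++i) docs[i] = { value: i };
    return docs.length;
  },
};

// Time every variant once, then compare each against the baseline average.
const runs = 10;
const timings = {};
for (const [name, fn] of Object.entries(variants)) {
  timings[name] = timeRuns(fn, runs);
}
const baselineAvg = average(timings.baseline);
for (const [name, durations] of Object.entries(timings)) {
  const s = summarize(durations, baselineAvg);
  console.log(
    `${name.padEnd(12)} min ${s.min.toFixed(4)}  max ${s.max.toFixed(4)}  ` +
      `avg ${s.avg.toFixed(4)}  ${s.percent.toFixed(1)} %`
  );
}
```

For example, an average of 1.06 s against a 1.30 s baseline gives 1.06 / 1.30 ≈ 81.5% in the percentage column. That is a runtime reduction of about 18.5%.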
Here’s an excerpt of the script’s output:
As can be seen, moving the loop-invariant accessor function call outside of the loop reduced the execution time by almost 20% (from 1.30 s to 1.06 s). Using the silent option also helped, though not as much. Using a transaction reduced the execution time, too, and putting all of this together achieved a reduction of about 40%.
Your mileage may vary. Please feel free to adjust the test script and run your own tests.