A quick post that showcases the new optimizer rule
move-calculations-down that was added
in ArangoDB 2.5 (current
Consider the following simple query:
1 2 3 4
If no indexes are present in collection
test, the execution plan will look
like this in 2.4:
The plan can be improved a bit. While there are no indexes to exploit, the calculation
calculated can be pushed beyond the execution of the
FILTER. This will be very
beneficial if the calculation is expensive and the
FILTER can prune a lot of documents.
In 2.5, there is an optimizer rule
move-calculations-down that will do this. It will
move all eligible calculations as far down in the plan as possible. A calculation obviously can
be moved down a step only if the successor step does not depend on its result.
Additionally, a calculation will not be moved beyond a
COLLECT operation, because
COLLECT changes which variables are visible in a scope. Finally, a calculation will not
be moved down inside a
FOR loop, in order to avoid repeated calculations.
In the example query, the step following the
LET calculation was a
The filter condition does not depend on the result of
calcuated, so the calculation is
eligible for being moved down.
The resulting execution plan will look like this in 2.5:
As we can see from the changed id sequence on the left, the calculation was moved down.
What does this buy us? For the simple query above, with the
test collection containing
100.000 documents with
value ranging from 0 to 99,999, the results are as follows:
- if the filter leaves only 0.1 % of the documents pass, execution time goes down from 0.87 seconds to 0.17 seconds thanks to the rule
- if the filter lets 1 % of the documents pass, execution time is 0.21 seconds with the rule, and 0.91 without
- if the filter lets 10 % of the documents pass, execution time is 0.25 s, vs. 0.91 seconds without.
- if the filter lets all documents pss, there is no difference in execution time
This is quite a nice speedup, especially when taking into account how simple the optimizer rule is. The effects may be even greater for queries that contain multiple calculations that can be pushed beyond filters, or for more expensive calculations.
Of course the best solution for the above query would be to use a skiplist index on
value, but that’s a different story. The optimizer rule shown here is orthogonal to
using indexes, so queries already using indexes might still benefit from the rule if they
contain additional calculations or further filters which cannot be satifisfied by indexes.