The issue was opened around 10 months ago, when ArangoDB 2.2 was the current release. We have improved AQL performance quite a bit since then, so I was eager to see how the query would perform in ArangoDB 2.6, especially compared to 2.2.
For reproduction, I quickly put together some example data to run the query on.
The setup produces a collection named test with 4 million documents. Each document has a numeric value attribute, which in total has 100 unique values. I remembered from a conversation with the guy who opened the issue that the number of distinct values was 100 or even slightly lower. I didn’t bother to create an index on the value attribute, which might have sped up the query.
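The data shape described above can be sketched in a few lines. This is only an illustration, not the original setup script: it builds the documents in memory instead of writing them to ArangoDB, it uses a scaled-down document count so it runs quickly, and the `i % 100` scheme for assigning values is an assumption (any scheme that yields 100 distinct numbers would match the description). The helper name `make_documents` is hypothetical.

```python
from collections import Counter


def make_documents(count, distinct=100):
    """Yield documents with a single numeric 'value' attribute.

    Assumption: values cycle through 0..distinct-1 via i % distinct.
    """
    for i in range(count):
        yield {"value": i % distinct}


# Scaled down from the 4 million documents mentioned in the text.
docs = list(make_documents(40_000))

# Count how often each distinct value occurs.
histogram = Counter(doc["value"] for doc in docs)
print(len(docs), "documents,", len(histogram), "distinct values")
```

With many documents but only 100 distinct values, any grouping query over this data has a large input and a tiny result, which is the situation the measurements here exercise.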
With the data available, it was time to run the query and measure its execution time.
Running the query in 2.2.7 took 3 minutes and 18 seconds before it failed with an error message.
By contrast, running the query in 2.3 (2.3.5) worked fine. Execution took around 28 seconds. The same was true for 2.4 (2.4.8) and 2.5 (2.5.5).
Finally, running the query in 2.6.0 completed in just 3.2 seconds.
The reasons for the speedup are the optimizations done for COLLECTing with a hash table, the return value optimization for AQL, and some minor optimizations within AQL that didn’t get a pretty working title.
It looks like, taken together, all the optimizations put into 2.6 really pay off.
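To give an idea of why grouping with a hash table can help on data like this, here is a minimal sketch that contrasts the two general strategies: sorting the whole input and then scanning runs of equal values, versus a single pass that counts into a hash table. It illustrates the general technique only and says nothing about how ArangoDB implements COLLECT internally. The function names `collect_sorted` and `collect_hashed` are hypothetical.

```python
from itertools import groupby


def collect_sorted(values):
    """Group by sorting the full input, then scanning runs of equal values."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(sorted(values))]


def collect_hashed(values):
    """Group in a single pass using a hash table keyed by value."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Only the distinct keys are sorted here, to make the output comparable.
    return sorted(counts.items())


# Many input values, few distinct ones, as in the test data described above.
values = [i % 100 for i in range(100_000)]
assert collect_sorted(values) == collect_hashed(values)
```

The sorted variant has to order every input value, while the hashed variant only keeps one entry per distinct value, so its extra memory grows with the number of groups (100 here) rather than with the number of documents.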