We have worked on many AQL optimizations for ArangoDB 2.6.
As a side effect of one of these optimizations, some cases involving the handling of large IN-lists have become much faster than before. Large IN-lists are normally used when comparing attribute or index values against some big array of lookup values or keys provided by the application.
Let’s quickly create and populate a collection named keys
so that we can use some
IN-list queries on it later on:
1 2 3 4 5 6 |
|
And here is a query to all find documents with one of the provided keys test0
to test999
.
The IN-list here contains 1,000 values:
1 2 3 4 5 6 |
|
When invoked from the ArangoShell, this takes around 0.6 seconds to complete with ArangoDB 2.5.
Increasing the length of the IN-list from 1,000 to 5,000 values makes this run in around 15 seconds. With an IN-list of 10,000 values, this already takes more than 60 seconds to complete in 2.5.
Obviously longer IN-lists weren’t handled terribly efficiently in 2.5, and should be avoided there if possible.
I am glad this has been fixed in 2.6. Following is a comparison of the above query for different IN-list sizes, run on both ArangoDB 2.5 and 2.6.
1 2 3 4 5 6 7 |
|
Looks like 2.6 handles longer IN-lists way better than 2.5! The above figures suggest that execution times now scale about linearly with the number of IN-list values. This also leads to reductions in query execution times of 90 % and more percent.
Please note that longer IN-lists will still make a the query run longer than when using shorter IN-lists. This is expected because longer IN-lists require more comparisons to be made and will lead (in the above example) to more documents being returned.