The query optimizer in ArangoDB 2.8 has been improved in terms of how it can
make use of indexes. In previous versions of ArangoDB, the query optimizer could
use only one index per collection used in an AQL query. When using a logical OR
in a FILTER condition, the optimizer did not use any index for the collection in
order to ensure the result is still correct.
This is much better in 2.8. Now the query optimizer can use multiple indexes on
the same collection for FILTER conditions that are combined with a logical OR.
For all following queries, I have set up a collection named test, which has
two isolated hash indexes on the attributes value1 and value2, and a skiplist
index on attribute value3.
Let’s first try an AQL queries that uses a logical OR on two different attributes of the
collection:
example query
123
FOR doc IN test
FILTER doc.value1 == 11 || doc.value2 == 19
RETURN doc
The execution plan for this query in 2.7 reveals that query will perform a full
collection scan and cannot use indexes because of the logical OR on two different
attributes:
2.7 query execution plan
12345678910111213
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 0 - FOR doc IN test /* full collection scan */
3 CalculationNode 0 - LET #1 = doc.`value1` == 11 || doc.`value2` == 19
4 FilterNode 0 - FILTER #1
5 ReturnNode 0 - RETURN doc
Indexes used:
none
Optimization rules applied:
none
Running the same query in 2.8 / devel will produce a much better execution plan:
2.8 query execution plan
12345678910111213141516
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 2 - FOR doc IN test /* hash index scan, hash index scan */
3 CalculationNode 2 - LET #1 = doc.`value1` == 11 || doc.`value2` == 19
4 FilterNode 2 - FILTER #1
5 ReturnNode 2 - RETURN doc
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash test false false 100.00 % [ `value1` ] doc.`value1` == 11
6 hash test false false 100.00 % [ `value2` ] doc.`value2` == 19
Optimization rules applied:
Id RuleName
1 use-indexes
Multiple indexes will also be used if different index types are accessed, or for non-equality
filter conditions. For example, the following query will make use of the two hash indexes
and also the skiplist index:
example query
123
FOR doc IN test
FILTER doc.value1 == 11 || doc.value2 == 19 || doc.value3 > 42
RETURN doc
Here is its execution plan from 2.8:
2.8 query execution plan
1234567891011121314151617
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 3 - FOR doc IN test /* hash index scan, hash index scan, skiplist index scan */
3 CalculationNode 3 - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42
4 FilterNode 3 - FILTER #1
5 ReturnNode 3 - RETURN doc
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash test false false 100.00 % [ `value1` ] doc.`value1` == 11
6 hash test false false 100.00 % [ `value2` ] doc.`value2` == 19
6 skiplist test false false n/a [ `value3` ] doc.`value3` > 42
Optimization rules applied:
Id RuleName
1 use-indexes
For comparison, here is the non-optimized plan from 2.7 for the same query:
2.7 query execution plan
12345678910111213
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 0 - FOR doc IN test /* full collection scan */
3 CalculationNode 0 - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42
4 FilterNode 0 - FILTER #1
5 ReturnNode 0 - RETURN doc
Indexes used:
none
Optimization rules applied:
none
Still the query optimizer will not be able to use any indexes on a collection when
there are multiple FILTER conditions combined with logical OR and at least one of them
is not satisfisable by an index of the collection. In this case it has no other choice
but to do a full collection scan.
For queries that combine multiple FILTER conditions with a logical AND, the optimizer
will still try to pick the most selective index for the query and use it for the collection.