J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Using Multiple Indexes Per Collection

The query optimizer in ArangoDB 2.8 has been improved in terms of how it can make use of indexes. In previous versions of ArangoDB, the query optimizer could use only one index per collection used in an AQL query. When using a logical OR in a FILTER condition, the optimizer did not use any index for the collection in order to ensure the result is still correct.

This is much better in 2.8. Now the query optimizer can use multiple indexes on the same collection for FILTER conditions that are combined with a logical OR.

For all following queries, I have set up a collection named test, which has two isolated hash indexes on the attributes value1 and value2, and a skiplist index on attribute value3.

Let’s first try an AQL queries that uses a logical OR on two different attributes of the collection:

example query
1
2
3
FOR doc IN test
  FILTER doc.value1 == 11 || doc.value2 == 19 
  RETURN doc

The execution plan for this query in 2.7 reveals that query will perform a full collection scan and cannot use indexes because of the logical OR on two different attributes:

2.7 query execution plan
1
2
3
4
5
6
7
8
9
10
11
12
13
Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      0     - FOR doc IN test   /* full collection scan */
  3   CalculationNode              0       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19
  4   FilterNode                   0       - FILTER #1
  5   ReturnNode                   0       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 none

Running the same query in 2.8 / devel will produce a much better execution plan:

2.8 query execution plan
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  6   IndexNode            2     - FOR doc IN test   /* hash index scan, hash index scan */
  3   CalculationNode      2       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19  
  4   FilterNode           2       - FILTER #1
  5   ReturnNode           2       - RETURN doc

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields         Ranges
  6   hash   test         false    false       100.00 %   [ `value1` ]   doc.`value1` == 11
  6   hash   test         false    false       100.00 %   [ `value2` ]   doc.`value2` == 19

Optimization rules applied:
 Id   RuleName
  1   use-indexes

Multiple indexes will also be used if different index types are accessed, or for non-equality filter conditions. For example, the following query will make use of the two hash indexes and also the skiplist index:

example query
1
2
3
FOR doc IN test 
  FILTER doc.value1 == 11 || doc.value2 == 19 || doc.value3 > 42 
  RETURN doc

Here is its execution plan from 2.8:

2.8 query execution plan
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  6   IndexNode            3     - FOR doc IN test   /* hash index scan, hash index scan, skiplist index scan */
  3   CalculationNode      3       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42 
  4   FilterNode           3       - FILTER #1
  5   ReturnNode           3       - RETURN doc

Indexes used:
 By   Type       Collection   Unique   Sparse   Selectivity   Fields         Ranges
  6   hash       test         false    false       100.00 %   [ `value1` ]   doc.`value1` == 11
  6   hash       test         false    false       100.00 %   [ `value2` ]   doc.`value2` == 19
  6   skiplist   test         false    false            n/a   [ `value3` ]   doc.`value3` > 42

Optimization rules applied:
 Id   RuleName
  1   use-indexes

For comparison, here is the non-optimized plan from 2.7 for the same query:

2.7 query execution plan
1
2
3
4
5
6
7
8
9
10
11
12
13
Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      0     - FOR doc IN test   /* full collection scan */
  3   CalculationNode              0       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42
  4   FilterNode                   0       - FILTER #1
  5   ReturnNode                   0       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 none

Still the query optimizer will not be able to use any indexes on a collection when there are multiple FILTER conditions combined with logical OR and at least one of them is not satisfisable by an index of the collection. In this case it has no other choice but to do a full collection scan.

For queries that combine multiple FILTER conditions with a logical AND, the optimizer will still try to pick the most selective index for the query and use it for the collection.