J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

AQL Improvements in 2.5

Contained in 2.5 are some small but useful AQL language improvements plus several AQL optimizer improvements.

We are working on further AQL improvements for 2.5, but work is still ongoing. This post summarizes the improvements that are already completed and will be shipped with the initial ArangoDB 2.5 release.

Language improvements

Dynamic attribute names

Often the need arises to dynamically name object attributes in return values. In AQL this was not directly possible so far, though there were some workarounds available to achieve about the same result. This recipe summarizes the options that are available to pre-ArangoDB 2.5 users.

With ArangoDB 2.5, dynamic attribute names can be constructed much more easily and flexibly. Object attribute names in ArangoDB 2.5 can be specified using static string literals, bind parameters, and dynamic expressions.

Dynamic expressions are most interesting, and to disambiguate them from other regular string literal attribute names, dynamic attribute names need to be enclosed in square brackets ([ and ]). I have written about that before in this blog.

Here is an example query that uses the new syntax:

example query using dynamic attribute names
1
2
FOR i IN [ 17, 23, 42, 83 ]
  RETURN { [ CONCAT('value-of-', i, ' * ', i) ] : i * i }

This will produce:

query result
1
2
3
4
5
6
7
8
9
10
11
12
13
14
[
  {
    "value-of-17 * 17" : 289
  },
  {
    "value-of-23 * 23" : 529
  },
  {
    "value-of-42 * 42" : 1764
  },
  {
    "value-of-83 * 83" : 6889
  }
]

Functions added

The following AQL functions have been added in 2.5:

  • MD5(value): produces the MD5 hash of value
  • SHA1(value): produces the SHA1 hash of value
  • RANDOM_TOKEN(length): produces a pseudo-random string of the specified length. Such strings can be used for id or token generation. Tokens consist only of letters (lower and upper case) plus digits, so they are also URL-safe

Optimizer improvements

Optimizer rules

The following AQL optimizer rules have been added in ArangoDB 2.5:

  • propagate-constant-attributes

    This rule will look inside FILTER conditions for constant value equality comparisons, and insert the constant values in other places in FILTERs. For example, the rule will insert 42 instead of i.value in the second FILTER of the following query:

    FOR i IN c1 
      FOR j IN c2 
        FILTER i.value == 42 
        FILTER j.value == i.value 
        RETURN 1
    
  • move-calculations-down

    This rule moves calculations down in the execution plan as far as possible. The intention is to move calculations beyond filters, in order to avoid calculations and computations for documents that will be filtered away anyway.

    If a query contains a lot of computations and a lot of documents will be skipped because of filters, this rule might provide a big benefit.

    A more detailed example is provided in this post.

The already existing optimizer rule use-index-for-sort was also improved in the following way:

  • the rule can now remove SORTs also in case a non-sorted index (i.e. a hash index) is used for an equality lookup and all sort attributes are covered by the index.

  • the rule can also remove SORTs in case the sort critieria excludes the left-most index attributes, but the left-most index attributes are used in a FILTER for equality-only lookups.

    Here is an example that will use an existing skiplist index on [ value1, value2 ] for sorting, removing the extra SORT:

    FOR doc IN collection 
      FILTER doc.value1 == 1 
      SORT doc.value2 
      RETURN doc
    

Index usage

The AQL optimizer now supports sparse indexes, a feature added in 2.5.

It will use them automatically in queries when appropriate and when safe. Sparse indexes do exclude certain documents purposely, so the optimizer always has to figure out whether it can use a sparse index to satisfy a given FILTER condition.

The optimizer will also take into account index selectivity estimates when there are multiple index candidates.

Estimates

The optimizer estimates for the number of documents to be returned by a query or a subquery are more accurate now for several types of queries. For example, if the optimizer can use a primary key, an edge index, or a hash index in a given query part, it will use the index selectivity estimates for calculating the number of return documents.

These estimates will be a lot more accurate than the previoulsy hard-coded filtering factors, and can lead to better optimizer decisions and reporting (because estimates are returned in explain results, too).

Memory savings

Finally, the optimizer will now detect if the data-modification part in a data-modification query can be executed in lockstep with the data-retrieval part of the same query. Previously, a data-modification query always executed its data-retrieval part first, and then executed its data-modification part. This could have resulted in big intermediate result sets which to retrieval part constructed in order to pass them to the modification part of the query.

Here’s an example query:

data-modification query
1
2
FOR doc IN test
  INSERT doc INTO backup

In the above query, the FOR loop is the retrieval part, and the INSERT is the modification part. The optimizer in 2.5 will check if the two parts of the query are independent, and if it turns out they are, will execute them in lockstep instead of sequentially.

The execution in lockstep is not necessarily faster than sequential execution, but it can save lots of memory if the data-retrieval part constructed big intermediate result sets.

Miscellaneous changes

The AQL query execution statistics now also provide an attribute filtered. Its value indicates how many documents were filtered by FilterNodes in the AQL query. This can be used as an indicator for whether indexes should be added, and for how effective indexes are used for filtering.