ArangoDB 2.5 contains some small but useful AQL language improvements plus several AQL optimizer improvements.
Work on further AQL improvements is still ongoing. This post summarizes the improvements that are already completed and will ship with the initial ArangoDB 2.5 release.
Dynamic attribute names
Often the need arises to dynamically name object attributes in return values. In AQL this was not directly possible so far, though there were some workarounds available to achieve about the same result. This recipe summarizes the options that are available to pre-ArangoDB 2.5 users.
With ArangoDB 2.5, dynamic attribute names can be constructed much more easily and flexibly. Object attribute names in ArangoDB 2.5 can be specified using static string literals, bind parameters, and dynamic expressions.
Dynamic expressions are the most interesting of these. To disambiguate them from regular string literal attribute names, dynamic attribute names need to be enclosed in square brackets ([ ]). I have written about that before in this blog.
Here is an example query that uses the new syntax:
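The post's original example is not reproduced here; a minimal, self-contained sketch of the square-bracket syntax (using a range instead of a collection, so nothing else is assumed) looks like this:

```aql
FOR i IN 1..3
  RETURN { [ CONCAT("value-", i) ]: i }
```

The expression inside the square brackets is evaluated at runtime, so this returns objects with the dynamically constructed attribute names value-1, value-2, and value-3.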
The following AQL functions have been added in 2.5:
MD5(value): produces the MD5 hash of value
SHA1(value): produces the SHA1 hash of value
RANDOM_TOKEN(length): produces a pseudo-random string of the specified length. Such strings can be used for id or token generation. Tokens consist only of letters (lower and upper case) plus digits, so they are also URL-safe
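For instance, a quick sketch that exercises all three new functions (the attribute names here are purely illustrative):

```aql
RETURN {
  md5:   MD5("foo"),        /* "acbd18db4cc2f85cedef654fccc4a4d8" */
  sha1:  SHA1("foo"),       /* "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33" */
  token: RANDOM_TOKEN(8)    /* pseudo-random, e.g. "zGl09z42" - varies per call */
}
```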
The following AQL optimizer rules have been added in ArangoDB 2.5:
The first new rule looks inside FILTER conditions for equality comparisons with constant values, and propagates those constant values into other FILTERs. For example, the rule will replace i.value with the constant 42 in the second FILTER of the following query:

FOR i IN c1 FOR j IN c2 FILTER i.value == 42 FILTER j.value == i.value RETURN 1
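After constant propagation, the query above is effectively evaluated as if it had been written as:

```aql
FOR i IN c1 FOR j IN c2 FILTER i.value == 42 FILTER j.value == 42 RETURN 1
```

The second FILTER no longer depends on i, which can enable further optimizations such as moving the filter or using an index on j.value.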
The second new rule moves calculations down in the execution plan as far as possible. The intention is to move calculations beyond filters, in order to avoid computations for documents that will be filtered away anyway.
If a query contains a lot of computations and a lot of documents will be skipped because of filters, this rule might provide a big benefit.
A more detailed example is provided in this post.
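As a small illustration (collection and attribute names are assumed, not taken from the original post): in the following query, the calculation in the LET can be moved past the FILTER, so the CONCAT is only evaluated for documents that actually pass the filter:

```aql
FOR doc IN collection
  LET fullName = CONCAT(doc.first, " ", doc.last)   /* potentially expensive */
  FILTER doc.active == true
  RETURN fullName
```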
The already existing optimizer rule use-index-for-sort was also improved in the following ways:
the rule can now remove SORTs also in case a non-sorted index (i.e. a hash index) is used for an equality lookup and all sort attributes are covered by the index.
the rule can also remove SORTs in case the sort criteria exclude the left-most index attributes, but the left-most index attributes are used in a FILTER for equality-only lookups.
Here is an example that will use an existing skiplist index on [ value1, value2 ] for sorting, removing the extra SORT:

FOR doc IN collection FILTER doc.value1 == 1 SORT doc.value2 RETURN doc
The AQL optimizer now supports sparse indexes, a feature added in 2.5.
It will use them automatically in queries when appropriate and when safe. Sparse indexes purposely exclude certain documents, so the optimizer always has to figure out whether a sparse index can safely be used to satisfy a particular query.
The optimizer will also take into account index selectivity estimates when there are multiple index candidates.
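For example, a sparse hash index excludes documents in which the indexed attribute is null or not set. Created from arangosh (the collection name and exact creation call are assumptions, not from the original post), it might look like:

```js
/* arangosh: create a sparse hash index on `value`; documents in which
   `value` is null or absent are not included in the index */
db.collection.ensureHashIndex("value", { sparse: true });
```

A query such as FOR doc IN collection FILTER doc.value == 42 RETURN doc can use this index, because the equality condition rules out null values. A query filtering on doc.value == null cannot, since the matching documents may be missing from the sparse index.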
The optimizer estimates for the number of documents to be returned by a query or a subquery are more accurate now for several types of queries. For example, if the optimizer can use a primary key, an edge index, or a hash index in a given query part, it will use the index selectivity estimates for calculating the number of return documents.
These estimates will be a lot more accurate than the previously hard-coded filtering factors, and can lead to
better optimizer decisions and reporting (because estimates are returned in
explain results, too).
Finally, the optimizer will now detect if the data-modification part of a data-modification query can be executed in lockstep with the data-retrieval part of the same query. Previously, a data-modification query always executed its data-retrieval part first, and then executed its data-modification part. This could result in big intermediate result sets, which the retrieval part constructed only to pass them on to the modification part of the query.
Here’s an example query:
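The post's original query is not preserved here; a minimal data-modification query of this shape (collection names are placeholders) would be:

```aql
FOR doc IN collection1
  INSERT doc IN collection2
```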
In the above query, the
FOR loop is the retrieval part, and the
INSERT is the modification part.
The optimizer in 2.5 will check if the two parts of the query are independent, and if it turns out they are,
will execute them in lockstep instead of sequentially.
The execution in lockstep is not necessarily faster than sequential execution, but it can save a lot of memory if the data-retrieval part would otherwise construct big intermediate result sets.
The AQL query execution statistics now also provide an attribute
filtered. Its value indicates how many
documents were filtered by
FilterNodes in the AQL query. This can be used as an indicator of whether indexes should be added, and of how effectively existing indexes are used for filtering.
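From arangosh, the new statistic can be read from a query cursor's extra data (the collection and filter below are just for illustration):

```js
/* run a query with a FILTER, then inspect the execution statistics */
var cursor = db._query("FOR doc IN collection FILTER doc.value == 42 RETURN doc");
cursor.getExtra().stats.filtered;   /* number of documents discarded by FilterNodes */
```

A high filtered count relative to the number of returned documents suggests that an index on the filtered attribute could help.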