J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

AQL Functions Improvements

Waiting for a git pull to complete over an 8 KiB/s internet connection is boring. So I thought I’d rather use the idle time and quickly write about some performance improvements for certain AQL functions that were recently completed and that will become available with ArangoDB 2.6.

The improvements affect the following AQL functions:

  • UNSET(): remove specified attributes from an object/document
  • KEEP(): keep only specified attributes of an object/document
  • MERGE(): merge the attributes of multiple objects/documents

This blog post shows a few example queries that will benefit from 50 to more than 60 % reductions in query execution times due to the changes done to these functions.

When to expect benefits

Reductions in execution time can be expected for AQL queries invoking one of the above AQL functions many times, and if the AQL function is used in a so-called simple calculation. Whether or not a calculation is considered simple is shown in the detailed JSON output of an explain().

Queries will not benefit if they invoke the AQL functions only a few times or when the function call is contained in an expression that is executed using the non-simple code path.

Example queries

Following are a few example queries for the three AQL functions, showing the reductions in execution times. They all use the simple code path so the benefits can be reaped.

For comparing the execution times between 2.5 and 2.6, I have prepared a simple test setup. Here is a test function that will create a collection named test and populate it with a configurable amount documents. It will then run an AQL query that will update each document in the collection, using one of the named AQL functions. The function will return the execution time for the AQL query, excluding the collection setup time:

test function
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
var run = function (n, query) {
  var time = require("internal").time;
  var db = require("org/arangodb").db;

  // drop and re-create test collection 
  db._drop("test");
  var c = db._create("test");

  // insert n documents
  for (var i = 0; i < n; ++i) {
    c.insert({ value1: i, value2: i, value3: 'foobar' + i });
  }

  // flush write-ahead log and wait a few seconds before running query
  require("internal").wal.flush();
  require("internal").wait(5);

  // run query 
  var s = time();
  db._query(query);
  return time() - s;
};

UNSET()

Let’s start with the UNSET() AQL function. Its purpose is to remove or multiple attributes from an object/document. Here is an example AQL query that removes attribute value2 from each document in the test collection:

invocation of UNSET()
1
run(n, "FOR t IN test LET modified = UNSET(t, 'value2') REPLACE t WITH modified IN test");

The execution times (in seconds) for different n values in ArangoDB 2.5 and 2.6 are:

execution times of UNSET() query in 2.5 and 2.6
1
2
3
4
5
value of n       time 2.5       time 2.6       speedup
------------------------------------------------------
   100,000         3.28 s         1.13 s          65 %
   500,000        16.93 s         5.38 s          68 %
 1,000,000        32.60 s        11.40 s          65 %

KEEP()

The purpose of KEEP() is to remove all attributes from an object/document but the specified ones. Here’s an example query that uses KEEP() to remove all attributes from the documents in the test collectionn but attribute value2:

invocation of KEEP()
1
run(n, "FOR t IN test LET modified = KEEP(t, 'value2') REPLACE t WITH modified IN test");

The execution times (in seconds) for different n values in ArangoDB 2.5 and 2.6 are:

execution times of KEEP() query in 2.5 and 2.6
1
2
3
4
5
value of n       time 2.5       time 2.6       speedup
------------------------------------------------------
   100,000         1.98 s         0.87 s          56 %
   500,000         9.34 s         4.09 s          56 %
 1,000,000        18.86 s         8.23 s          56 %

MERGE()

Finally, the MERGE() function can be used to merge multiple objects/documents in a single one. The following query will add an attribute value4 to all documents in collection test:

invocation of MERGE()
1
run(n, "FOR t IN test LET modified = MERGE(t, { value4 : 1 }) REPLACE t WITH modified IN test");

The execution times (in seconds) for different n values in ArangoDB 2.5 and 2.6 are:

execution times of MERGE() query in 2.5 and 2.6
1
2
3
4
5
value of n       time 2.5       time 2.6       speedup
------------------------------------------------------
   100,000         3.93 s         1.22 s          68 %
   500,000        19.17 s         5.91 s          69 %
 1,000,000        38.27 s        12.33 s          67 %