
The Great AQL Shootout: ArangoDB 2.5 vs 2.6

We are currently preparing ArangoDB 2.6 for release. A lot of work has gone into this release, and I really hope we can ship the first 2.6 release soon.

To keep you hanging on in the meantime, I put together some performance test results for 2.6. The tests compare AQL query execution times in 2.6 and 2.5.

The results look quite promising: 2.6 outperformed 2.5 for all tested queries, mostly by factors of 2 to 5. A few dedicated AQL features covered by the tests got boosted even more, with query execution times reduced by 90 % and more. Finally, the tests also revealed one particular case for which 2.6 provides a several hundredfold speedup.

Also good news: not a single one of the test queries ran slower in 2.6 than in 2.5.

What was tested?

The tests execute several read-only AQL queries on datasets of different sizes and measure the query execution times. The tests were conducted in both ArangoDB 2.5 (2.5.4, the current stable version) and 2.6 (2.6.0-alpha2, the upcoming version), so the results of the two ArangoDB versions can be compared.

Though the tests do not cover every possible type of AQL operation, feature and function, they still cover a wide range of features, e.g. lookups, joins, COLLECT operations, sorting, subqueries, and some AQL functions. Overall, the test suite contains 33 different cases.

All queries were run on datasets of three different sizes to validate that the results hold across dataset sizes: 10,000 documents, 100,000 documents, and 1,000,000 documents. Each query was repeated several times so outliers in execution time can be identified.

There is full disclosure of the test methodology and the test script below, so anyone interested can repeat the tests locally and verify the results.

Test results

The combined test results from 2.5 and 2.6 can be found in this PDF file. There is also an ODS version of the same file here. A description of the columns and test cases used in these files can be found further below.

For the detail-loving folks, here are the raw results for both versions in isolation: 2.5, 2.6.

The results show that ArangoDB 2.6 was consistently faster for all AQL queries included in the tests.

The queries that improved most in 2.6 over 2.5 include:

  • FILTER conditions: simple FILTER conditions as used in the tests are 3 to 5 times faster
  • simple joins using the primary index (_key attribute), hash index or skiplist index are 2 to 3.5 times faster
  • sorting on a string attribute is 2.5 to 3 times faster
  • extracting the _key or other top-level attributes from documents is 4 to 5 times faster
  • COLLECT statements: simple COLLECT statements like the ones in the tests are 7 to 15 times faster
  • looking up documents using IN lists containing a substantial number of values is 250 to 700 times faster (see the sketch right after this list)
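
To illustrate the last case, here is a minimal sketch of such an IN-list lookup, run from the arangod console. The collection name "10k" and the use of the hash-indexed attribute value1 are assumptions for illustration; the exact queries are in the test script further below.

IN-list lookup (sketch)
var db = require("internal").db;

// build an IN list with a substantial number of lookup values
var values = [ ];
for (var i = 0; i < 10000; ++i) {
  values.push(i);
}

// fetch all documents whose hash-indexed attribute value1
// is contained in the IN list
db._query(
  "FOR doc IN @@collection FILTER doc.value1 IN @values RETURN doc",
  { "@collection": "10k", values: values }
).toArray();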

The one thing that did not change much when comparing 2.6 with 2.5 is iterating over a collection and returning all its documents unmodified. The speedups observed for this type of query are between 18 and 25 %, the lowest measured in the tests. Still, 18 to 25 % seems okay as a free take-away.
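
For reference, that type of query is simply a full collection scan along these lines (the collection name is again an assumption):

returning all documents unmodified (sketch)
db._query("FOR doc IN @@collection RETURN doc", { "@collection": "10k" }).toArray();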

Speedups were observed for all three test dataset sizes alike. In some cases the speedups varied a bit with the dataset size, but they were still in the same ballpark for all three datasets. The conclusion is thus that the speedups do not depend much on the dataset size.

Reasons for speedups

There are several reasons why performance in 2.6 is better than in previous versions. The main reason is that we spent a lot of time optimizing some of the critical AQL code paths. We also worked on optimizations for specific features, which are used by some of the tested queries.

If you’re interested in the details, here they are:

Additionally, UTF-8 string comparisons were boosted by upgrading from ICU 52 to ICU 54. The newer version contains a rewritten and much faster UTF-8-aware strcmp, on which we rely heavily.

Test methodology

Each query was run five times on each dataset, so execution time outliers can be identified. The results contain the minimum, maximum and average execution times for each query.

Queries were run in isolation on an otherwise idle server. All queries were run inside the server, so no HTTP/network traffic was involved in shipping query results (note: this was also vastly improved in 2.6, but it is not the subject of this post).
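
For example, a single query can be timed inside the server roughly like this. This is a sketch: the collection name and the returned attribute are assumptions, and the test script itself does the measuring in more detail.

timing a query inside arangod (sketch)
var internal = require("internal");
var db = internal.db;

var start = internal.time();
db._query("FOR doc IN @@collection RETURN doc.value5", { "@collection": "100k" }).toArray();

// elapsed wall-clock time in seconds
print(internal.time() - start);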

All tests were run on my local machine, which has 4 cores with 8 logical CPUs (though the number of CPUs does not matter for any of the tests), 12 GB of physical memory, a Linux 3.16 kernel and an Ubuntu 15 OS. All datasets fit into main memory, so the tests were not I/O-bound.

The ArangoDB versions tested were 2.5.4 and 2.6.0-alpha2. Both versions were hand-compiled with g++ 4.9.1 with options CC='gcc' CXX='g++' CFLAGS='-O3 -Wall' CXXFLAGS='-O3 -Wall'.

The complete test script, including the setup of the test data, is contained in this file. It can be run inside arangod by typing the following in the server console:

running the tests inside arangod
require("internal").load("/path/to/arango-25-26-shootout-script.js");

Note that this requires an arangod started with the option --console. Also note that running the script only tests the current arangod instance, so the script needs to be run once in a 2.5 instance and once in a 2.6 instance.
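
If in doubt, such a console can be started like this (the database directory path is a placeholder):

starting arangod with a console
arangod --console /path/to/database-directory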

Running the script will set up the test collections, run all queries on them (you will need some patience for this) and finally print a table like the following:

excerpt from test results
test name                     | collection  |    runs |     min (s) |     max (s) |     avg (s)
---------------------------------------------------------------------------------------------------
collect-number                | 10k         |       5 |      0.0760 |      0.1638 |      0.0943
collect-number                | 100k        |       5 |      0.8697 |      0.8966 |      0.8803
collect-number                | 1000k       |       5 |     10.4320 |     10.6597 |     10.5314
collect-string                | 10k         |       5 |      0.1211 |      0.1319 |      0.1250
collect-string                | 100k        |       5 |      1.5406 |      1.5974 |      1.5641
collect-string                | 1000k       |       5 |     19.0708 |     19.0966 |     19.0825
collect-count-number          | 10k         |       5 |      0.0763 |      0.0792 |      0.0778

These result columns have the following meanings:

  • test name: name of the test
  • collection: name of the collection. 10k is a collection with 10,000 documents, 100k contains 100,000 documents, and 1000k contains 1,000,000 documents.
  • runs: number of times the query was run
  • min (s): minimum query execution time (in seconds)
  • max (s): maximum query execution time (in seconds)
  • avg (s): average query execution time (in seconds)

Test data

The test datasets for the three collections are filled with artificial data. Test documents are created like this:

test document creation
collection.insert({
  _key: "test" + i,
  value1: i,
  value2: "test" + i,
  value3: i,
  value4: "test" + i,
  value5: i,
  value6: "test" + i,
  value7: i % g,
  value8: "test" + (i % g)
});

Each document has a _key attribute and 8 other attributes, value1 to value8.

value1, value3, value5 and value7 are numeric attributes; the other attributes contain string values. The attributes value1 to value6 contain unique values. The attributes value7 and value8 contain repeating values (in the snippet above, i is the running document number and g is the number of distinct groups), and they are used for COLLECT queries.

value1 and value2 are each indexed with a hash index. value3 and value4 are each indexed with a skiplist index. value5 to value8 are not indexed. This way, queries can be run on the same values but with different indexes, or without any index.
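
Putting it together, the setup for one test collection looks roughly like the following sketch. The collection name, document count and group modulus g are assumptions here; the exact values are in the test script.

setting up a test collection (sketch)
var db = require("internal").db;

var c = db._create("10k");   // likewise "100k" and "1000k"
var n = 10000;               // number of documents in this collection
var g = 100;                 // assumed number of distinct groups for value7/value8

for (var i = 0; i < n; ++i) {
  // insert documents as shown above
  c.insert({
    _key: "test" + i,
    value1: i, value2: "test" + i,
    value3: i, value4: "test" + i,
    value5: i, value6: "test" + i,
    value7: i % g, value8: "test" + (i % g)
  });
}

// hash indexes on value1/value2, skiplists on value3/value4,
// value5 to value8 remain unindexed
c.ensureHashIndex("value1");
c.ensureHashIndex("value2");
c.ensureSkiplist("value3");
c.ensureSkiplist("value4");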

Test cases

The test cases cover the following queries:

  • collect-number and collect-string: run COLLECT on a repeating attribute, which is either numeric or a string
  • collect-count-number and collect-count-string: ditto, but also calculate the group lengths using WITH COUNT INTO
  • subquery: run a single-document subquery for each document of the original collection
  • concat: for each document in the collection, concatenate the document's _key attribute with another document attribute using CONCAT()
  • merge: for each document in the collection, merge the document with another object using MERGE()
  • keep: for each document in the collection, remove all but a few named attributes from it using KEEP()
  • unset: for each document in the collection, remove a few named attributes from it using UNSET()
  • min-number and min-string: return the minimum value of a specific attribute from all documents in the collection, which is either numeric or a string. This uses MIN()
  • max-number and max-string: ditto, but using MAX()
  • sort-number and sort-string: sort all documents in the collection by a non-indexed attribute, which is either numeric or a string
  • filter-number and filter-string: filter all documents in the collection using a non-indexed attribute, which is either numeric or a string
  • extract-doc: return all documents in the collection unmodified
  • extract-key: return the _key attribute of all documents in the collection
  • extract-number and extract-string: return an attribute from all documents in the collection, which is either numeric or a string
  • join-key: for each document in the collection, perform a join on the _key attribute on the collection itself (i.e. FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1; see the example after this list)
  • join-id: ditto, but perform the join using the _id attribute
  • join-hash-number and join-hash-string: ditto, but join using a hash index on a numeric or string attribute
  • join-skiplist-number and join-skiplist-string: ditto, but join using a skiplist index on a numeric or string attribute
  • lookup-key, lookup-hash-number, lookup-hash-string, lookup-skiplist-number, lookup-skiplist-string: compile an IN list of 10,000 lookup values and look up the corresponding 10,000 documents in the collection using either the primary index (_key attribute), a hash index or a skiplist index. The latter two are tested on numeric and string attributes.
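
As an example, the join-key query from the list can be run from the arangod console like this (the collection name is an assumption):

running the join-key query (sketch)
db._query(
  "FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1",
  { "@c": "10k" }
).toArray();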

Further implementation details can be checked in the test script.