J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Small Things in 2.8: POW

ArangoDB 2.8 now provides a dedicated AQL function for exponentiation. This will save users a lot of trouble in case exponentiation is needed inside an AQL query, which up to 2.7 required writing and registering an AQL user-defined function.

With 2.8 it becomes as simple as RETURN POW(2, 16) to raise 2 to the power of 16 from inside AQL.

Small Things in 2.8: Collection Type for Arangoimp

When trying to import data into ArangoDB from a JSON or CSV file using the arangoimp binary, there is always the chance that the target collection does not yet exist.

To create a missing target collection, arangoimp has long provided the option --create-collection true:

importing, optionally creating a missing collection
arangoimp                       \
  --file users.json             \
  --collection users            \
  --create-collection true

However, there hasn’t been a way to specify the type of the target collection, so the new collection was always created as a document collection.

To import data into an edge collection, the target collection needed to be created by another means, e.g. by using the ArangoShell. It would have been handier if arangoimp were able to create edge collections, too.

2.8 finally adds that feature, and it’s simple to use: to create an edge collection if the target collection does not exist, append the --create-collection-type edge option when invoking arangoimp:

importing, optionally creating a missing edge collection
arangoimp                       \
  --file users.json             \
  --collection users            \
  --create-collection true      \
  --create-collection-type edge

AQL Optimizer Improvements for 2.8

With the 2.8 beta phase coming to an end it’s time to shed some light on the improvements in the 2.8 AQL optimizer. This blog post summarizes a few of them, focusing on the query optimizer. There’ll be a follow-up post that will explain dedicated new AQL features soon.

AQL Function Speedups

While working on the upcoming ArangoDB 2.8, we have reimplemented some AQL functions in C++ for improved performance. AQL queries using these functions may benefit from using the new implementation of the function.

The following list shows the AQL functions for which a C++ implementation has been added in 2.8. The other C++-based AQL function implementations added since ArangoDB 2.5 are also still available. Here’s the list of functions added in 2.8:

Using Multiple Indexes Per Collection

The query optimizer in ArangoDB 2.8 has been improved in terms of how it can make use of indexes. In previous versions of ArangoDB, the query optimizer could use only one index per collection used in an AQL query. When using a logical OR in a FILTER condition, the optimizer did not use any index for the collection in order to ensure the result is still correct.

This is much better in 2.8. Now the query optimizer can use multiple indexes on the same collection for FILTER conditions that are combined with a logical OR.
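To make the idea more tangible, here is a minimal conceptual sketch in C++. It is not ArangoDB's actual implementation. It only illustrates how a condition such as `FILTER doc.name == 'alice' || doc.age == 30` can be answered by probing one index per OR branch and merging the matches without duplicates. The `Doc` and `Collection` types, the two attributes and the method `findByNameOrAge` are made up for this illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical document type, used only for this illustration.
struct Doc {
  int id;
  std::string name;
  int age;
};

// Hypothetical collection with two secondary indexes (name and age).
class Collection {
 public:
  void insert(Doc const& doc) {
    byName_.emplace(doc.name, doc.id);
    byAge_.emplace(doc.age, doc.id);
  }

  // Conceptual equivalent of: FILTER doc.name == name || doc.age == age
  // Each OR branch is answered by its own index. The std::set removes
  // documents that match both branches, so no id is returned twice.
  std::vector<int> findByNameOrAge(std::string const& name, int age) const {
    std::set<int> ids;

    auto names = byName_.equal_range(name);
    for (auto it = names.first; it != names.second; ++it) {
      ids.insert(it->second);
    }

    auto ages = byAge_.equal_range(age);
    for (auto it = ages.first; it != ages.second; ++it) {
      ids.insert(it->second);
    }

    return std::vector<int>(ids.begin(), ids.end());
  }

 private:
  std::multimap<std::string, int> byName_;  // name -> document id
  std::multimap<int, int> byAge_;           // age -> document id
};
```

The important detail in this sketch is the deduplication step: a document that satisfies both branches of the OR must still appear only once in the result.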

Index Speedups in 2.8

The upcoming 2.8 version of ArangoDB will provide several improvements in the area of index usage and query optimization.

First of all, hash and skiplist indexes can now index individual array values. A dedicated post on this will follow shortly. Second, the query optimizer can make use of multiple indexes per collection for queries with OR-combined filter conditions. This again is a subject for another post. Third, there have been some speed improvements due to changes in the general index handling code. This is what this post is about.

Improved Deadlock Detection

The upcoming ArangoDB version 2.8 (currently in devel) will provide a much better deadlock detection mechanism than its predecessors.

The new deadlock detection mechanism will kick in automatically when it detects operations that are mutually waiting for each other. In case it finds such a deadlock, it will abort one of the operations so that the others can continue and overall progress can be made.
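A common textbook way to detect operations that are mutually waiting for each other is a wait-for graph: every blocked operation gets an edge to the operation it waits for, and a cycle in that graph means deadlock. The following is a minimal sketch of that general technique, not a description of how ArangoDB implements it. The class `WaitForGraph` and all its method names are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical wait-for graph. Operations are identified by name.
class WaitForGraph {
 public:
  // Records that `waiter` is blocked until `holder` has finished.
  void addWait(std::string const& waiter, std::string const& holder) {
    edges_[waiter].insert(holder);
  }

  // Removes an aborted or finished operation and all edges pointing to it.
  void removeOperation(std::string const& op) {
    edges_.erase(op);
    for (auto& entry : edges_) {
      entry.second.erase(op);
    }
  }

  // Returns true if `start` directly or transitively waits for itself,
  // i.e. if it is part of a cycle in the graph.
  bool isDeadlocked(std::string const& start) const {
    std::unordered_set<std::string> visited;
    std::vector<std::string> stack{start};

    while (!stack.empty()) {
      std::string current = stack.back();
      stack.pop_back();

      auto it = edges_.find(current);
      if (it == edges_.end()) {
        continue;  // `current` is not waiting for anything
      }
      for (auto const& next : it->second) {
        if (next == start) {
          return true;  // found a path back to the starting operation
        }
        if (visited.insert(next).second) {
          stack.push_back(next);
        }
      }
    }
    return false;
  }

 private:
  std::unordered_map<std::string, std::unordered_set<std::string>> edges_;
};
```

In this sketch, resolving the deadlock corresponds to calling `removeOperation` for one participant of the cycle, after which the remaining operations are no longer reported as deadlocked.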

Using Bind Parameters in the AQL Editor

The AQL editor in the web interface is useful for running ad hoc AQL queries and trying things out. It provides a feature to explain the query and inspect its execution plan. This can be used to check if the query uses indexes, and which.

So far the AQL editor only supported query string literals and lacked support for bind parameters. Queries issued by application code, however, often use bind parameters for security reasons. Often enough this prevented copying & pasting queries from the application code into the AQL editor and vice versa without making manual adjustments.

C++ Constructors and Memory Leaks

Preventing leaks in throwing constructors

The easiest way to prevent memory leaks is to create all objects on the stack and not use dynamic memory at all. However, often this is not possible, for example because stack size is limited or objects need to outlive the caller’s scope.

Another way to prevent memory leaks and leaks of other resources is obviously to employ the RAII pattern. How can it be used safely and easily in practice, so memory leaks can be avoided?

This post will start with a few seemingly working but subtly ill-formed techniques that illustrate a few common pitfalls. Later on it will provide a few very simple solutions for getting it right.
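As a small taste of the kind of pitfall meant here, consider the following sketch. It is an illustration written for this summary, not code taken from the full post, and all type names (`Tracked`, `LeakyOwner`, `SafeOwner`) and the helper function are hypothetical. When a constructor throws, the destructor of the object under construction is not run, so raw pointers allocated earlier in that constructor are leaked. Members that are RAII wrappers such as `std::unique_ptr` are still destroyed.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts live instances, so leaks become visible.
struct Tracked {
  static int alive;
  Tracked() { ++alive; }
  ~Tracked() { --alive; }
  Tracked(Tracked const&) = delete;
  Tracked& operator=(Tracked const&) = delete;
};
int Tracked::alive = 0;

// Pitfall: if the constructor throws after `first` was allocated,
// ~LeakyOwner() is never called and `first` is never deleted.
class LeakyOwner {
 public:
  explicit LeakyOwner(bool fail) : first(new Tracked), second(nullptr) {
    if (fail) {
      throw std::runtime_error("construction failed");
    }
    second = new Tracked;
  }
  ~LeakyOwner() {
    delete second;
    delete first;
  }
  LeakyOwner(LeakyOwner const&) = delete;
  LeakyOwner& operator=(LeakyOwner const&) = delete;

 private:
  Tracked* first;
  Tracked* second;
};

// RAII variant: fully constructed members are destroyed automatically
// when the constructor throws, so nothing is leaked.
class SafeOwner {
 public:
  explicit SafeOwner(bool fail) : first(new Tracked) {
    if (fail) {
      throw std::runtime_error("construction failed");
    }
    second.reset(new Tracked);
  }

 private:
  std::unique_ptr<Tracked> first;
  std::unique_ptr<Tracked> second;
};

// Forces a failing construction of Owner and returns how many Tracked
// instances were left behind by it.
template <typename Owner>
int leakedAfterFailedConstruction() {
  int const before = Tracked::alive;
  try {
    Owner owner(true);
  } catch (std::runtime_error const&) {
    // expected: construction was asked to fail
  }
  return Tracked::alive - before;
}
```

Calling `leakedAfterFailedConstruction<LeakyOwner>()` reports one leaked instance, while the same call with `SafeOwner` reports none.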

ArangoDB-PHP Driver Improvements

While preparing the release of ArangoDB 2.7, some improvements were made for the PHP driver for ArangoDB.

The 2.7 version of the PHP driver now supports the AQL query results cache. The cache can be turned on or off globally, or be set to demand mode. The demand mode will allow controlling caching on a per-AQL-query basis.

Additionally, the HTTP transport layer in the PHP driver was improved. Some internal string handling methods were optimized so that the transport part becomes cheaper. All driver operations that communicate with the ArangoDB server will benefit from this.

For a demonstration of the improvements, here is a script that creates 100,000 documents in a local ArangoDB database via the PHP driver. As we’re interested in assessing the HTTP layer improvements, the script intentionally issues 100,000 HTTP requests instead of using the specialized import method provided by the driver.

The script code can be found here.

The baseline for the improvements is the (non-optimized) 2.6 version of the PHP driver. Here are the results for issuing 100,000 requests with the 2.6 driver (the script was run twice to see if there are variations in execution time):

execution times with 2.6 driver
creating 100000 documents
creating documents took 55.144556999207 s

creating 100000 documents
creating documents took 54.476955890656 s

Running it with the 2.7 version of the PHP driver now shows the improvements. Execution time for the same script goes down from roughly 55 seconds to roughly 43 seconds:

execution times with 2.7 driver
creating 100000 documents
creating documents took 42.886090040207 s

creating 100000 documents
creating documents took 42.578990936279 s

The PHP version used here was:

PHP version details
PHP 5.5.12-2ubuntu4.6 (cli) (built: Jul  2 2015 15:27:14) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.4-dev, Copyright (c) 1999-2014, by Zend Technologies

Following are the results from a different machine, this time using PHP 5.6:

execution times with 2.6 driver
creating 100000 documents
creating documents took 48.394731044769 s

creating 100000 documents
creating documents took 47.618598937988 s

execution times with 2.7 driver
creating 100000 documents
creating documents took 40.535583972931 s

creating 100000 documents
creating documents took 40.041265010834 s

The PHP version details for this machine were:

PHP version details
PHP 5.6.4-4ubuntu6.2 (cli) (built: Jul  2 2015 15:29:28) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.4-dev, Copyright (c) 1999-2014, by Zend Technologies

The actual improvements depend on many factors, so your exact mileage may vary. The improvements may not be noticeable for applications that issue only a few requests with the driver, but they will be significant when performing lots of requests, as in the above examples.