J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Bulk Document Lookups

ArangoDB 2.6 comes with a specialized API for bulk document lookups.

The new API fetches multiple documents from the server in a single request, which makes bulk document retrieval more efficient than issuing one request per document.

The straightforward implementation of a client application that needs to fetch several documents from an ArangoDB server looks like this:

fetching multiple documents from the server one by one
// list of document keys
var keys = [ "foo", "bar", "baz" /* , ... */ ];

// sequentially fetch all documents from the server
var results = [ ];
for (var i = 0; i < keys.length; ++i) {
  results.push(db.test.document(keys[i]));
}
// now all documents are contained in variable 'results'

This works fine but causes excessive HTTP communication between the client application and the server when many documents need to be fetched. In fact, the above code will issue as many HTTP requests as there are documents to fetch.

From a performance point of view, it would be much better to reduce the number of HTTP requests and retrieve multiple documents from the server with a single request.

This is where the new document lookup function comes into play. Provided the document keys are known, all the client application needs to do is call the collection’s lookupByKeys method:

bulk method: fetching multiple documents at once
// list of document keys
var keys = [ "foo", "bar", "baz" /* , ... */ ];

var results = db.test.lookupByKeys(keys);
// now all documents are contained in variable 'results'
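The difference in request counts between the two approaches can be illustrated without a server. The following is a minimal sketch only: `makeFakeCollection` is a hypothetical in-memory stand-in, not part of ArangoDB, and it simply counts every method call as one simulated HTTP request. The shape of the value returned by the real `lookupByKeys` may differ from the plain array used here.

```javascript
// Hypothetical in-memory stand-in for a collection; NOT the ArangoDB API.
// Every method call is counted as one simulated HTTP request.
function makeFakeCollection(docs) {
  var byKey = {};
  docs.forEach(function (doc) { byKey[doc._key] = doc; });

  return {
    requests: 0,

    // one simulated request per document
    document: function (key) {
      this.requests++;
      if (!byKey.hasOwnProperty(key)) {
        throw new Error("document not found: " + key);
      }
      return byKey[key];
    },

    // one simulated request for the whole list of keys
    lookupByKeys: function (keys) {
      this.requests++;
      return keys.filter(function (key) {
        return byKey.hasOwnProperty(key);
      }).map(function (key) {
        return byKey[key];
      });
    }
  };
}

var keys = [ "foo", "bar", "baz" ];
var docs = keys.map(function (key) { return { _key: key }; });

// one-by-one: as many requests as there are keys
var oneByOne = makeFakeCollection(docs);
keys.forEach(function (key) { oneByOne.document(key); });

// bulk: a single request, regardless of the number of keys
var bulk = makeFakeCollection(docs);
bulk.lookupByKeys(keys);

console.log(oneByOne.requests); // 3
console.log(bulk.requests);     // 1
```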

Following is a comparison of the execution times for the two methods. All test runs were conducted against the same ArangoDB 2.6 instance and were started from the ArangoShell. The ArangoShell and the ArangoDB server were located on the same physical host.

comparing single document requests and bulk requests
Number of keys     Single documents        Bulk
-----------------------------------------------
         1,000               0.24 s      0.04 s
        10,000               1.23 s      0.31 s
       100,000              10.89 s      2.13 s

As can be seen, the bulk method can provide a substantial speedup when many documents need to be fetched by their keys at once. The actual speedups might be even higher with a remote ArangoDB server than with a localhost connection, because every saved request then also saves a network round trip.
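To put numbers on the speedup, the ratios can be computed directly from the timings in the table. This is plain arithmetic on the figures above, and the variable and function names are only illustrative:

```javascript
// Timings copied from the comparison table (in seconds).
var timings = [
  { keys: 1000,   single: 0.24,  bulk: 0.04 },
  { keys: 10000,  single: 1.23,  bulk: 0.31 },
  { keys: 100000, single: 10.89, bulk: 2.13 }
];

// speedup = time of the one-by-one method / time of the bulk method
function speedup(row) {
  return row.single / row.bulk;
}

timings.forEach(function (row) {
  console.log(row.keys + " keys: " + speedup(row).toFixed(1) + "x");
});
// 1000 keys: 6.0x
// 10000 keys: 4.0x
// 100000 keys: 5.1x
```

In these runs the bulk method was therefore between roughly four and six times faster than the one-by-one method.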

As of 2.6, bulk document lookups are implemented in the ArangoShell. Support in other drivers will follow.

Additionally, the server-side REST API method for bulk document lookups can be invoked directly via HTTP as follows:

invoking bulk document lookups via HTTP
curl                                                  \
  -X PUT                                              \
  http://127.0.0.1:8529/_api/simple/lookup-by-keys    \
  --data '{"collection":"test","keys":["foo","bar","baz"]}'
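A client without driver support could assemble the same request programmatically. The sketch below only builds the method, URL and body used in the curl call above and does not send anything. `buildLookupRequest` is a hypothetical helper name, and the default base URL is simply the one from the example.

```javascript
// Hypothetical helper: builds the parts of the HTTP request shown in the
// curl example. It does not perform any network I/O.
function buildLookupRequest(collection, keys, baseUrl) {
  return {
    method: "PUT",
    // default base URL taken from the curl example
    url: (baseUrl || "http://127.0.0.1:8529") + "/_api/simple/lookup-by-keys",
    body: JSON.stringify({ collection: collection, keys: keys })
  };
}

var request = buildLookupRequest("test", [ "foo", "bar", "baz" ]);
console.log(request.method + " " + request.url);
// PUT http://127.0.0.1:8529/_api/simple/lookup-by-keys
console.log(request.body);
// {"collection":"test","keys":["foo","bar","baz"]}
```

The resulting object can be handed to whatever HTTP client the application already uses.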

Restrictions: the bulk document API works only with document keys, not document ids. Additionally, it works on a single collection at a time and cannot fetch documents from multiple collections in one request. Still, a client application can group document keys by collection beforehand and send one bulk request per collection involved. Finally, trying to fetch a document using a non-existent key will not produce an error with the bulk API, whereas the one-by-one method throws an exception when the document does not exist.
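Both restrictions can be handled on the client side. The following is a sketch under a few assumptions: both helper names are hypothetical, the collection names `users` and `orders` are made up for illustration, and `findMissingKeys` expects a plain array of documents, so how that array is obtained from the actual lookup result is left to the caller. The sketch relies only on document ids having the form `collection/key` and on each document carrying its key in the `_key` attribute.

```javascript
// Hypothetical helper: splits document ids ("collection/key") into one
// list of keys per collection, i.e. one list per bulk request.
function groupKeysByCollection(ids) {
  var groups = {};
  ids.forEach(function (id) {
    var pos = id.indexOf("/");
    if (pos === -1) {
      throw new Error("not a document id: " + id);
    }
    var collection = id.substring(0, pos);
    var key = id.substring(pos + 1);
    if (!groups.hasOwnProperty(collection)) {
      groups[collection] = [];
    }
    groups[collection].push(key);
  });
  return groups;
}

// Hypothetical helper: since the bulk API does not report missing keys,
// compare the requested keys with the _key values of the returned documents.
function findMissingKeys(requestedKeys, documents) {
  var found = {};
  documents.forEach(function (doc) { found[doc._key] = true; });
  return requestedKeys.filter(function (key) {
    return !found.hasOwnProperty(key);
  });
}

// "users" and "orders" are made-up collection names
var groups = groupKeysByCollection([ "users/foo", "orders/bar", "users/baz" ]);
console.log(JSON.stringify(groups));
// {"users":["foo","baz"],"orders":["bar"]}

var missing = findMissingKeys([ "foo", "bar" ], [ { _key: "foo" } ]);
console.log(JSON.stringify(missing));
// ["bar"]
```

Each entry in `groups` then corresponds to one call of `lookupByKeys` on the respective collection.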