While paging through the issues in the ArangoDB issue tracker
I came across issue #987, titled
“Trying to get distinct document attribute values from a large collection fails”.
The issue was opened about 10 months ago, when ArangoDB 2.2 was current. We have improved AQL performance
somewhat since then, so I was eager to see how the query would perform in ArangoDB 2.6, especially
compared to 2.2.
We are currently preparing ArangoDB 2.6 for release. A lot of work has been put into this release,
and I really hope we can ship a first 2.6 release soon.
To keep you hanging on in the meantime, I put together some performance test results from 2.6.
The tests I ran compared AQL query execution times in 2.6 and 2.5.
The results look quite promising: 2.6 outperformed 2.5 for all tested queries, mostly by
factors of 2 to 5. A few dedicated AQL features in the tests got boosted even more, resulting in
query execution time reductions of 90 % and more.
Finally, the tests also revealed a dedicated case for which 2.6 provides a several hundredfold speedup.
More good news: not a single one of the test queries ran slower in 2.6 than in 2.5.
A while ago our continuous integration builds on TravisCI
began to fail seemingly randomly because the build worker was killed without
an apparent reason. Apparently the build process hit some resource limit,
though we couldn’t find any documented limit that the build clearly violated.
Some builds still succeeded without issues, but those builds that were killed
had one thing in common: they were all stuck waiting for the linker to finish.
The default linker used on TravisCI is GNU ld. After some research, it turned
out that replacing GNU ld with GNU gold not only made the linking much
faster, but also less resource-intensive. Linking ArangoDB on my local machine
is almost twice as fast with gold as with ld. Even better, after reconfiguring
our TravisCI builds to also use gold, our builds weren’t killed anymore by
TravisCI’s build scheduling system.
To make TravisCI use gold instead of ld, add the following to your project’s
.travis.yml in the install section (so it gets executed before the actual build
steps):
The script downloads and installs gold and creates a tiny wrapper script in a
file named ld in the user’s home directory. The wrapper simply calls gold
with all the arguments passed to the wrapper. Finally, the script modifies the
environment variables CFLAGS and CXXFLAGS by setting the -B parameter to the
wrapper script’s directory.
-B is the option that sets the compiler’s search path for helper programs. g++, at least,
will look in this path for any helper tools it invokes. As we have a file named
ld in this directory, g++ will use our wrapper script instead of the original
ld binary. This way we can keep the original version of ld in /usr/bin,
and only override it using environment variables. This is also helpful in
other contexts, e.g. when ld should remain the system’s default linker but
gold should only be used for linking a few selected components.
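As a rough illustration of the wrapper approach described above, here is a minimal shell sketch. It is not the original script: the `WRAPPER_DIR` variable is a hypothetical name introduced here, the installation of gold itself is left out, and the sketch assumes the gold binary is available on the `PATH` as `gold` (some systems install it as `ld.gold`).

```shell
#!/bin/sh
# Sketch only: make the compiler driver pick up a wrapper named "ld"
# that forwards to gold. WRAPPER_DIR is a hypothetical variable; the
# text above places the wrapper in the user's home directory.
WRAPPER_DIR="${WRAPPER_DIR:-$HOME}"
mkdir -p "$WRAPPER_DIR"

# Assumption: gold is callable as "gold"; adjust to "ld.gold" if needed.
cat > "$WRAPPER_DIR/ld" <<'EOF'
#!/bin/sh
exec gold "$@"
EOF
chmod +x "$WRAPPER_DIR/ld"

# -B adds the directory to the search path the compiler driver uses for
# its helper programs, so the wrapper is found before the system's ld.
export CFLAGS="-B$WRAPPER_DIR/ ${CFLAGS:-}"
export CXXFLAGS="-B$WRAPPER_DIR/ ${CXXFLAGS:-}"
```

Because only environment variables change, unsetting or resetting CFLAGS and CXXFLAGS is enough to go back to the system's default linker.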
ArangoDB 2.6 comes with a specialized API for bulk document lookups.
The new API allows fetching multiple documents from the server using a single
request, making bulk document retrieval more efficient than issuing
one request per document.
We have worked on many AQL optimizations for ArangoDB 2.6.
As a side effect of one of these optimizations, some cases involving the handling
of large IN-lists have become much faster than before. Large IN-lists are normally
used when comparing attribute or index values against some big array of lookup values
or keys provided by the application.
This post is about improvements for the fulltext index in ArangoDB 2.6. The improvements
address the problem that non-string attributes were ignored during fulltext indexing.
Effectively this prevented string values inside arrays or objects from being indexed. Though this
behavior was documented, it limited the usefulness of the fulltext index considerably. Several
users requested that the fulltext index be able to index array and object attributes, too.
Finally this has been accomplished, so the fulltext index in 2.6 supports indexing arrays
and objects!
This is another post demonstrating some of the AQL query performance improvements
that can be expected in ArangoDB 2.6. Specifically, this post is about an optimization
for subqueries. AQL queries with multiple subqueries will likely benefit from it.
While searching for further AQL query optimizations last week, we found that intermediate AQL
query results were copied one time too often in some cases. More precisely, the data that a query’s
ReturnNode will return to the caller was copied into the ReturnNode’s own register. As
ReturnNodes never modify their input data, this called for what is known in compilers as
return-value optimization.
2.6 will now optimize away these copies in many cases, and this post shows which performance
benefits can be expected due to the optimization.
The export API is useful when the goal is to extract all documents from a given collection
and to process them outside of ArangoDB.
The export API can provide quick and memory-efficient snapshots of the data in the underlying
collection, making it suitable for extracting all documents of the collection. It will be able
to provide data much faster than an AQL query that extracts all documents.
In this post I’ll show how to use the export API to extract data and process it with PHP.