J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Compiling an Optimized Version of ArangoDB

Why should you care about compiling ArangoDB on your own when there are official release packages that are ready to use?

There are three main reasons for compiling on your own:

  • as a developer, you want to make changes to the ArangoDB C++ source code. In that case, compiling on your own is the only option. Please consult the compiling a debug version of ArangoDB page for more information.

  • you are trying to get meaningful stack traces from core dumps produced by ArangoDB and need an ArangoDB binary that comes with enough debug information (debug symbols, probably also assertions turned on). In this case, please also consult the compiling a debug version of ArangoDB blog post for how to get this done.

  • you want to use an ArangoDB binary that is optimized for your specific target architecture.

The last reason is relevant because the official release packages provided by ArangoDB cannot make too many assumptions about the environment in which they will be used. The general release packages leave much less room for platform-specific optimizations than a build that is compiled just for the local machine.

For example, all relevant CPUs offer SIMD instructions that a compiler can exploit when generating code. But different generations of CPUs offer different levels of SIMD instructions. Not every CPU in use today provides SSE4, let alone AVX.
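To see where a particular machine falls in this range, the feature flags advertised by its CPU can be inspected. The following is a minimal sketch for Linux only, assuming the kernel exposes a flags line in /proc/cpuinfo. simd_flags is a hypothetical helper name and not part of ArangoDB. Note that Linux reports SSE3 under the name pni.

```shell
# simd_flags: hypothetical helper. Reads cpuinfo-style text on stdin and
# prints the SIMD-related feature flags found on the first "flags" line.
simd_flags() {
    grep -m1 '^flags' \
        | cut -d: -f2 \
        | tr ' ' '\n' \
        | grep -E -x 'sse|sse2|pni|ssse3|sse4_1|sse4_2|avx|avx2' \
        | paste -sd ' ' -
}

# Usage on Linux (assumption: /proc/cpuinfo is available):
#   simd_flags < /proc/cpuinfo
```

The helper reads from stdin rather than opening /proc/cpuinfo itself, so it can also be fed a saved copy of the file from another machine.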

To make our release packages compatible with most environments, we have had to make some conservative assumptions about the CPU abilities, which effectively disables many optimizations that would have been possible when creating a build that only needs to run on a specific architecture.

To fully exploit the capabilities of a specific target environment, you need to build executables for that specific architecture. Most compilers offer the option -march for that. You normally want to set this to native when compiling an optimized version. There are also lots of compiler options for enabling or disabling specific CPU features, such as -msse, -msse2, -msse3, -mssse3, -msse4.1, -msse4.2, -msse4, -mavx, -mavx2, to name just a few.
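To find out what -march=native resolves to on a particular machine, GCC can be asked to print its effective target options. This is a small sketch that assumes GCC: the -Q --help=target combination is GCC-specific, and the exact output layout may differ between versions. resolved_march is a hypothetical helper name.

```shell
# resolved_march: hypothetical helper. Reads `gcc -Q --help=target` style
# output on stdin and prints the value listed for the -march= option.
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Usage (assumption: gcc is installed):
#   gcc -march=native -Q --help=target | resolved_march
```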

The good news is that there is no need to deal with such compiler-specific optimization options in order to get an optimized build. The cmake-based ArangoDB 3.0 builds will automatically test the local environment’s capabilities and set the compiler options based on which CPU abilities were detected.

For example, a mere (cd build && cmake -DCMAKE_BUILD_TYPE=Release ..) will run the CPU ability detection and configure the build to use the features supported by the local architecture:

configuring a release build
(mkdir -p build; cd build && cmake -DCMAKE_BUILD_TYPE=Release ..)

For example, on my laptop this prints:

cmake output
-- The CXX compiler identification is GNU 5.3.1
-- The C compiler identification is GNU 5.3.1
...
-- target changed from "" to "auto"
-- Detected CPU: haswell
-- Performing Test check_cxx_compiler_flag__march_core_avx2
-- Performing Test check_cxx_compiler_flag__march_core_avx2 - Success
-- Performing Test check_cxx_compiler_flag__msse2
-- Performing Test check_cxx_compiler_flag__msse2 - Success
-- Performing Test check_cxx_compiler_flag__msse3
-- Performing Test check_cxx_compiler_flag__msse3 - Success
-- Looking for pmmintrin.h
-- Looking for pmmintrin.h - found
-- Performing Test check_cxx_compiler_flag__mssse3
-- Performing Test check_cxx_compiler_flag__mssse3 - Success
-- Looking for tmmintrin.h
-- Looking for tmmintrin.h - found
-- Performing Test check_cxx_compiler_flag__msse4_1
-- Performing Test check_cxx_compiler_flag__msse4_1 - Success
-- Looking for smmintrin.h
-- Looking for smmintrin.h - found
-- Performing Test check_cxx_compiler_flag__msse4_2
-- Performing Test check_cxx_compiler_flag__msse4_2 - Success
-- Performing Test check_cxx_compiler_flag__mavx
-- Performing Test check_cxx_compiler_flag__mavx - Success
-- Looking for immintrin.h
-- Looking for immintrin.h - found
-- Performing Test check_cxx_compiler_flag__msse2avx
-- Performing Test check_cxx_compiler_flag__msse2avx - Success
-- Performing Test check_cxx_compiler_flag__mavx2
-- Performing Test check_cxx_compiler_flag__mavx2 - Success
-- Performing Test check_cxx_compiler_flag__mno_sse4a
-- Performing Test check_cxx_compiler_flag__mno_sse4a - Success
-- Performing Test check_cxx_compiler_flag__mno_xop
-- Performing Test check_cxx_compiler_flag__mno_xop - Success
-- Performing Test check_cxx_compiler_flag__mno_fma4
-- Performing Test check_cxx_compiler_flag__mno_fma4 - Success
...

The detected options will end up in the CMakeCache.txt file in the build directory. The Makefile generated by cmake will automatically make use of these options when invoking the C++ compiler.
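One way to look at what was stored is to search the cache file directly. The sketch below assumes only the general CMakeCache.txt layout (KEY:TYPE=VALUE entries, with comment lines starting with // or #). The names of the variables that hold the detected options are not given here, so the search matches on the option text instead. cache_arch_entries is a hypothetical helper name.

```shell
# cache_arch_entries: hypothetical helper. Takes the path to a CMakeCache.txt
# file and prints the cache entries whose value mentions an architecture option.
cache_arch_entries() {
    # Drop comment lines first, then keep KEY:TYPE=VALUE entries whose
    # value contains -march=, an -msse... option or an -mavx... option.
    grep -v -E '^(//|#)' "$1" | grep -E -- '=.*-m(arch=|sse|avx)'
}

# Usage from the source directory, after running cmake:
#   cache_arch_entries build/CMakeCache.txt
```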

The compiler options are not shown by default, but they can be made visible by compiling with the option VERBOSE=1, e.g.

configuring and compiling a build
(mkdir -p build; cd build && cmake -DCMAKE_BUILD_TYPE=Release ..)
(cd build && make -j4 VERBOSE=1)

Note that this output will be very long, so you only want to set the VERBOSE=1 option when checking that the compiler options were picked up correctly.
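Because of the amount of output, it can help to reduce a single compiler invocation to just its architecture options. The following sketch assumes that the compiler command appears on one line and that the relevant options all start with -m followed by a letter. arch_options is a hypothetical helper name.

```shell
# arch_options: hypothetical helper. Reads build output on stdin, takes the
# first line that contains -march= and prints only its -m... options.
arch_options() {
    grep -m1 -- '-march=' \
        | tr ' ' '\n' \
        | grep -E -- '^-m[a-z]' \
        | paste -sd ' ' -
}

# Usage:
#   (cd build && make -j4 VERBOSE=1) 2>&1 | arch_options
```

Since the helper stops at the first matching line, it shows the options of one compile command only, which is enough when all targets are built with the same flags.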

On my laptop, the architecture-specific compiler options that were automatically detected and used were:

compiler architecture options used
-march=core-avx2 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -msse2avx -mavx2 -mno-sse4a -mno-xop -mno-fma4

The build has detected core-avx2, which is a good fit for my machine and a lot more specific than the official packages, which for example cannot assume the presence of either SSE4 or AVX instructions.

Now that the build can rely on the presence of specific CPU instructions, some code parts such as JSON parsing can make use of SSE4.2 instructions, and the compiler can use optimized SIMD variants for operations such as memcpy and strlen.
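Whether a given instruction set is actually enabled for a set of compiler options can be checked through the compiler's predefined macros, which is also what C++ code typically tests to guard SIMD-specific code paths. This is a sketch assuming GCC or Clang, where -dM -E dumps the predefined macros. simd_macros is a hypothetical helper name.

```shell
# simd_macros: hypothetical helper. Reads a `-dM -E` macro dump on stdin and
# prints the names of the SSE/SSSE3/AVX/AVX2 feature macros that are defined.
simd_macros() {
    grep -E '^#define __(SSE[0-9_]*|SSSE3|AVX2?)__ ' | awk '{ print $2 }'
}

# Usage (assumption: gcc is installed):
#   gcc -march=native -dM -E - < /dev/null | simd_macros
```

Running the same command without -march=native shows the much smaller set of macros that a build with default options can rely on.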