While doing some optimization work for the upcoming ArangoDB 3.0 release, we had to figure out what was the “ideal” way of converting a string representation of a number into a C++ uint64_t (64 bit unsigned integer type). This kind of operation is performed a lot during the lifetime of an ArangoDB server process, so it seemed worthwhile making it as fast as possible.
std::stoull and std::strtoull
The natural solution for this kind of conversion is the standard library’s std::stoull function. On the one hand, that solution is potentially optimized and definitely robust. It performs validation of the input characters and is also battle-tested by millions of users.
On the other hand, std::stoull has a few “features” that some would consider “anti”-features, and that sound like they could negatively influence performance:
- it may behave locale-dependent
- it can parse and will let pass negative input values, which is often not desired for a result of unsigned type
- it can perform base conversion
In C++11 there is also the alternative std::strtoull. It should behave about the same as std::stoull except in case of error. std::stoull will throw and std::strtoull will not.
That’s what we get from the standard library for long unsigned integer types.
Alternative implementations (w/o error checking)
Another alternative is to roll a string to number conversion function ourselves. If we hard-code it to base 10 and do not care about error checking, a naive implementation may look like this:
1 2 3 4 5 6 7 8 9 10
Obviously the above will produce wrong results for invalid input data, but for “trusted” (known to be valid) input it may be just fine.
Here’s just another implementation for the problem at hand. This one doesn’t perform the times 10 operation, but splits it into two bitshifting operations:
1 2 3 4 5 6 7 8 9 10
Again no error checking is present in the above function, but it should be ok for trusted inputs.
By manually unrolling the while loop and converting it into a switch statement, we can also come up with a conversion function that has minimal branching:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
To check out how all these alternatives perform, I put them into a small test driver program. To compile it with g++ and run it, I used this command:
The test program will convert input strings of different lengths to uint64_t using the above (and some other) implementations, going up from 10,000 iterations per string up to 100,000,000. It also prints the wall-clock time spent in each run. The most interesting output it prints is in the “ms” column. The “result” column can be fully ignored. It’s only there so the compiler won’t optimize away the calls to the conversion functions.
The timings from my local laptop (Intel Core i7-4710HQ CPU @ 2.50GHz, Ubuntu 16.04, g++ 5.3.1 – your mileage may vary!) for the 100,000,000 conversions are:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
It’s no surprise the standard library functions with their generality and error checking features are slower than the specialized functions that took the liberty to ignore all that.
Which method is fastest now?
As can be seen above, the “bitshift” variant was fastest for short and medium length inputs in my environment. At some point when input values get longer, the “bitshift” methods gets overtaken by the “unrolled” variant. The “naive” variant was slower than the two in most cases, but still performs surprisingly well.
The take-away: even though string-to-number conversion routines are present in the standard library, it can still be beneficial to hand-craft specialized variants of them for maximum performance. This is especially true when most of the generality that the standard library routines provide is not required in your use case.
For example, if you know that the conversion functions will only operate on “trusted” input (only numeric digit input characters, length of input string won’t exceed the maximum length of a uint64_t number etc.) then error checking is not required.
Additionally, if you can limit yourself to numbers of a specific base and don’t negative base conversion nor the handling of negative values, a lot of the generality and safety overhead of std::stoull may be avoided.