
Fastest String-to-uint64 Conversion Method?

While doing some optimization work for the upcoming ArangoDB 3.0 release, we had to figure out what was the “ideal” way of converting a string representation of a number into a C++ uint64_t (64 bit unsigned integer type). This kind of operation is performed a lot during the lifetime of an ArangoDB server process, so it seemed worthwhile making it as fast as possible.

std::stoull and std::strtoull

The natural solution for this kind of conversion is the standard library’s std::stoull function. On the one hand, that solution is potentially optimized and definitely robust. It performs validation of the input characters and is also battle-tested by millions of users.

On the other hand, std::stoull has a few “features” that some would consider anti-features, and that sound like they could negatively affect performance:

  • it may behave in a locale-dependent way
  • it parses and accepts negative input values, which is often undesirable for a result of unsigned type
  • it can perform base conversion

C++11 also offers the alternative std::strtoull, inherited from the C standard library. It behaves much the same as std::stoull except in case of error: std::stoull will throw an exception, while std::strtoull will not.

That’s what we get from the standard library for unsigned 64 bit integer types.

Alternative implementations (w/o error checking)

Another alternative is to roll a string to number conversion function ourselves. If we hard-code it to base 10 and do not care about error checking, a naive implementation may look like this:

inline uint64_t naive(std::string const& value) {
  uint64_t result = 0;
  char const* p = value.c_str();
  char const* q = p + value.size();
  while (p < q) {
    result *= 10;
    result += *(p++) - '0';
  }
  return result;
}

Obviously the above will produce wrong results for invalid input data, but for “trusted” (known to be valid) input it may be just fine.

Here’s just another implementation for the problem at hand. This one avoids the multiplication by 10 and instead splits it into two bit-shift operations plus an addition, exploiting that x * 10 == (x << 3) + (x << 1):

inline uint64_t bitshift(std::string const& value) {
  uint64_t result = 0;

  char const* p = value.c_str();
  char const* q = p + value.size();
  while (p < q) {
    result = (result << 1) + (result << 3) + *(p++) - '0';
  }
  return result;
}

Again no error checking is present in the above function, but it should be ok for trusted inputs.

By manually unrolling the while loop and converting it into a switch statement, we can also come up with a conversion function that has minimal branching:

inline uint64_t unrolled(std::string const& value) {
  uint64_t result = 0;

  size_t const length = value.size();
  switch (length) {
    // note: all cases below intentionally fall through
    case 20:    result += (value[length - 20] - '0') * 10000000000000000000ULL;
    case 19:    result += (value[length - 19] - '0') * 1000000000000000000ULL;
    case 18:    result += (value[length - 18] - '0') * 100000000000000000ULL;
    case 17:    result += (value[length - 17] - '0') * 10000000000000000ULL;
    case 16:    result += (value[length - 16] - '0') * 1000000000000000ULL;
    case 15:    result += (value[length - 15] - '0') * 100000000000000ULL;
    case 14:    result += (value[length - 14] - '0') * 10000000000000ULL;
    case 13:    result += (value[length - 13] - '0') * 1000000000000ULL;
    case 12:    result += (value[length - 12] - '0') * 100000000000ULL;
    case 11:    result += (value[length - 11] - '0') * 10000000000ULL;
    case 10:    result += (value[length - 10] - '0') * 1000000000ULL;
    case  9:    result += (value[length -  9] - '0') * 100000000ULL;
    case  8:    result += (value[length -  8] - '0') * 10000000ULL;
    case  7:    result += (value[length -  7] - '0') * 1000000ULL;
    case  6:    result += (value[length -  6] - '0') * 100000ULL;
    case  5:    result += (value[length -  5] - '0') * 10000ULL;
    case  4:    result += (value[length -  4] - '0') * 1000ULL;
    case  3:    result += (value[length -  3] - '0') * 100ULL;
    case  2:    result += (value[length -  2] - '0') * 10ULL;
    case  1:    result += (value[length -  1] - '0');
  }
  return result;
}

Performance testing

To check out how all these alternatives perform, I put them into a small test driver program. To compile it with g++ and run it, I used this command:

g++ -Wall -Wextra -march=native -std=c++11 -O3 stdoull-test.cpp && ./a.out

The test program converts input strings of different lengths to uint64_t using the above (and some other) implementations, scaling from 10,000 iterations per string up to 100,000,000. It also prints the wall-clock time spent in each run; the most interesting output is the “ms” column. The “result” column can be ignored entirely. It exists only so the compiler won’t optimize away the calls to the conversion functions.

The timings from my local laptop (Intel Core i7-4710HQ CPU @ 2.50GHz, Ubuntu 16.04, g++ 5.3.1 – your mileage may vary!) for the 100,000,000 conversions are:

execution times for various string-to-uint64 implementations
test 'std::stoull'       string '1'                           1209 ms
test 'std::stoull'       string '99'                          1382 ms
test 'std::stoull'       string '1234'                        1725 ms
test 'std::stoull'       string '1234567'                     2257 ms
test 'std::stoull'       string '1234567891'                  2764 ms
test 'std::stoull'       string '12345678901234'              3899 ms
test 'std::stoull'       string '12345678901234678901'        7391 ms

test 'std::strtoull'     string '1'                           1104 ms
test 'std::strtoull'     string '99'                          1300 ms
test 'std::strtoull'     string '1234'                        1628 ms
test 'std::strtoull'     string '1234567'                     2428 ms
test 'std::strtoull'     string '1234567891'                  2662 ms
test 'std::strtoull'     string '12345678901234'              3705 ms
test 'std::strtoull'     string '12345678901234678901'        6631 ms

test 'naive'             string '1'                            202 ms
test 'naive'             string '99'                           314 ms
test 'naive'             string '1234'                         475 ms
test 'naive'             string '1234567'                      732 ms
test 'naive'             string '1234567891'                   987 ms
test 'naive'             string '12345678901234'              1343 ms
test 'naive'             string '12345678901234678901'        1862 ms

test 'bitshift'          string '1'                            188 ms
test 'bitshift'          string '99'                           245 ms
test 'bitshift'          string '1234'                         397 ms
test 'bitshift'          string '1234567'                      462 ms
test 'bitshift'          string '1234567891'                   666 ms
test 'bitshift'          string '12345678901234'               888 ms
test 'bitshift'          string '12345678901234678901'        1277 ms

test 'unrolled'          string '1'                            289 ms
test 'unrolled'          string '99'                           289 ms
test 'unrolled'          string '1234'                         351 ms
test 'unrolled'          string '1234567'                      408 ms
test 'unrolled'          string '1234567891'                   547 ms
test 'unrolled'          string '12345678901234'               778 ms
test 'unrolled'          string '12345678901234678901'        1068 ms

It’s no surprise that the standard library functions, with their generality and error-checking features, are slower than the specialized functions that take the liberty of ignoring all that.

Which method is fastest now?

As can be seen above, the “bitshift” variant was fastest for short and medium-length inputs in my environment. As input values get longer, the “bitshift” method gets overtaken by the “unrolled” variant. The “naive” variant was slower than the other two in most cases, but still performs surprisingly well.

Conclusion

The take-away: even though string-to-number conversion routines are present in the standard library, it can still be beneficial to hand-craft specialized variants of them for maximum performance. This is especially true when most of the generality that the standard library routines provide is not required in your use case.

For example, if you know that the conversion functions will only operate on “trusted” input (only numeric digit input characters, length of input string won’t exceed the maximum length of a uint64_t number etc.) then error checking is not required.

Additionally, if you can limit yourself to numbers of a specific base and need neither base conversion nor the handling of negative values, a lot of the generality and safety overhead of std::stoull can be avoided.