Java low level: Converting between integers and text (part 1)

Overview

As Java has a number of ways to convert an integer to a String, it's not something you might have considered writing yourself.

However, there may be situations where you want to do this. One of them is when performance is critical. Java libraries tend to create more objects than are required. Usually this doesn't matter, but there are times when you need the system to go faster and reading and writing numbers is a significant cost.

Performance difference

The following is based on tests that repeatedly write and read 128K integers in binary and text formats, taking the average time to write and read one integer.
Source to run all tests
Source for the examples. Scroll down for the Unsafe examples.

On a 2.6 GHz Xeon
Unsafe text: Typically took 41.4 ns to write/read per long.
ByteBuffer direct text: Typically took 63.9 ns to write/read per long.
ByteBuffer heap text: Typically took 68.9 ns to write/read per long.
Print text: Typically took 325.7 ns to write/read per long.
DecimalFormat text: Typically took 645.5 ns to write/read per long.

Unsafe binary: Typically took 4.2 ns to write/read per long.
ByteBuffer binary: Typically took 10.3 ns to write/read per long.
DataStream binary: Typically took 72.1 ns to write/read per long.

On a 3.8 GHz i7
Unsafe text: Typically took 32.1 ns to write/read per long.
ByteBuffer direct text: Typically took 48.4 ns to write/read per long.
ByteBuffer heap text: Typically took 58.4 ns to write/read per long.
Print text: Typically took 205.6 ns to write/read per long.
DecimalFormat text: Typically took 436.8 ns to write/read per long.

Unsafe binary: Typically took 5.0 ns to write/read per long.
ByteBuffer binary: Typically took 17.4 ns to write/read per long.
DataStream binary: Typically took 50.8 ns to write/read per long.
The i7 has a faster clock speed than the Xeon, but a smaller cache. This narrows the gap between the fastest and slowest times. Comparing a direct ByteBuffer with a heap ByteBuffer, there was little difference on the Xeon, whereas the direct ByteBuffer was consistently faster on the i7.

In this test, the time to read/write integers as text varies by as much as 14x between approaches. If your integer format needs to be customised, you may need to use DecimalFormat or write your own.

Note: the faster options for writing/reading text were faster than using the binary DataInput/OutputStream. The stream arranges a long as 8 bytes in big-endian order, which is more work than converting the number to text in these cases.
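To make the two layouts concrete, here is a minimal sketch that encodes the same long both ways. The class and method names (BinaryVsText, toBinary, toText) are made up for illustration, and it says nothing about speed; it only shows what ends up in the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BinaryVsText {
    // Binary form as DataOutputStream writes it: always 8 bytes,
    // most significant byte first (big-endian).
    static byte[] toBinary(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Text form: one ASCII byte per character of the decimal representation.
    static byte[] toText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long value = 1234;
        byte[] binary = toBinary(value);
        byte[] text = toText(value);
        System.out.println(binary.length); // prints 8
        System.out.println(text.length);   // prints 4
        // ByteBuffer is big-endian by default, so it reads the same value back.
        System.out.println(ByteBuffer.wrap(binary).getLong()); // prints 1234
    }
}
```

For small values the text form is shorter than the fixed 8 bytes, while for values with many digits it is longer.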

Writing an integer to text

For this example, all integers are treated as long type. Using an int type instead gives a performance advantage, but it is relatively small.

Integer representation

All signed integer types (byte, short, int and long) are represented in two's complement. Encoding/decoding this format doesn't require any bitwise operations, as floating point numbers can.

Extract the sign

First, extract the sign. This is simple to do:

Extracting the sign
if (num < 0) {
   writeByte('-');
   num = -num;
}

There is one edge case here: Long.MIN_VALUE. Due to the two's-complement representation, negating this value yields the same value, so it is still negative after the sign has been written. One way to handle this value is to encode it specially, e.g. have a constant which contains what it should be encoded as. Another approach, which is my preference, is to treat Long.MIN_VALUE as an unset value or NaN, much as most spreadsheet applications treat an empty field as an unset cell.

Another special value is zero. Other numbers do not have a leading zero, but zero needs to have at least one digit.
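For the Long.MIN_VALUE edge case, the following sketch shows why negation fails and what the "constant" approach could look like. The names MinValueCase, MIN_VALUE_TEXT and negationOverflows are hypothetical and not taken from the original source.

```java
import java.nio.charset.StandardCharsets;

public class MinValueCase {
    // Pre-encoded text for the one long value that cannot be negated.
    static final byte[] MIN_VALUE_TEXT =
            Long.toString(Long.MIN_VALUE).getBytes(StandardCharsets.US_ASCII);

    // True when negating a negative number overflows and gives the same value back.
    // For long this only happens for Long.MIN_VALUE.
    static boolean negationOverflows(long num) {
        return num < 0 && -num == num;
    }

    public static void main(String[] args) {
        System.out.println(negationOverflows(Long.MIN_VALUE)); // prints true
        System.out.println(negationOverflows(-1));             // prints false
        // A writer could copy these bytes directly instead of negating the value.
        System.out.println(new String(MIN_VALUE_TEXT, StandardCharsets.US_ASCII));
    }
}
```

A writer that checks for this value before the sign test can copy MIN_VALUE_TEXT and skip the digit loop entirely.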

Handling 0
if (num == 0) {
    writeByte('0');
    writeByte(SEPARATOR);
} else {

Writing the digits

It is easier to extract the digits of a number starting from the end, i.e. from the least significant digit. This is the approach I have taken here.

Writing each digit in reverse
// find the number of digits
int digits = ParserUtils.digits(num);
// starting from the end, write each digit
for (int i = digits - 1; i >= 0; i--) {
    // write the lowest digit.
    buffer.put(buffer.position() + i, (byte) (num % 10 + '0'));
    // remove that digit.
    num /= 10;
}
// move the position to after the digits.
buffer.position(buffer.position() + digits);
writeByte(SEPARATOR);
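Putting the fragments above together, a self-contained version might look like the sketch below. It is not the original source: the digits helper is a simple stand-in for ParserUtils.digits (whose implementation is not shown here), the separator is assumed to be a newline, and Long.MIN_VALUE is handled with a pre-encoded constant, which is only one of the options discussed earlier.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LongTextWriter {
    static final byte SEPARATOR = '\n'; // assumed separator
    static final byte[] MIN_VALUE_TEXT =
            "-9223372036854775808".getBytes(StandardCharsets.US_ASCII);

    // Stand-in for ParserUtils.digits: number of decimal digits in a positive value.
    static int digits(long num) {
        int count = 1;
        while (num >= 10) {
            num /= 10;
            count++;
        }
        return count;
    }

    static void writeLong(ByteBuffer buffer, long num) {
        if (num == Long.MIN_VALUE) {
            // Cannot be negated, so write its pre-encoded text.
            buffer.put(MIN_VALUE_TEXT);
            buffer.put(SEPARATOR);
            return;
        }
        if (num < 0) {
            buffer.put((byte) '-');
            num = -num;
        }
        if (num == 0) {
            buffer.put((byte) '0');
        } else {
            int digits = digits(num);
            int start = buffer.position();
            // Starting from the end, write the lowest digit, then remove it.
            for (int i = digits - 1; i >= 0; i--) {
                buffer.put(start + i, (byte) (num % 10 + '0'));
                num /= 10;
            }
            // Move the position to after the digits.
            buffer.position(start + digits);
        }
        buffer.put(SEPARATOR);
    }

    // Convenience wrapper for trying the writer out; a real writer would reuse the buffer.
    static String format(long num) {
        ByteBuffer buffer = ByteBuffer.allocate(32);
        writeLong(buffer, num);
        buffer.flip();
        return StandardCharsets.US_ASCII.decode(buffer).toString();
    }

    public static void main(String[] args) {
        System.out.print(format(0));    // prints 0
        System.out.print(format(-42));  // prints -42
        System.out.print(format(1234)); // prints 1234
    }
}
```

The format method allocates a buffer and a String on every call, which defeats the purpose in performance-critical code; it is there only to make the output easy to inspect.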

Part 2

Decoding an integer from text
