Why testing code for thread safety is flawed

August 13, 2011

Overview

We know that ++ is not thread safe, even for a volatile field. However, there is a trick to proving this.

The problem with testing code for thread safety is that a test can pass repeatedly while the code is still unsafe.
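As background, ++ on a volatile field is still three separate steps: read the value, add one, write it back. The sketch below is not a threaded test; it replays one unlucky interleaving of two threads by hand on a single thread, so the lost update is deterministic. The class and method names are made up for illustration.

```java
public class LostUpdateDemo {
    static volatile int num = 0;

    // Replays, on one thread, the steps two threads would take if both ran num++
    // and the second thread read the field before the first wrote it back.
    static int interleavedIncrements() {
        num = 0;
        int readByA = num;   // thread A reads 0
        int readByB = num;   // thread B reads 0, before A has written
        num = readByA + 1;   // thread A writes 1
        num = readByB + 1;   // thread B also writes 1, overwriting A's update
        return num;
    }

    public static void main(String... args) {
        // Two increments were attempted, but only one survives.
        System.out.println("Two increments produced " + interleavedIncrements());
    }
}
```

volatile guarantees that each read and each write is visible to other threads, but it does nothing to stop another thread from slipping in between the read and the write.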

An example

I wrote a lock-free ring buffer which repeatedly passed multi-threaded tests of one billion entries. However, a couple of days later the test consistently failed. What was the difference? There were a couple of CPU-intensive processes also running on the same box. When these processes finished, the test passed again.

The problem

If you try to prove that incrementing a volatile variable is not thread safe, you can get a result like this.

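import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IncrementNotThreadSafeMain {
    public static void main(String... args) throws InterruptedException {
        // Double the thread count each run: 1, 2, 4, ... 64.
        for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
            doThreadSafeTest(nThreads);
    }

    static class VolatileInt {
        volatile int num = 0;
    }

    private static void doThreadSafeTest(final int nThreads) throws InterruptedException {
        final int count = 32 * 1000;

        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        final VolatileInt vi = new VolatileInt();
        for (int i = 0; i < nThreads; i++)
            es.submit(new Runnable() {
                public void run() {
                    // Each task does count / nThreads increments, so all tasks together do count.
                    for (int j = 0; j < count; j += nThreads)
                        vi.num++; // read, add, write: not atomic even though num is volatile
                }
            });
        es.shutdown();
        es.awaitTermination(10, TimeUnit.SECONDS);
        System.out.printf("With %,d threads should total %,d but was %,d%n", nThreads, count, vi.num);
    }
}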

On my machine, with 8 logical threads, this prints

With 1 threads should total 32,000 but was 32,000
With 2 threads should total 32,000 but was 32,000
With 4 threads should total 32,000 but was 32,000
With 8 threads should total 32,000 but was 32,000
With 16 threads should total 32,000 but was 32,000
With 32 threads should total 32,000 but was 32,000
With 64 threads should total 32,000 but was 32,000

There doesn't appear to be a problem. Why? It takes time to start each thread, and each task doesn't take long to complete, so even though many tasks are submitted, each one completes before the next one starts. In other words, the test is effectively single threaded.
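One way to reduce this start-up effect, offered here as an assumption rather than as the change made in this post, is to hold every task at a start gate until all the threads are running. The sketch below uses two CountDownLatch instances for that; the class and method names are hypothetical. Even with the gate, 32,000 increments are so quick that the test may still pass, so treat this as a way to improve the odds, not a proof.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StartGateTest {
    static class VolatileInt {
        volatile int num = 0;
    }

    // Runs count increments split across nThreads tasks which all start together.
    static int runWithStartGate(final int nThreads, final int count) {
        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        final VolatileInt vi = new VolatileInt();
        final CountDownLatch ready = new CountDownLatch(nThreads); // counts tasks that have started
        final CountDownLatch go = new CountDownLatch(1);           // released once all are ready
        for (int i = 0; i < nThreads; i++)
            es.submit(new Runnable() {
                public void run() {
                    ready.countDown();
                    try {
                        go.await(); // wait here until every task is running
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int j = 0; j < count; j += nThreads)
                        vi.num++;
                }
            });
        try {
            ready.await();  // all nThreads tasks now hold a pool thread
            go.countDown(); // open the gate
            es.shutdown();
            es.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the test", e);
        }
        return vi.num;
    }

    public static void main(String... args) {
        final int count = 32 * 1000;
        for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
            System.out.printf("With %,d threads should total %,d but was %,d%n",
                    nThreads, count, runWithStartGate(nThreads, count));
    }
}
```

The gate relies on the pool having as many threads as there are tasks. With fewer threads, ready.await() would never return because some tasks would be queued behind tasks that are waiting at the gate.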

Thread safety bugs can hide

In a more complex test, you might not know which subtle change will cause your code to break. In fact, you can deploy your application into production and it can work for years. Then one day something changes: you add another application which increases the load, or you upgrade the version of Java or the machine, and suddenly it fails intermittently. It is tempting to assume that the most recent change is the cause of the problem when actually the bug has always been there; it just hasn't shown itself.

Changing the test

When we run the test long enough to have multiple threads running at once (here, 100,000,000 increments in total), we see a different pattern. The single-threaded run behaves as expected, without losing any counts, but the multi-threaded runs start dropping increments.

With 1 threads should total 100,000,000 but was 100,000,000
With 2 threads should total 100,000,000 but was 75,127,690
With 4 threads should total 100,000,000 but was 51,338,289
With 8 threads should total 100,000,000 but was 35,177,375
With 16 threads should total 100,000,000 but was 15,264,270
With 32 threads should total 100,000,000 but was 14,385,095
With 64 threads should total 100,000,000 but was 15,818,747

Fixing the test

If you replace the volatile int with an AtomicInteger, you get the following result.

With 1 threads should total 100,000,000 but was 100,000,000
With 2 threads should total 100,000,000 but was 100,000,000
With 4 threads should total 100,000,000 but was 100,000,000
With 8 threads should total 100,000,000 but was 100,000,000
With 16 threads should total 100,000,000 but was 100,000,000
With 32 threads should total 100,000,000 but was 100,000,000
With 64 threads should total 100,000,000 but was 100,000,000
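A minimal sketch of what the AtomicInteger version of the test might look like, assuming the same structure as the volatile test with the counter swapped for an AtomicInteger and the count raised to 100,000,000. The class and method names are hypothetical, and the 60 second timeout is an arbitrary choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementAtomicMain {
    // Splits count increments across nThreads tasks and returns the final total.
    static int countWithAtomic(final int nThreads, final int count) {
        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        final AtomicInteger ai = new AtomicInteger();
        for (int i = 0; i < nThreads; i++)
            es.submit(new Runnable() {
                public void run() {
                    for (int j = 0; j < count; j += nThreads)
                        ai.incrementAndGet(); // atomic read-modify-write, no update is lost
                }
            });
        es.shutdown();
        try {
            es.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the test", e);
        }
        return ai.get();
    }

    public static void main(String... args) {
        final int count = 100 * 1000 * 1000;
        for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
            System.out.printf("With %,d threads should total %,d but was %,d%n",
                    nThreads, count, countWithAtomic(nThreads, count));
    }
}
```

incrementAndGet() performs the read, add and write as one atomic operation, which is exactly the guarantee that ++ on a volatile field lacks.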

Conclusion

Proving by experimentation that multi-threaded code fails is not easy, even for trivial examples. This is why understanding the thread safety guarantees Java provides is important. That understanding, rather than a passing test, is how you know code is thread safe.

Work around

What can you do if you can't read all the code to guarantee thread safety? I would suggest loading up the box until everything is busy. Create processes so every logical thread is busy, and create disk and network activity so those are always busy as well. While this is happening, run your test. If it passes, your application is less likely to fail under load. This is no guarantee, but it is better than testing your application without load.
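As a rough illustration of the CPU part of this suggestion only, the sketch below starts one busy-spinning daemon thread per logical CPU inside the same JVM, runs a test, and then stops the load. This is an assumption about how you might generate load; separate processes, as suggested above, and real disk and network traffic would be closer to production. All names here are hypothetical.

```java
public class BusyBox {
    // Starts nThreads daemon threads which spin until they are interrupted.
    static Thread[] startCpuLoad(int nThreads) {
        Thread[] burners = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    long spins = 0;
                    while (!Thread.currentThread().isInterrupted())
                        spins++; // pointless work, just to keep a CPU busy
                }
            }, "cpu-burner-" + i);
            t.setDaemon(true); // never keeps the JVM alive by itself
            t.start();
            burners[i] = t;
        }
        return burners;
    }

    // Asks every burner thread to stop spinning.
    static void stopCpuLoad(Thread[] burners) {
        for (Thread t : burners)
            t.interrupt();
    }

    // Runs the given test while every logical CPU is kept busy.
    static void runUnderLoad(Runnable test) {
        Thread[] burners = startCpuLoad(Runtime.getRuntime().availableProcessors());
        try {
            test.run();
        } finally {
            stopCpuLoad(burners);
        }
    }

    public static void main(String... args) {
        runUnderLoad(new Runnable() {
            public void run() {
                System.out.println("Run your multi-threaded test here");
            }
        });
    }
}
```

A pass under this kind of load is still not a proof of thread safety. It only changes the thread scheduling enough to give a hidden bug more chances to show itself.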

The Code

IncrementNotThreadSafeMain.java

Comments

Min, 13 August 2011 at 14:55
Check out ConTest at IBM alphaworks
Peter Lawrey, 13 August 2011 at 15:09
Concurrency testing can be valuable if it finds bugs. However, just because a test passes doesn't mean you won't have thread safety issues. The only sure way is to understand the code and the guarantees that the platform provides.
Parwinder Sekhon, 14 August 2011 at 17:06
Taking a slightly different slant to ConTest, multithreadedtc allows you to test very specific thread interleavings in a simple manner.

http://www.cs.umd.edu/projects/PL/multithreadedtc/overview.html

Vanilla Java