Did escape analysis escape from Java 6?

Jeroen Borgers

Escape Analysis in Java 6?

Last month we held our Speeding up Java applications course in the Dutch woods. While preparing for it, I discussed some of the new topics with my peer instructor and creator of the course, Kirk Pepperdine. We explain new features of Java 6 and how they can help improve performance. One of the more sophisticated features at the VM level is called escape analysis. The question is: does it really work?

We cover escape analysis not only in the course, but also in our Performance Top-10 blog and podcast, and in my J-Fall presentation. Brian Goetz wrote in September 2005: “Escape analysis is an optimization that has been talked about for a long time, and it is finally here -- the current builds of Mustang (Java SE 6) can do escape analysis …” Furthermore, Wikipedia states: “Escape analysis is implemented in Java Standard Edition 6.” And several escape analysis JVM switches, like -XX:-DoEscapeAnalysis, are available. So, we can assume it works, right?

But let us not assume anything here, because assumption is the mother of all f*** ups. And as we will see, it turns out that it does not work! We need to measure, not make assumptions. I read an interview with Java specialist Heinz Kabutz in which he actually measures. He benchmarks various ways of String concatenation, using the thread-safe StringBuffer and the thread-unsafe StringBuilder, and the latter turns out to be significantly faster than the former on Java 6. He does not talk about escape analysis, but with escape analysis working properly, using StringBuffer would be as fast as using StringBuilder, as we claim in our Top-10 blog! So, escape analysis is not working here. I’ll explain what is going on.

Escape Analysis explained

This analysis, performed by the runtime compiler, can conclude, for example, that an object on the heap is referenced only locally in a method and that no reference can escape from this scope. If so, HotSpot can apply runtime optimizations. It can allocate the object on the stack or in registers instead of on the heap. Or it can remove the acquiring and releasing of locks on the object altogether (lock elision), since no other thread can access the object anyway. Example:

public String concatBuffer(String s1, String s2, String s3) {
	StringBuffer sb = new StringBuffer();
	sb.append(s1);
	sb.append(s2);
	sb.append(s3);
	return sb.toString();
}

Here, the StringBuffer sb is only used locally in the method and no reference can escape from this scope. It is therefore a candidate for stack allocation and lock elision.

With lock elision, objects that are thread-safe but don’t need to be in the context in which they are used no longer carry the overhead of being thread-safe. So this is like no cure, no pay: no threads to stop, no overhead. The VM switch to enable this is -XX:+DoEscapeAnalysis.

Biased locking explained

Biased locking is another threading optimization in Java 6. Since most objects are locked by at most one thread during their lifetime, this is a sensible case to optimize for, and that is what biased locking does: it allows a thread to bias an object toward itself. Once biased, that thread can subsequently lock and unlock the object without resorting to expensive atomic instructions. The VM switch is -XX:+UseBiasedLocking and it is on by default.
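
To make the single-owner case concrete, here is a minimal sketch (the class and names are mine, not part of the benchmark below) of the pattern biased locking targets: one thread repeatedly synchronizing on the same object without any contention.

public class BiasedLockingSketch {
	private final Object lock = new Object();
	private int counter;

	public void increment() {
		// Uncontended: only one thread ever enters this monitor. Once the
		// object is biased toward that thread, locking and unlocking become
		// cheap checks instead of atomic compare-and-swap instructions.
		synchronized (lock) {
			counter++;
		}
	}

	public static void main(String[] args) {
		BiasedLockingSketch s = new BiasedLockingSketch();
		for (int i = 0; i < 1000000; i++) {
			s.increment(); // only the main thread ever takes the lock
		}
		System.out.println(s.counter);
	}
}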

Lock coarsening explained

Another threading optimization is lock coarsening, also called lock merging. Adjacent synchronized blocks are merged into one synchronized block, or multiple synchronized methods are joined into one. This only holds if the same lock object is used. This reduces the locking overhead. Example:

public static String concatToBuffer(StringBuffer sb, String s1, String s2, String s3) {
	sb.append(s1);
	sb.append(s2);
	sb.append(s3);
	return sb.toString();
}

In this example, the StringBuffer lock is not a candidate for lock elision, because the object is used outside of the method. But the three acquire-and-release pairs can be reduced to one, after inlining of the append methods. The VM switch is -XX:+EliminateLocks and it is on by default.
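
Conceptually, after the synchronized append methods are inlined, the transformation looks like this (my illustration, not actual JIT output):

public class CoarseningSketch {
	// Before coarsening: three adjacent lock/unlock pairs on the same
	// monitor, one for each (inlined) synchronized append call.
	static String before(StringBuffer sb, String s1, String s2, String s3) {
		synchronized (sb) { sb.append(s1); }
		synchronized (sb) { sb.append(s2); }
		synchronized (sb) { sb.append(s3); }
		return sb.toString();
	}

	// After coarsening: a single lock/unlock pair covering all three appends.
	static String after(StringBuffer sb, String s1, String s2, String s3) {
		synchronized (sb) {
			sb.append(s1);
			sb.append(s2);
			sb.append(s3);
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		System.out.println(after(new StringBuffer(), "Josh", "James", "Duke"));
	}
}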

Benchmarking – measuring if it works

So, to practice what we preach, I created a lock-intensive LockTest benchmark to test these three VM options. The code is shown below. I first want to run it with all mentioned options disabled, on my Vista laptop:

java -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest

I get:

Unrecognized VM option '-DoEscapeAnalysis'
Could not create the Java virtual machine.

Hmmm, strange. I’m sure this is the correct option. After some digging, we found that it only works for the server VM, not for the client VM, which is the default on 32-bit Windows. These VMs are currently two separate binaries. After contacting my valuable performance team source at Sun, I learned that escape analysis is not enabled by default, unlike the other locking optimizations. The other surprising thing he told me was that allocation optimization (using escape analysis) was not yet in the JDK. They are still working on it and expect it to be available in the spring 2008 JDK update. So this is disappointing: there has been a ‘little’ delay since Brian Goetz’s statement in 2005. He did, however, tell me that escape-analysis-based lock elision actually is available from the latest JDK release (1.6.0_03). Let’s do the test. Here are my results:

>java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 6553 ms.
StringBuilder: 1836 ms.
Thread safety overhead of StringBuffer: 256%

>java -server -XX:+DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 6546 ms.
StringBuilder: 1872 ms.
Thread safety overhead of StringBuffer: 249%

>java -server -XX:-DoEscapeAnalysis -XX:+EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 3101 ms.
StringBuilder: 1836 ms.
Thread safety overhead of StringBuffer: 68%

>java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:+UseBiasedLocking LockTest
StringBuffer: 2852 ms.
StringBuilder: 1855 ms.
Thread safety overhead of StringBuffer: 53%

>java -server -XX:-DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking LockTest
StringBuffer: 2645 ms.
StringBuilder: 1823 ms.
Thread safety overhead of StringBuffer: 45%

Conclusions

So, we clearly see that for this obvious case, escape analysis optimizations did escape from Java 6. Escape analysis is not available for the client VM, it is disabled by default on the server VM, and if you do enable it there, it does not help significantly: the thread safety overhead stays at about 250%. Ideally, thread safety overhead would go down to 0% with escape-analysis-based lock elision. Fortunately, we see that lock coarsening and biased locking do help a lot and bring the overhead down to about 50%.

We see again in this exercise that we should question assumptions, ask the right questions and actually measure to get evidence.

Hopefully the spring 2008 update of Java 6 will bring the proper escape analysis optimizations which have been promised for such a long time! We’ll keep you posted.

LockTest code:

public class LockTest {
	private static final int MAX = 20000000; // 20 million

	public static void main(String[] args) throws InterruptedException {
		// warm up the method cache
		concatBuffer("Josh", "James", "Duke");
		concatBuilder("Josh", "James", "Duke");

		System.gc();
		Thread.sleep(1000);

		long start = System.currentTimeMillis();
		for (int i = 0; i < MAX; i++) {
			concatBuffer("Josh", "James", "Duke");
		}
		long bufferCost = System.currentTimeMillis() - start;
		System.out.println("StringBuffer: " + bufferCost + " ms.");

		System.gc();
		Thread.sleep(1000);

		start = System.currentTimeMillis();
		for (int i = 0; i < MAX; i++) {
			concatBuilder("Josh", "James", "Duke");
		}
		long builderCost = System.currentTimeMillis() - start;
		System.out.println("StringBuilder: " + builderCost + " ms.");
		System.out.println("Thread safety overhead of StringBuffer: "
				+ ((bufferCost * 10000 / (builderCost * 100)) - 100) + "%\n");
	}

	public static String concatBuffer(String s1, String s2, String s3) {
		StringBuffer sb = new StringBuffer();
		sb.append(s1);
		sb.append(s2);
		sb.append(s3);
		return sb.toString();
	}

	public static String concatBuilder(String s1, String s2, String s3) {
		StringBuilder sb = new StringBuilder();
		sb.append(s1);
		sb.append(s2);
		sb.append(s3);
		return sb.toString();
	}
}

Comments (18)

  1. Bram Somers

    December 21, 2007 at 5:57 pm

    Hi,

    I've tried your test on my laptop (AMD X2, linux os)

    And these are my results (which are, btw, clearly a lot different from yours)

    * java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest

    Executing this command gives me:
    StringBuffer: 7669 ms.
    StringBuilder: 3274 ms.
    Thread safety overhead of StringBuffer: 134%

    * java -server -XX:+DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking LockTest

    Executing this command however, gives me this:

    StringBuffer: 6840 ms.
    StringBuilder: 3414 ms.
    Thread safety overhead of StringBuffer: 100%

    Why is this?

    Bram

    ps. I did try the test several times...

  2. Jeroen Borgers

    December 22, 2007 at 10:21 am

    Hi Bram,

    Interesting results. Those optimizations give you significantly less speedup on your configuration.

    Do you have java 1.6.0_03 running?

    A lot is involved here: the cost of the atomic instructions at the (multi-core) processor/cache level, the cost of a context switch at the OS level, the way the JVM utilizes these, etc. Any of these can make the difference.

    I have an Intel Core 2 Duo and Vista, so quite different from yours.

    What are your intermediate results?

    Jeroen.

  3. Bram Somers

    December 22, 2007 at 8:01 pm

    Hi Jeroen,

    Ouch, I indeed do not have the update 3 version, stupid me :)
    Anyway, I'll download update 3 of Java 6 and run it again on my configuration and give feedback about my results.

    Greetings,

    Bram

  4. Bram Somers

    December 24, 2007 at 9:35 am

    Hi,

    I did the test again with Java 6 update 3, same results though...

    this is my output:

    bsomers@chaos:~/Desktop/jdk1.6.0_03/bin$ ./java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
    StringBuffer: 7906 ms.
    StringBuilder: 3625 ms.
    Thread safety overhead of StringBuffer: 118%

    bsomers@chaos:~/Desktop/jdk1.6.0_03/bin$ ./java -server -XX:-DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking LockTest
    StringBuffer: 7799 ms.
    StringBuilder: 3695 ms.
    Thread safety overhead of StringBuffer: 111%

  5. RuntimeException

    March 29, 2008 at 3:49 am

    Very interesting.

    There is a pretty significant improvement, but not the 0% overhead expected from a theoretical point of view.

    I would like the JVM developers to be a little more transparent about the JVM features and how to use them; a lot of the documentation is hard to find and understand.

    Another point: lock coarsening can actually have a significant impact on application functionality. What are the rules that trigger it? Two consecutive synchronizations on the same object without any code in the middle?

    IMO this feature can be a real can of worms. It actually affects the synchronization execution, which a lot of sophisticated caching routines rely on to prevent contention.

  6. henk

    August 3, 2008 at 9:18 am

    >Another point, Lock coarsening can actually have significant impact on the application functionality.

    I'm not sure that's really such a problem. Just think about what happens: with lock coarsening a lock is held for 2 statements instead of one. So without it, if the lock were contended by another thread, this other thread could theoretically execute in between.

    But now think about this some more. What are the chances in the first place that another thread would start executing exactly between those two statements? Pretty small. You should protect your app against the odds that it -can- happen, but this does not work the other way around. You can assume that it -does- happen. If your logic depends on the fact that another thread -is- going to execute between two synchronized statements, then that's just as much a mistake as assuming it's never possible.

  7. henk

    August 3, 2008 at 9:19 am

    >You can assume that it -does- happen

    This should of course be: You can't assume that it -does- happen

  8. Roel Spilker

    September 26, 2008 at 3:45 pm

    Dear Jeroen,

    Great article. I was wondering if you could do a follow-up, since it is now September 2008...

    By the way, I think you should warm up the HotSpot compiler as well. Now, the first 8000 or so iterations are run in interpreted mode, and no optimizations are performed.

    Roel

  9. Roel Spilker

    September 26, 2008 at 4:17 pm

    I've modified your code to do 10000 warmup rounds. When I use the -XX:+DoEscapeAnalysis -XX:+EliminateLocks on my machine, I get 39% overhead!!!
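
    Roughly, the change is this (a quick sketch), placed in main before the timed loops:

    // Replace the single warm-up calls with enough iterations for HotSpot
    // to compile both methods before the measurements start.
    for (int i = 0; i < 10000; i++) {
    	concatBuffer("Josh", "James", "Duke");
    	concatBuilder("Josh", "James", "Duke");
    }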

    java version "1.6.0_06"
    Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
    Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

  10. Jeroen Borgers

    September 26, 2008 at 4:25 pm

    Thanks Roel.
    This blog has actually evolved into a series of two articles on InfoQ. Warming up is indeed way too short here, this is corrected in the infoQ article.

    http://www.infoq.com/articles/java-threading-optimizations-p1

    Cheers, Jeroen.

  11. Roel Spilker

    September 26, 2008 at 5:31 pm

    Thank you for your quick response. I liked the article.

  12. stanimir

    November 30, 2008 at 6:57 pm

    When performing micro benchmarks, it's a nice touch to use -XX:CompileThreshold=1 so HotSpot compiles everything from scratch.

  13. Ismael Juma

    December 6, 2008 at 12:33 am

    Hi,

    It's actually a mistake to use -XX:CompileThreshold=1 as it will cause worse code to be generated. The reason for this is that the JIT has not collected enough profiling information. In general, it's best not to change that setting and to simply allow enough time for the JVM to warm up.

    Ismael

  14. Dema

    December 1, 2009 at 6:03 am

    java -Xbatch -server -XX:+DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking -XX:+AggressiveOpts LockTest

    StringBuffer: 2764 ms.
    StringBuilder: 2395 ms.
    Thread safety overhead of StringBuffer: 15%

    java -version
    java version "1.6.0_14"
    Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
    Java HotSpot(TM) Server VM (build 14.0-b16, mixed mode)

  15. Heinz Kabutz

    December 21, 2009 at 9:25 pm

    Hi Jeroen,

    only saw your reference to my interview now. My point was not to show that StringBuilder was faster than StringBuffer, but rather that premature optimization lets us do unnecessary work that may result in slower code.

    You could have just as well written your method

    public static String concatBuilder(String s1, String s2, String s3) {
    	return s1 + s2 + s3;
    }

    which could take advantage of future improvements to String concatenation.

    In your example, your Strings are very short, so there is no expansion of the underlying array. Under other circumstances it would be faster to first size the array and then create the StringBuilder. You could even reuse the StringBuilder, which, again, can improve the performance. These changes could be added into your code by the javac compiler, as long as you stuck with the simple and clean s1 + s2 + s3.
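
    For example, a pre-sized variant would look something like this (just a sketch, the method name is made up):

    public static String concatSized(String s1, String s2, String s3) {
    	// Size the underlying array up front so no expansion copy is needed.
    	StringBuilder sb = new StringBuilder(s1.length() + s2.length() + s3.length());
    	sb.append(s1);
    	sb.append(s2);
    	sb.append(s3);
    	return sb.toString();
    }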

    Hope that clarifies my point a bit better now? Don't optimize prematurely!

    Heinz

  16. Jeroen Borgers

    December 22, 2009 at 10:03 am

    Hi Heinz,

    You made a good point about premature optimizations. It triggered me to investigate and write this blog, because your StringBuilder vs. StringBuffer measurements mismatched with my assumption that Escape Analysis was working. It evolved in my further articles on InfoQ (with help of Kirk):
    http://www.infoq.com/articles/java-threading-optimizations-p1

    Jeroen.

  17. Heinz Kabutz

    December 23, 2009 at 9:11 pm

    Hi Jeroen,

    your articles on this topic are really good, especially the one on InfoQ. I wanted to write an article on Escape Analysis and you've done such a nice job that I'll leave my readers to catch up on a great definition from your article.

    However, I've discovered something rather interesting to do with Escape Analysis that I think I'll publish before the end of 2009 :-)

    Regards from Crete, come visit us here some time :-)

    Heinz

  18. am

    January 15, 2010 at 7:19 pm

    Another advantage of Escape Analysis is that objects are allocated on the stack, so you can reduce the load on the GC. That can be a significant optimization.
