JVM Threading optimizations revisited
By Jeroen Borgers
Last week I taught an in-house performance tuning course and explained the threading optimizations in the Java 6 VM to the participants. We ran the course exercises on Java 6 update 11, and when I told them that Escape Analysis did not work properly yet, I realized I did not actually know this for a fact for this update of Sun’s Java. So it was time to rerun the benchmark, and it turned up some unexpected results.
I again used the same LockTest benchmark discussed in my InfoQ article, where I ran it on JDK 6 update 4. My source at Sun told me back then that I could expect improvements in lock elision by escape analysis in update 10 and later releases. Reason enough to run the test again.
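For readers who have not seen the InfoQ article, the kind of micro-benchmark involved can be sketched as follows. This is a hypothetical reconstruction, not the original LockTest source: the class name LockTestSketch, the iteration count and the input strings are all assumptions. The idea is to do identical string concatenation once with the synchronized StringBuffer and once with the unsynchronized StringBuilder, and to report the difference as the thread-safety overhead. Because the StringBuffer never leaves its method, a JVM with working escape analysis could drop its locks entirely.

```java
// Hypothetical LockTest-style benchmark: a sketch, not the original LockTest source.
public class LockTestSketch {
    // Assumed iteration count, large enough for the JIT to compile the hot methods.
    private static final int ITERATIONS = 20000000;

    // Accumulates result lengths so the JIT cannot simply discard the work as dead code.
    private static long sink;

    // The StringBuffer never escapes this method, so escape analysis could prove
    // that the locks taken by its synchronized append() calls are unnecessary.
    public static String concatBuffer(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    // The same work with the unsynchronized StringBuilder, used as the baseline.
    public static String concatBuilder(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    // Times ITERATIONS calls of the StringBuffer variant, in milliseconds.
    private static long timeBuffer() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += concatBuffer("alpha", "beta", "gamma").length();
        }
        return System.currentTimeMillis() - start;
    }

    // Times ITERATIONS calls of the StringBuilder variant, in milliseconds.
    private static long timeBuilder() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += concatBuilder("alpha", "beta", "gamma").length();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Warm-up pass so both variants are JIT-compiled before they are timed.
        timeBuffer();
        timeBuilder();

        long bufferMs = timeBuffer();
        long builderMs = Math.max(1, timeBuilder()); // guard against division by zero
        long overhead = Math.round(100.0 * (bufferMs - builderMs) / builderMs);

        System.out.println("StringBuffer: " + bufferMs + " ms.");
        System.out.println("StringBuilder: " + builderMs + " ms.");
        System.out.println("Thread safety overhead of StringBuffer: " + overhead + "%");
        System.out.println("(sink: " + sink + ")");
    }
}
```

Running it with different JVMs or JVM options, as in the measurements below, is what makes the comparison interesting; a single run on its own says little.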
Escape Analysis not improved; other locking optimizations deteriorated
When I started measuring, I quickly found out that, unfortunately, lock elision by escape analysis has not improved from update 4 to the most recent release, update 12. However, I found some other unexpected performance changes between update 4 and update 12. The next figure illustrates this.
I ran each test seven times on the same type of laptop as in the InfoQ article, using the server VM with default settings. I calculated the average and standard deviation, assuming a normal distribution. The red arrows span avg - stddev to avg + stddev to show the spread of the measurements.
A number of interesting conclusions can be drawn from these measurements. First, the good news: Sun has managed to improve StringBuilder performance by about 10% in this benchmark. The bad news, however, is that StringBuffer performance has deteriorated by about 10%! The thread-safety overhead has thereby increased from 36% to a whopping 66%, making locking more expensive, at least on a dual-core Intel machine. Furthermore, the variation in StringBuffer times has increased substantially, as the red arrows show. The cause of this variation needs further investigation.
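The statistics behind the arrows can be sketched in a few lines. The article does not say whether it used the sample (n - 1) or population (n) standard deviation; this sketch assumes the sample version. The class name RunStats and the seven timings are made-up placeholders for illustration, not measurements from the article.

```java
// Sketch of the per-configuration statistics: mean and standard deviation of seven runs.
public class RunStats {
    // Arithmetic mean of the measured times.
    public static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 in the denominator); this estimator is an assumption.
    public static double stdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Seven placeholder timings in ms, invented purely to exercise the code.
        double[] runsMs = {100, 102, 98, 101, 99, 100, 100};
        double m = mean(runsMs);
        double s = stdDev(runsMs);
        // The arrow runs from m - s to m + s: roughly 68% of a normal distribution.
        System.out.printf("avg = %.1f ms, stddev = %.2f ms, arrow = [%.1f, %.1f]%n",
                m, s, m - s, m + s);
    }
}
```

With only seven runs per configuration, the standard deviation itself is a rough estimate, which is worth keeping in mind when comparing arrow lengths.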
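The jump from 36% to 66% follows from compounding the two 10% changes. A minimal check, assuming the overhead is defined as (StringBuffer time - StringBuilder time) / StringBuilder time, which is consistent with the -3% that the IBM run further on prints for 976 ms versus 1006 ms. The class OverheadCheck and the normalized times are illustrative only.

```java
// Checks how two ~10% changes turn a 36% overhead into roughly 66%.
public class OverheadCheck {
    // Thread-safety overhead: how much slower the StringBuffer run is than the
    // StringBuilder run, in percent of the StringBuilder time.
    public static double overheadPercent(double bufferMs, double builderMs) {
        return 100.0 * (bufferMs - builderMs) / builderMs;
    }

    public static void main(String[] args) {
        // Normalize the update 4 StringBuilder time to 1.0 (arbitrary units).
        double builderU4 = 1.0;
        double bufferU4 = 1.36 * builderU4; // 36% overhead in update 4

        // Update 12: StringBuilder about 10% faster, StringBuffer about 10% slower.
        double builderU12 = builderU4 * 0.9;
        double bufferU12 = bufferU4 * 1.1;

        System.out.printf("update 4 overhead:  %.0f%%%n", overheadPercent(bufferU4, builderU4));
        System.out.printf("update 12 overhead: %.0f%%%n", overheadPercent(bufferU12, builderU12));

        // The IBM J9 timings reported in this article.
        System.out.printf("IBM J9 overhead:    %.0f%%%n", overheadPercent(976, 1006));
    }
}
```

The update 12 figure comes out at 1.36 × 1.1 / 0.9 - 1 ≈ 0.66, so the 66% is what the two separate 10% changes imply together.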
The next figure shows the results of further measurements with the -XX:+UseBiasedLocking and -XX:+EliminateLocks JVM options switched on and off. Both are on by default.
Figure 2. Average measured times [ms] of LockTest on Sun JDK 6 update 12: StringBuffer times compared across different JVM settings. The range denoted by the arrows (avg ± one standard deviation) covers about 68% of the distribution of measured times.
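The four on/off combinations can be generated mechanically. This is a hypothetical harness, not the one used for the figure: the class FlagMatrix, the --run switch and the main class name LockTest on the classpath are assumptions. HotSpot switches a boolean option on with -XX:+Name and off with -XX:-Name. It targets a Java 6-era HotSpot; biased locking was deprecated and disabled by default in JDK 15 (JEP 374), so newer JVMs may warn about or reject that flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness for the UseBiasedLocking x EliminateLocks test matrix.
public class FlagMatrix {
    // Builds one java command line for the given on/off combination.
    public static List<String> command(boolean biased, boolean eliminate, String mainClass) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        cmd.add("-server");
        cmd.add("-XX:" + (biased ? "+" : "-") + "UseBiasedLocking");
        cmd.add("-XX:" + (eliminate ? "+" : "-") + "EliminateLocks");
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        boolean run = args.length > 0 && args[0].equals("--run");
        boolean[] onOff = {true, false};
        for (boolean biased : onOff) {
            for (boolean eliminate : onOff) {
                List<String> cmd = command(biased, eliminate, "LockTest");
                System.out.println(cmd);
                // With --run, execute each configuration seven times, as in the article.
                for (int i = 0; run && i < 7; i++) {
                    new ProcessBuilder(cmd).inheritIO().start().waitFor();
                }
            }
        }
    }
}
```

Without arguments it only prints the four command lines, which is a cheap way to double-check the matrix before spending time on the runs.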
I ran these tests on my new laptop, which has a newer Intel Centrino CPU, so the absolute times are shorter than in the previous test; the relative times are about the same. The figure shows that something odd is going on with the EliminateLocks feature. With update 4 we saw that biased locking sometimes appeared not to work fully. That is no longer the case: it now shows a healthy, low variation in times, as the rightmost bar in the figure shows. However, with both options off, and with only EliminateLocks on, the variation is now surprisingly high, much higher than in update 4 with the same benchmark. The default settings show this to a lesser extent. Even more striking: with update 4 the default settings gave the best performance, but now the EliminateLocks optimization actually turns out to be counterproductive in this single-threaded benchmark. We get the best results by disabling EliminateLocks and using biased locking alone, rather than with the default settings.
I can only guess at what is going on inside the VM and how it interacts with the CPU and caches. The EliminateLocks feature has varying effectiveness in seemingly similar situations and seems to get in the way of, or even conflict with, biased locking. This does not mean, however, that you should switch off this feature for your application. Remember: Beware of the Benchmarks. They answer a specific question, in this case: what is the thread-safety overhead of StringBuffer, as influenced by the VM threading optimizations? Furthermore, a micro-benchmark like this one may magnify effects that do not show up in real applications. It would be interesting to see results of other threading benchmarks comparing these updates. Based on this benchmark, I would say there is room for improvement for the HotSpot JVM developers.
Results on the IBM JVM
As a side note, the IBM virtual machine is reported to have fully functional Escape Analysis. It can be downloaded for Windows as part of the IBM Development Package for Eclipse. Let’s run the benchmark on it.
\ibm_sdk60\bin\java.exe -showversion -Xmn1g -Xgcpolicy:gencon LockTest
java version "1.6.0"
Java(TM) SE Runtime Environment (build pwi3260sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows Vista x86-32 jvmwi3260-20080415_18762 (JIT enabled, AOT enabled)
J9VM - 20080415_018762_lHdSMr
JIT - r9_20080415_1520
GC - 20080415_AA)
JCL - 20080412_01
StringBuffer: 976 ms.
StringBuilder: 1006 ms.
Thread safety overhead of StringBuffer: -3%
Wow! Here escape analysis really shows what it can do: the synchronized StringBuffer is even slightly faster than StringBuilder. I don’t think I need to draw the bars to make clear that this JVM does a much better job at executing this benchmark. Hats off to the J9 guys.
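Why can a synchronized StringBuffer match StringBuilder at all? The sketch below contrasts the two shapes that matter for lock elision. It is illustrative only: the class EscapeDemo and its methods are hypothetical, and whether a particular JVM actually removes the locks is up to its JIT compiler, which this code cannot show. In the first method the buffer never leaves the method, so escape analysis can prove no other thread could ever lock it. In the second, the buffer is published through a static field, so escape analysis cannot prove it thread-local and cannot elide its locks.

```java
// Two shapes of StringBuffer use: non-escaping (lock elision possible) and escaping.
public class EscapeDemo {
    // A static field makes any object stored in it reachable from other threads.
    static StringBuffer shared;

    // Non-escaping: created, used and dropped inside the method. Only the String
    // result leaves, so a JIT with working escape analysis may drop the monitor
    // enter/exit around each synchronized append().
    public static String local(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append(b);
        return sb.toString();
    }

    // Escaping: the buffer is published before use, so another thread could lock it
    // and escape analysis cannot justify removing its synchronization.
    public static String published(String a, String b) {
        StringBuffer sb = new StringBuffer();
        shared = sb;
        sb.append(a).append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(local("lock", "elision"));
        System.out.println(published("lock", "kept"));
    }
}
```

The LockTest benchmark exercises only the first shape, which is why it is such a direct probe of lock elision by escape analysis.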