Web performance in seven steps; Step 5: Monitor and diagnose

Last time I blogged about the importance of continuous performance testing. When you write and run performance tests continuously, just like unit tests, you get early insight into the performance of new and changed features of your software. This minimizes surprises and makes you more productive. This time I'll blog about monitoring and diagnostics.

When a new version of the software is released into the production environment, the question always is: will it actually perform as we saw in the test and acceptance environments? And we keep our fingers crossed.
It is therefore important to monitor carefully what happens to the performance and availability of the application after each release.

There are all sorts of tools and services available to monitor the availability and response times of your web site's pages, like Uptrends, Site24x7 and Dotcom-monitor. They look at the application as a black box and measure once every few minutes. This is very useful. However, to take the right measures in case of a calamity, you need to be able to pinpoint the problem. It is therefore essential to monitor on multiple levels and on multiple internal parts of the application. For levels, think of hardware, OS, app server, web server, database and application.

Measuring the internal parts of a Java application can be achieved with JAMon. JAMon is an open source timing API that basically works like a stopwatch with a start() and a stop() call. Every method you want to measure gets its own stopwatch (or counter). We cover JAMon as one of the tools for measuring time on the first day of our Speeding up Java Applications course.

Figure: JAMon API start() and stop() calls in a Spring interceptor
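
The figure itself is not reproduced here, so as a rough stand-in the idea of wrapping every call in a stopwatch can be sketched in plain Java. This is not JAMon or Spring code: it uses a JDK dynamic proxy in place of a Spring interceptor, and all names (TimingInterceptor, OrderDao, InMemoryOrderDao, the label format) are hypothetical and chosen only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical interceptor: times every call and keeps one counter per method label. */
class TimingInterceptor implements InvocationHandler {
    // label -> number of calls / total elapsed nanoseconds (illustrative, not JAMon's API)
    static final Map<String, LongAdder> HITS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    private final Object target;

    TimingInterceptor(Object target) {
        this.target = target;
    }

    /** Wraps target in a proxy that measures every call made through iface. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new TimingInterceptor(target));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        String label = target.getClass().getSimpleName() + "." + method.getName();
        long start = System.nanoTime();                 // the stopwatch "start"
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();                         // rethrow the target's own exception
        } finally {
            long elapsed = System.nanoTime() - start;   // the stopwatch "stop"
            HITS.computeIfAbsent(label, k -> new LongAdder()).increment();
            TOTAL_NANOS.computeIfAbsent(label, k -> new LongAdder()).add(elapsed);
        }
    }
}

/** Hypothetical DAO interface used only for the demonstration. */
interface OrderDao {
    String findOrder(int id);
}

class InMemoryOrderDao implements OrderDao {
    public String findOrder(int id) {
        return "order-" + id;
    }
}

class TimingDemo {
    public static void main(String[] args) {
        OrderDao dao = TimingInterceptor.wrap(new InMemoryOrderDao(), OrderDao.class);
        dao.findOrder(1);
        dao.findOrder(2);
        TimingInterceptor.HITS.forEach((label, hits) ->
                System.out.println(label + " hits=" + hits.sum()
                        + " totalNanos=" + TimingInterceptor.TOTAL_NANOS.get(label).sum()));
    }
}
```

The caller keeps working against the OrderDao interface and is unaware of the measurement, which is the property that makes an interceptor a convenient place for this kind of timing.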

Each counter maintains statistics such as the number of calls, average, maximum and standard deviation, and this information can be queried. The individual calls are not stored. This approach results in low memory usage and a low performance overhead, at the cost of some information loss. Recently, a new competitor of JAMon appeared: Simon. It claims to be JAMon's successor, although it has (or had) some teething problems.
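
To make the trade-off concrete, here is a minimal sketch of such an aggregating counter. It is not JAMon's implementation: the class name RunningStats is hypothetical, and the choice of Welford's online algorithm and of the sample (n - 1) standard deviation are my own assumptions. The point is only that a handful of fields suffice, however many calls are recorded.

```java
/** Hypothetical aggregating counter: keeps running statistics, never the individual samples. */
class RunningStats {
    private long hits;                               // number of recorded calls
    private double mean;                             // running average
    private double m2;                               // sum of squared deviations from the mean (Welford)
    private double max = Double.NEGATIVE_INFINITY;   // stays -Infinity until the first sample

    synchronized void add(double millis) {
        hits++;
        double delta = millis - mean;
        mean += delta / hits;
        m2 += delta * (millis - mean);
        if (millis > max) {
            max = millis;
        }
    }

    synchronized long getHits() {
        return hits;
    }

    synchronized double getAverage() {
        return mean;
    }

    synchronized double getMax() {
        return max;
    }

    /** Sample standard deviation; 0 when fewer than two samples were recorded. */
    synchronized double getStdDev() {
        return hits > 1 ? Math.sqrt(m2 / (hits - 1)) : 0.0;
    }
}

class RunningStatsDemo {
    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double ms : new double[] {10, 20, 30}) {
            stats.add(ms);
        }
        System.out.println("hits=" + stats.getHits()
                + " avg=" + stats.getAverage()
                + " max=" + stats.getMax()
                + " stddev=" + stats.getStdDev());
    }
}
```

Memory use is constant per counter, but you can no longer ask which individual call was the slow one or when it happened. That is the information loss mentioned above.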

Then there is the question: where to measure? It makes most sense to measure all incoming calls, like web requests, and all outgoing calls, for instance to the database. Beyond that, measure internal parts like Spring beans, EJBs and DAOs. Measuring these parts is relevant not only for new releases; trends and usage spikes are also worth monitoring, so that you can solve problems quickly or prevent them altogether. The open source tool JARep can store JAMon data from a cluster in a database and lets you monitor trends and changes graphically.

Figure: JARep shows the increasing response time trend starting October 15, on two of the four production JVMs.
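
A trend like the one in the figure can also be flagged numerically once the averages are stored per time interval. The sketch below is not how JARep works; it is a plain least-squares slope over equally spaced samples. The class names, the threshold and the response times are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: fits a straight line through equally spaced samples. */
class TrendDetector {
    /** Least-squares slope of y over x = 0, 1, 2, ... in units of y per sample interval. */
    static double slope(double[] y) {
        int n = y.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (y[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }
}

class TrendDemo {
    public static void main(String[] args) {
        // Made-up daily average response times in ms per JVM, purely illustrative.
        Map<String, double[]> dailyAvgMs = new LinkedHashMap<>();
        dailyAvgMs.put("jvm1", new double[] {200, 205, 198, 202, 201});
        dailyAvgMs.put("jvm2", new double[] {200, 230, 260, 300, 340});

        double thresholdMsPerDay = 5.0;   // hypothetical alert threshold
        for (Map.Entry<String, double[]> e : dailyAvgMs.entrySet()) {
            double s = TrendDetector.slope(e.getValue());
            System.out.println(e.getKey() + " slope=" + s + " ms/day"
                    + (s > thresholdMsPerDay ? "  <-- rising" : ""));
        }
    }
}
```

Comparing the slope per JVM is what lets you see that only part of a cluster is degrading, which a site-wide average would hide.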

Customer story

We had the following situation at a customer. Processing an order gradually took more and more time over a period of several weeks. This happened while no new release had been introduced and no other page had become slower. This behavior was a complete mystery until we looked deeper with our JARep monitoring tool. The troublemaker turned out to be a DAO executing a prepared statement in which only some of the variables were bind variables. With the help of JARep, we could look back to where the trend of increasing response times began, and thus to when the problems started. We could also see that the problem was present on only one of the two machines. With this knowledge and his log book, the operator remembered that on the start date he had experimented with a new JDBC driver to try to solve a memory leak. The new driver had seemed not to affect performance, which was indeed the case in the beginning; problems only appeared gradually during the following weeks. The new driver had been left in place and turned out to be a time bomb. When we put back the old driver, the problem disappeared. This real-life experience shows the usefulness of monitoring and trend analysis on application internals.
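
The customer's SQL is not shown here, so the following is only a sketch of what "only part of the variables being bind variables" can look like. The table and column names are invented. When one value is concatenated into the statement text, every distinct value produces a different SQL text, whereas a fully bound statement always has the same text. How a particular JDBC driver reacts to a growing set of distinct statement texts is driver-specific; the story above does not state the exact mechanism.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration: counts the distinct SQL texts each style generates. */
class BindVariableDemo {
    // Partially bound: status is a bind variable, but customerId is pasted into the text.
    static String partiallyBound(int customerId) {
        return "SELECT * FROM orders WHERE customer_id = " + customerId + " AND status = ?";
    }

    // Fully bound: every variable is a placeholder, so the text never changes.
    static String fullyBound() {
        return "SELECT * FROM orders WHERE customer_id = ? AND status = ?";
    }

    /** Number of distinct statement texts produced for customer ids 1..customers. */
    static int distinctTexts(boolean fully, int customers) {
        Set<String> texts = new HashSet<>();
        for (int id = 1; id <= customers; id++) {
            texts.add(fully ? fullyBound() : partiallyBound(id));
        }
        return texts.size();
    }

    public static void main(String[] args) {
        System.out.println("partially bound: " + distinctTexts(false, 1000) + " distinct texts");
        System.out.println("fully bound:     " + distinctTexts(true, 1000) + " distinct text(s)");
    }
}
```

With a real PreparedStatement, the fully bound variant would set both values through setInt and setString calls, so the driver and database see one statement text regardless of how many customers place orders.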

Next time I'll blog about evidence-based tuning.

Comments (7)

  1. Wang

    September 1, 2009 at 5:17 am

    Have you looked at Pingdom and AlertFox yet?

  2. Marcin

    September 6, 2009 at 8:49 pm

    There is also an interesting tool, Perf4J [1], which lets you calculate and present performance statistics in a convenient way. It also supports generating graphs.

    [1] - http://perf4j.codehaus.org/

    Marcin

  3. Jeroen Borgers

    September 9, 2009 at 9:31 am

    Hi Wang, those tools are new to me. They seem to fall in the same category as Uptrends, Site24x7 and Dotcom-monitor. What is your experience with them?

  4. Jeroen Borgers

    September 9, 2009 at 9:34 am

    Hi Marcin, I have looked a bit at Perf4J. What is your experience with it, and how do you think it compares to JAMon?

  5. william el kaim

    October 1, 2009 at 7:26 pm

    For me the perfect tools are:
    - CA Wily introscope (monitors everything from the JVM, expensive) or Sun JVM tools (free)
    - Some people use AspectJ to inject some perf code when needed
    - And yourkit for profiling
    - and a good network analyzer (wireshark - open source)

    I would also like to point out that synthetic monitoring is different from real-time monitoring. Synthetic monitoring plays a script every x minutes to verify that the scenario works well. So if you experience an issue in between two runs, it is not visible.

  6. Haim Yadid

    October 25, 2009 at 10:04 pm

    Another worthwhile tool for monitoring and tracing Java applications (much better than AspectJ for this purpose) is btrace. You can find info about it in my web site ...

  7. Marcin

    March 21, 2010 at 7:45 pm

    Hi Jeroen. The main problem with JAMon is a lack of updates, and I haven't used it much. Perf4j is still under development and gives you the ability to measure execution time, generate consist reports and, when needed, charts. I had no problems with Spring-based projects, but unfortunately Perf4j was limited to supporting only AspectJ. That changed with the recently released version 0.9.13, when together with Alex Devine from the Perf4j team we introduced a new AOP model.
    Currently Perf4j can be used in a handy way with (in theory) almost every AOP framework. As a proof of concept, Alex created an integration with EJB interceptors and I wrote an add-on for the Seam Framework - Seam-Perf4j - http://seam-perf4j.sourceforge.net/ .
