• Home
  • RSS Feed
  • Log in

EJAPP Top 10 countdown: #6 – Improper caching
Posted by Vincent Partington at around evening time: March 26th, 2007

Let's continue the EJAPP Top 10 countdown with number 6.

Caching is a funny thing. Done right it can improve the performance of your Enterprise Java application tremendously and can even be essential to reach acceptable performance levels, but sometimes caching itself can be the cause for your performance problems. Improper caching covers both cases.

There are quite a lot of things that are candidates for caching:

  • Local resources that require a lot of initialization such as JNDI resources (EJBs), Spring BeanFactories, AxisEngines, etc.
  • Network resources that are hard to set up such as database connections, HTTP(S) connections, etc.
  • Data that is hard to retrieve/calculate such as object retrieved from a database, HTML pages rendered, etc.

For each of these, there are different trade-offs to make when it comes to questions like:

  1. How does the cache check that the resource still works? For example; If a database connection is not used for a while, the database may decide to close it or a firewall may drop the connection. Checking the validity of a resource may be an expensive operation that negates any performance gain that could be had from the cache.
  2. How does the cache check whether the resource has changed? Not properly implementing may mean that the application has to be restarted after a configuration (or content) change.
  3. How much memory does the cache take?
  4. Is the cache thread safe? Improperly implemented, lock contention can occur in the cache.
  5. What is the cache hit ratio? If the cache hit ratio would be too low, the cache management overhead negates any positive impact to be had.
  6. How is the performance of the application tested with respect to the cache? Can the cache be pre-loaded with correct data or disabled while testing?
  7. Can the cache be monitored at runtime? Things to monitor include the number of objects in the cache, the amount of memory used, and the cache hit ratio.
  8. Can the cache be managed at runtime? Think about enabling/disabling the cache, flushing its contents or saving/restoring its contents (for testing purposes).

A colleague of mine found a nice example of criteria #1, #4, and #5 conspiring to cause bad performance. The check-and-get part of the cache was placed in a critical section like this:

public Object getFromCache(String key) {
        synchronized(map) {
                if(!map.containsKey(key)) {
                        <em>... retrieve data from backend ...</em>
                }
                return map.get(key);
        }
}

However retrieving data from the backend took 100 ms and the cache hit ratio was less than 10%. The net effect was a lot of contention on the cache lock. The problem went away after this cache was removed altogether!

Like I said in the beginning, caching is a funny thing. :-)

At the least, keep the following in mind then thinking about caching:

  • Cache if and only if caching is needed.
  • Verify the functional and performance behaviour of your cache with performance tests.
  • Make sure your cache can be managed and monitored at runtime.
  • Share/Bookmark

Filed under Java, Performance | 8 Comments »



8 Responses to “EJAPP Top 10 countdown: #6 – Improper caching”



    Alexandre Says:
    Posted at: April 20, 2007 at 5:37 am

    Excellent blog. I just discovered your company and I like it very much.

    If you’re using a JVM 1.5 (or newer) then there’s a high chance you could use a java.util.concurrent map instead of synchronizing manually on a map. For example ConcurrentHashMap supports full concurrency for retrievals and ajustable expected concurrency for update (it’s in the JavaDocs). Doug Lea’s java.util.concurrent that appeared in 1.5 is a gigantic improvement on what was available in 1.4. I’d be curious to know if there’s still contention after using such a package (not that it would be that useful for a 10% cache-hit… But still I’m curious). (little detail: the code example, the “em” tags around “…retrieve data from backend…” don’t seem to work here)



    xeon Says:
    Posted at: May 1, 2007 at 1:57 am

    Please try to use Javolution instend of concurrent map and sycnronizing map



    Charlie Says:
    Posted at: May 1, 2007 at 2:07 pm
    public Object getFromCache(String key) {
      if ( map.containsKey(key) ) {
        return  map.get(key);
      } 
    
      synchronized(map) {
      if(!map.containsKey(key)) {
        ... retrieve data from
        backend ...
      }
      return map.get(key);
      }
    }
    

    Would help:-)



    Andreas Krüger Says:
    Posted at: May 2, 2007 at 1:57 pm

    Charlie – what you published is a typical example
    of code that is not thread safe.

    Please review http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html



    komofei Says:
    Posted at: May 28, 2007 at 12:43 pm

    Andreas, I think Charlie’s example is threadsafe if you use Hastable class for map.



    Xebia Blog Says:
    Posted at: June 6, 2007 at 9:07 am

    [...] Discussed are numbers 6-Improper Caching and 5-Excessive memory usage. More information about number 6 can be found here. and for number 5 see; here. [...]



    RuntimeException Says:
    Posted at: March 29, 2008 at 3:23 am

    Have you ever tested the access time to a synchronized map?

    We did testing of various caches, and one that stood head and shoulders above the rest is simply a synchronized LRU map. Beaten only by concurrent hash map, but not by much.

    Concurrent hashmap lacks LRU, so its application for revolving caches is very limited. Concurrent LRU map is probably the solution, but hard (impossible?) to develop.

    If you compare access time to a collection to access time to the database, then the penalty of synchronizing the collection is hardly noticeable, even in multi CPU environments.

    The code snippet above is contains some problems, it synchronized database access across threads. If there is one thing you don’t want is one thread waiting for another thread to do database access. Worse than that, one thread waiting for another thread to get data from the database where the two threads actually need different objects from the cache.

    The problem in this code snippet is not the synchronization, it is how synchronization is used.

    This type of programming requires little thought, and the only lesson here is that this is job for the expert programmer and not the novice. Concurrent hash map, for example, is a true expert work of software that very few developers can actually tackle.

    My 2C



    Tom Says:
    Posted at: December 25, 2009 at 12:16 am

    Building a cache that is free of both race conditions and unnecessary blocking takes some thought – none of the comments so far have got it right.

    I’d suggest an approach involving Futures:

    Map<K, Future> cache = new LinkedHashMap<K, Future>(MAX_CACHE_SIZE, 0.75f, true);
    
    public V getFromCache(K key) {
        Future futureValue;
        synchronized (cache) {
            futureValue = cache.get(key);
            if (futureValue == null) {
                futureValue = new CacheFill(key);
                if (cache.size() == MAX_CACHE_SIZE) {
                    // pop the least recently use entry from the cache
                    Iterator it = cache.entrySet().iterator();
                    it.next();
                    it.remove();
                }
                cache.put(key, futureValue);
            }
        }
        return futureValue.get();
    }
    
    class CacheFill implements Future {
        private final K key;
        private boolean querying;
        private V value;
    
        public CacheFill(K key) {
            this.key = key;
        }
    
        public V get() {
            synchronized (this) {
                if (querying) {
                    while (value == null) wait();
                    return value;
                }
                else {
                    querying = true;
                }
            }
            V value = getValueFromBackend(key);
            synchronized (this) {
                this.value = value;
                notifyAll();
            }
            return value;
        }
        // other Future methods (which are never actually used) left as an exercise for the reader
    }
    

    I have neither tested this nor proven it correct, but i think it’s about right, and although there’s quite a bit of locking going on, i don’t think there’s any approach with less locking that will work correctly. You could use a ConcurrentHashMap, which would cut down a lot on the locking and data structure traversal needed to work with the cache, but that doesn’t have the LRU ordering, so you’d have to have an auxiliary data structure for that, and that would probably offset any savings from using the concurrent map.



Leave a Reply

Click here to cancel reply.

Deployment automation for Java application running on Websphere, WebLogic and JBoss

Archives

  • March 2010
  • February 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009

Training

  • Tackling Top 10 JavaEE Performance Pitfalls
    13 & 14 May 2009

Xebia Sites

  • Xebia Corporate
  • Xebia France
  • Xebia India

Categories

  • Java (282)
  • Agile (109)
  • General (50)
  • Testing (42)
  • Performance (42)
  • Hibernate (36)
  • Scrum (33)
  • Podcast (31)
  • Architecture (31)
  • Spring (28)
  • SOA (24)
  • Maven (22)
  • Project Management (22)
  • Middleware (23)
    • Deployment (14)
  • Flex (17)
  • JPA (17)
  • Eclipse (15)
  • Xebia Labs (15)
  • Quality Assurance (14)

Tag Cloud

    IntelliJ JavaOne Java qcon Testing Spring Seam Introduction to Agile Hibernate SOA esb Scrum Xebia Agile Lean Groovy Closures Semantic Web product owner Functional Programming Performance Architecture fitnesse Poppendieck Ajax Agile Awareness Workshop Scala XML Grails Maven