JPA implementation patterns: Bidirectional associations vs. lazy loading

Vincent Partington

Two weeks ago I blogged about the use of the Service Facade and Data Transfer Object patterns in JPA application architecture. This week I will move away from that high-level perspective and discuss an interesting interaction I discovered between lazy loading and the way bidirectional associations are managed. So let's roll up our sleeves and get our hands dirty in this next installment of the JPA implementation patterns series. ;-)

This blog assumes that you are familiar with the Order/OrderLine example I introduced in the first two blogs of this series. If you are not, please review the example.

Consider the following code:

OrderLine orderLineToRemove = orderLineDao.findById(30);
orderLineToRemove.setOrder(null);

The intention of this code is to disassociate the OrderLine from the Order it was previously associated with. You might do this before removing the OrderLine object (although you can also use the @PreRemove annotation to have this done automatically) or when you want to attach the OrderLine to a different Order entity.

If you run this code, you will find that the following entities are loaded:

  1. The OrderLine with id 30.
  2. The Order associated with the OrderLine. This happens because the OrderLine.setOrder method invokes the Order.internalRemoveOrderLine method to remove the OrderLine from its parent Order object.
  3. All the other OrderLines that are associated with that Order! The Order.orderLines set is loaded when the OrderLine object with id 30 is removed from it.


Depending on the JPA provider, this might take two or three queries. Hibernate uses three queries, one for each of the lines mentioned here. OpenJPA needs only two queries because it loads the Order together with the OrderLine using an outer join.

However, the interesting thing here is that all OrderLine entities are loaded. If there are a lot of OrderLines, this can be a very expensive operation. Because this happens even when you are not interested in the actual contents of that collection, you are paying a high price just to keep the bidirectional association intact.
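The link-management code from the earlier blogs is not repeated here, so the following is a plain-Java sketch that assumes setOrder works as described above: remove the line from the old Order's set, then add it to the new one. LazySet is a hypothetical stand-in for a provider's lazy collection (such as Hibernate's PersistentSet), reduced to the one behaviour that matters: the first add or remove loads every element.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a JPA provider's lazy collection wrapper:
// the first add or remove runs the "query" and loads EVERY element.
class LazySet<E> {
    private final Supplier<Set<E>> loader; // simulates SELECT ... WHERE order_id = ?
    private Set<E> elements;               // null until first access
    int loadCount;                         // number of times the loader ran

    LazySet(Supplier<Set<E>> loader) { this.loader = loader; }

    private Set<E> initialized() {
        if (elements == null) {
            loadCount++;
            elements = loader.get();
        }
        return elements;
    }

    boolean add(E e)        { return initialized().add(e); }
    boolean remove(E e)     { return initialized().remove(e); }
    int size()              { return initialized().size(); }
    boolean isInitialized() { return elements != null; }
}

class Order {
    final LazySet<OrderLine> orderLines;

    Order(Supplier<Set<OrderLine>> loader) { orderLines = new LazySet<>(loader); }

    // Only called from OrderLine.setOrder to keep both sides in sync.
    void internalAddOrderLine(OrderLine line)    { orderLines.add(line); }
    void internalRemoveOrderLine(OrderLine line) { orderLines.remove(line); }
}

class OrderLine {
    final int id;
    private Order order;

    // Simulates a line that was loaded from the database already linked to its Order.
    OrderLine(int id, Order order) { this.id = id; this.order = order; }

    void setOrder(Order newOrder) {
        if (order != null) {
            order.internalRemoveOrderLine(this); // forces the old Order's set to load
        }
        order = newOrder;
        if (newOrder != null) {
            newOrder.internalAddOrderLine(this);
        }
    }
}

class LinkManagementDemo {
    public static void main(String[] args) {
        Set<OrderLine> rows = new HashSet<>(); // pretend database table
        Order order = new Order(() -> new HashSet<>(rows));
        OrderLine line30 = null;
        for (int id = 1; id <= 1000; id++) {
            OrderLine line = new OrderLine(id, order);
            rows.add(line);
            if (id == 30) line30 = line;
        }
        System.out.println(order.orderLines.isInitialized()); // false
        line30.setOrder(null);
        System.out.println(order.orderLines.loadCount); // 1: all 1000 lines were loaded
        System.out.println(order.orderLines.size());    // 999
    }
}
```

Unlinking a single line is enough to pull the whole collection into memory, which is exactly the third query Hibernate issues.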

So far I have discovered three different solutions to this problem:

  1. Don't make the association bidirectional; only keep the reference from the child object to the parent object. In this case that would mean removing the orderLines set from the Order class. To retrieve the OrderLines that go with an Order, you would invoke the findOrderLinesByOrder method on the OrderLineDao. Since that method would retrieve all the child objects, and we got into this problem precisely because there are so many of them, you would also need to write more specific queries that find just the subset of child objects you need. A disadvantage of this approach is that an Order can't access its OrderLines without going through a service layer (a problem we will address in a later blog) or having them passed in by a calling method.
  2. Use the Hibernate-specific @LazyCollection annotation to have a collection loaded "extra lazily", like so:
    @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST, fetch = FetchType.LAZY)
    @org.hibernate.annotations.LazyCollection(org.hibernate.annotations.LazyCollectionOption.EXTRA)
    private Set<OrderLine> orderLines = new HashSet<OrderLine>();

    This feature should allow Hibernate to handle very large collections. For example, when you request the size of the set, Hibernate won't load all the elements of the collection; instead it executes a SELECT COUNT(*) FROM ... query. Even more interestingly, modifications to the collection are queued instead of being applied directly. If any modifications are pending when the collection is accessed, the session is flushed before further work is done.

    While this works fine for the size() method, it doesn't work when you try to iterate over the elements of the set (see JIRA issue HHH-2087, which has been open for two and a half years). The extra lazy loading of the size also has at least two open bugs: HHH-1491 and HHH-3319. All this leads me to believe that the extra lazy loading feature of Hibernate is a nice idea but not fully mature (yet?).

  3. Inspired by the Hibernate mechanism of postponing operations on the collection until you really need them to be executed, I have modified the Order class to do something similar. First an operation queue has been added as a transient field to the Order class:
    private transient Queue<Runnable> queuedOperations = new LinkedList<Runnable>();

    Then the internalAddOrderLine and internalRemoveOrderLine methods have been changed so that they do not modify the orderLines set directly. Instead, each one wraps the change in a Runnable that adds or removes the given OrderLine and places it on the queuedOperations queue:

    public void internalAddOrderLine(final OrderLine line) {
    	queuedOperations.offer(new Runnable() {
    		public void run() { orderLines.add(line); }
    	});
    }
    
    public void internalRemoveOrderLine(final OrderLine line) {
    	queuedOperations.offer(new Runnable() {
    		public void run() { orderLines.remove(line); }
    	});
    }

    Finally the getOrderLines method is changed so that it executes any queued operations before returning the set:

    public Set<OrderLine> getOrderLines() {
    	executeQueuedOperations();
    	return Collections.unmodifiableSet(orderLines);
    }
    
    private void executeQueuedOperations() {
    	for (;;) {
    		Runnable op = queuedOperations.poll();
    		if (op == null)
    			break;
    		op.run();
    	}
    }

    If more methods needed the set to be fully up to date, they would invoke the executeQueuedOperations method in the same way.

    The downside here is that your domain objects get cluttered with even more "link management code" than we already had managing bidirectional associations. Abstracting out this logic to a separate class is left as an exercise for the reader. ;-)
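One way to take up that exercise is to move the queue and the replay loop into a small helper. The sketch below is hypothetical: PendingChanges is not a JPA or Hibernate class. It receives the target set as a parameter when replaying, so it always acts on whatever collection is currently in the orderLines field, even after the JPA provider has replaced that field with its own lazy collection. The cascading caveat in the addendum applies to it unchanged.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper (not a JPA or Hibernate class): records add/remove
// calls and replays them, in order, against the set passed to applyTo.
final class PendingChanges<E> {
    private final Queue<Consumer<Set<E>>> queue = new ArrayDeque<>();

    void add(E element)    { queue.offer(set -> set.add(element)); }
    void remove(E element) { queue.offer(set -> set.remove(element)); }
    int size()             { return queue.size(); }

    // Applies all queued changes; with a lazy set, this is where it gets loaded.
    Set<E> applyTo(Set<E> target) {
        Consumer<Set<E>> op;
        while ((op = queue.poll()) != null) {
            op.accept(target);
        }
        return Collections.unmodifiableSet(target);
    }
}

class OrderLine {
    final int id;
    OrderLine(int id) { this.id = id; }
}

class Order {
    // In the real entity this field carries @OneToMany(mappedBy = "order").
    private Set<OrderLine> orderLines = new HashSet<>();
    // Not persisted; starts out empty whenever the entity is instantiated.
    private transient PendingChanges<OrderLine> pending = new PendingChanges<>();

    void internalAddOrderLine(OrderLine line)    { pending.add(line); }
    void internalRemoveOrderLine(OrderLine line) { pending.remove(line); }
    int pendingChangeCount()                     { return pending.size(); }

    Set<OrderLine> getOrderLines() { return pending.applyTo(orderLines); }
}

class QueueDemo {
    public static void main(String[] args) {
        Order order = new Order();
        OrderLine a = new OrderLine(1);
        OrderLine b = new OrderLine(2);
        order.internalAddOrderLine(a);
        order.internalAddOrderLine(b);
        order.internalRemoveOrderLine(a);
        System.out.println(order.pendingChangeCount());   // 3: nothing applied yet
        System.out.println(order.getOrderLines().size()); // 1: only b remains
        System.out.println(order.pendingChangeCount());   // 0
    }
}
```

The entity is left with three one-line methods, and the replay logic lives in one place that other entities can reuse.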

Of course, this problem does not only occur with bidirectional associations. It surfaces any time you manipulate large collections mapped with @OneToMany or @ManyToMany. Bidirectional associations just make the cause less obvious because you think you are only manipulating a single entity.
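For large collections, the first solution sidesteps the problem entirely: Order has no orderLines set, and the OrderLineDao returns just the slice a caller asks for. The sketch below is an in-memory stand-in so that it runs without a database; the findOrderLinesByOrder name comes from the text, while the paging parameters and the InMemoryOrderLineDao class are hypothetical. In a JPA-backed DAO the same method would typically run a JPQL query such as SELECT l FROM OrderLine l WHERE l.order = :order ORDER BY l.id and limit it with Query.setFirstResult and Query.setMaxResults.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class Order {
    final long id;
    Order(long id) { this.id = id; }
    // No orderLines collection: the association is unidirectional.
}

class OrderLine {
    final long id;
    final Order order; // the only side of the association
    OrderLine(long id, Order order) { this.id = id; this.order = order; }
}

// Hypothetical in-memory stand-in for the OrderLineDao. A JPA version would run
// "SELECT l FROM OrderLine l WHERE l.order = :order ORDER BY l.id" with
// setFirstResult(first) and setMaxResults(max) instead of filtering a list.
class InMemoryOrderLineDao {
    private final List<OrderLine> table = new ArrayList<>();

    void save(OrderLine line) { table.add(line); }

    List<OrderLine> findOrderLinesByOrder(Order order, int first, int max) {
        return table.stream()
                .filter(l -> l.order == order)
                .sorted(Comparator.comparingLong(l -> l.id))
                .skip(first)
                .limit(max)
                .collect(Collectors.toList());
    }
}

class PagingDemo {
    public static void main(String[] args) {
        InMemoryOrderLineDao dao = new InMemoryOrderLineDao();
        Order order = new Order(1);
        for (long id = 1; id <= 500; id++) {
            dao.save(new OrderLine(id, order));
        }
        // Fetch the third page of 20 lines instead of all 500.
        List<OrderLine> page = dao.findOrderLinesByOrder(order, 40, 20);
        System.out.println(page.size() + " lines, first id " + page.get(0).id); // 20 lines, first id 41
    }
}
```

Unlinking or re-linking an OrderLine now only touches that one row, because there is no parent collection to keep in sync.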

Addendum dated May 31st, 2009: You should not use the method described in bullet #3 if you are cascading any operations to the mapped collection. If you postpone modifications to the collection, your JPA provider won't know about the added or removed elements and will cascade operations to the wrong entities. This means that any entities added to a collection on which you have set CascadeType.PERSIST won't be persisted unless you explicitly invoke EntityManager.persist on them. Similarly, the Hibernate-specific org.hibernate.annotations.CascadeType.DELETE_ORPHAN option will only remove orphaned child entities once they have actually been removed from Hibernate's PersistentCollection.
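The reason is easy to reproduce in a toy model. The sketch below is hypothetical and greatly simplified: ToyPersistenceContext plays the role of a provider cascading PERSIST by walking the mapped collection, and it can only see what is already in orderLines, not what is still waiting in the queue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class OrderLine {
    final int id;
    OrderLine(int id) { this.id = id; }
}

class Order {
    // The mapped collection: this is what the provider inspects when cascading.
    final Set<OrderLine> orderLines = new HashSet<>();
    // The deferred operations from bullet #3; invisible to the provider.
    transient Queue<Runnable> queuedOperations = new LinkedList<>();

    void internalAddOrderLine(OrderLine line) {
        queuedOperations.offer(() -> orderLines.add(line));
    }
}

// Toy model of cascading PERSIST; not a real provider API.
class ToyPersistenceContext {
    final List<Object> persisted = new ArrayList<>();

    void persistWithCascade(Order order) {
        persisted.add(order);
        persisted.addAll(order.orderLines); // follows the collection as it is right now
    }
}

class CascadeDemo {
    public static void main(String[] args) {
        Order order = new Order();
        OrderLine line = new OrderLine(1);
        order.internalAddOrderLine(line); // queued, not yet in orderLines

        ToyPersistenceContext context = new ToyPersistenceContext();
        context.persistWithCascade(order);
        System.out.println(context.persisted.contains(line)); // false: the new line is missed
    }
}
```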

In any case, now you know what causes this performance hit and three possible ways to solve it. I am interested to hear whether you ran into this problem and how you solved it.

For a list of all the JPA implementation pattern blogs, please refer to the JPA implementation patterns wrap-up.

Comments (10)

  1. [...] Bidirectional associations vs. lazy loading [...]

  2. Andrew Phillips

    July 24, 2009 at 9:33 am

    A thought on 3 ("postponing operations on the collection"): wouldn't you want to move the "interception" into the collection itself? The suggestion is essentially a proposal for how to process collection operations, so you'd presumably want that in a collection class rather than in the entity itself.

    But there are some gotchas, like ensuring that any decorating collections are created only after the field injection of the ORM-managed fields has taken place. I guess @PostConstruct could be used for that, but that’s a bit ugly. Or you could use double-checked locking.

    So something like

    class Entity {
      Collection persistedCollection;
      transient Collection lateLoadingPersistedCollection;
    
      @PostConstruct
      void createLateLoadingCollections() {
        lateLoadingPersistedCollection = new LateLoadingCollection(persistedCollection);
      }
    
      Collection getPersistedCollection() {
        return lateLoadingPersistedCollection;
      }
    
    }
    

    or

    class Entity {
      Collection persistedCollection;
    
      // let's pretend closures made it to Java 7
      transient LazyField lateLoadingPersistedCollection =
        new LazyField({ new LateLoadingCollection(persistedCollection) });
    
      Collection getPersistedCollection() {
        assert persistedCollection != null;
        return lateLoadingPersistedCollection.get();
      }
    
    }
    

    where LazyField is just some wrapper around some double-checked locking lazy instantiation code.

    Because both examples rely on returning the decorated collection from the getter, this presumably wouldn't work with property access, by the way (but I guess that wouldn't bother you).

    The LateLoadingCollection itself would just be a decorator around the real, ORM-managed collection that would intercept the add and remove methods (storing the values in a queue or whatever) and would simply merge and then delegate to the underlying set in all other cases, which would result in a load whenever e.g. the iterator was called.

    A key question here would be whether there is some guarantee that get (or any of the other methods that would result in a merge of any queued changes) would be called when an entity is persisted. Otherwise, changes to the underlying persistedCollection might not be saved, of course...

  3. Vincent Partington

    July 24, 2009 at 12:21 pm

    @Andrew: Interesting idea to abstract the queued operation into a separate set implementation.

    One remark about the last paragraph, where you mention that the queued operations need to be propagated to the underlying set before the transaction is committed: this is not necessary when the set is not the owning side of a bidirectional relation (i.e. when the mappedBy attribute is set on the @OneToMany annotation). And in most cases that is so.

    In any case, you'd rather not do this last-minute propagation, because initializing the set is expensive (the reason we are bothering with this pattern in the first place).

  4. M

    August 18, 2009 at 9:29 pm

    Hello,

    As I am new to jpa, please forgive the simplicity of the question. I have the following problem:

    How can I read a collection of strings from a child table that references the parent table by a FK? The database is legacy and I cannot change it. This seems like such a basic operation, and it frustrates me that I do not know how to do it in JPA.

    Thank You,
    MPK

  5. Vincent Partington

    August 24, 2009 at 8:09 pm

    @M: Are you saying that the child table only has two columns? The first a foreign key to the parent and the second column the string value? And the primary key is the combination of that foreign key and the string value?

    In that case one way to do it is to define the child as a separate entity with a composite primary key defined with the @IdClass annotation.

    Is that what you are looking for? If not, please post the definition of your schema.

  6. Bruno

    October 28, 2009 at 6:55 pm

    I was very excited to find this post because I have not found this issue discussed thoroughly elsewhere. The issue of large collections and inadvertent lazy-loading is one that has plagued me for years. In fact, we have been using TopLink and invented a framework to accomplish exactly the "exercise for the reader" you put forth. But this framework imposes a fair amount of boilerplate code into the entity classes and I've been fighting with bugs in the implementation for years (only some of which I introduced myself :-)).

    After all of this time, I have finally come to the simple conclusion that using one-to-many relationships for large collections is just a bad idea. Even if you get around the issues with lazy loading, you still face the problem of the memory usage of the collection. We use EclipseLink's "SoftWeak" caching strategy in general. But when one object holds a collection of hundreds or thousands of other entities that otherwise have only a weak reference, garbage collection is severely hampered. I basically draw the line at around 100 objects: any relationship that would involve more than 100 objects is not a candidate for a collection mapping. This is not a hard and fast rule, but I have found that the cost of the extra DAO and service layer coding is outweighed by the hassle of dealing with these performance issues.

  7. Vincent Partington

    October 31, 2009 at 10:19 am

    @Bruno: I wholeheartedly agree with your rule of thumb. It is a lot harder to get the performance of large collections mapped as one-to-many relations right than if you were to retrieve them using a query (solution #1). So if the set becomes large, it is safer to go the explicit-query route.

    One problem exists, though: because it is so easy to set up @OneToMany associations and they make traversing the object graph easier, you are likely to end up with @OneToMany associations where you later wish you didn't have them. In fact, what might be a small set of data when you start a project might become a large set once actual data flows through the program. So the safest thing of all would be to avoid @OneToMany associations altogether. Hmm...

  8. Lionel Orellana

    July 10, 2010 at 1:52 am

    Hi Vincent,

    In the context of DDD I am starting to feel avoiding one-to-many associations between Aggregates (Roots) and using queries instead is indeed the safest thing to do.

    Within the same Aggregate it's probably ok.

    Good insight. Thanks.

    Lionel.

  9. nillehammer

    February 4, 2011 at 3:09 pm

    Hi Vincent,

    thanks for this most valuable article! I've been struggling with managing bidi associations for a long time now. After reading your article I radically changed the design of my entities and can only see advantages.

    So my advice is: Collections/Maps are EVIL! Wherever you can (and that is in most cases), avoid using them!

    The reasons are not really related to mapping the entities to the database, but to the fact that using Collections/Maps correctly is far more complicated than one would expect at first sight.

    1. The interface types (Collection, List, Set etc.) do not extend Serializable. If you want your entities to be Serializable (and according to the Hibernate docs you should), they always need special treatment. That means extra effort.

    2. When using the collection types that guarantee uniqueness (Map, Set), you have to ensure that changes to values in your entities do not affect the result of hashCode() and equals(). When using their sorted variants (SortedSet, SortedMap), the result of compareTo() or the Comparator has to be taken care of too. If the changes do affect those, you will have to deal with that, e.g. pull the entity out of the collection, change the attributes and put it back in. That is extra effort and might even result in additional DELETE/INSERT statements (although I haven't investigated that).

    3. Collections bloat your code. To deal with them you will have to implement methods for adding, checking containment of, retrieving and deleting elements, as well as wrapping them in unmodifiable views. And you must (lazily) initialize them. That's a lot of methods! Far more complicated than the getter (and possibly setter) on the child entity.

    The only drawback is the one that you already mentioned (you can only find children through your data access layer).

    As for dealing with large collections: you seldom need to fetch the whole collection. A standard GUI can display maybe 50 elements without scrolling, and most websites (mine too) use a pager for this situation. So I'm thinking of configuring the fetch size and using scrollable result sets, but I haven't done that yet.

    Cheers nillehammer

  10. Williams

    January 23, 2014 at 11:21 pm

    Pretty nice post. I just stumbled upon your weblog and wished to say that I have truly enjoyed browsing your blog posts. After all I'll be subscribing to your rss feed and I hope you write again soon!
