We kicked off our hunt for JPA implementation patterns with the Data Access Object pattern and continued with the discussion of how to manage bidirectional associations. This week we touch upon a subject that may seem trivial at first: how to save an entity.
Saving an entity in JPA is simple, right? We just pass the object we want to persist to EntityManager.persist. It all seems to work quite well until we run into the dreaded "detached entity passed to persist" message. Or a similar message when we use a different JPA provider than the Hibernate EntityManager.
So what is that detached entity the message talks about? A detached entity (a.k.a. a detached object) is an object that has the same ID as an entity in the persistence store but that is no longer part of a persistence context (the scope of an EntityManager session). The two most common causes for this are:
The contract for persist (see section 3.2.1 of the JPA 1.0 spec) explicitly states that an EntityExistsException is thrown by the persist method when the object passed in is a detached entity. Or any other PersistenceException when the persistence context is flushed or the transaction is committed. Note that it is not a problem to persist the same object twice within one transaction. The second invocation will just be ignored, although the persist operation might be cascaded to any associations of the entity that were added since the first invocation. Apart from that latter consideration there is no need to invoke EntityManager.persist on an already persisted entity because any changes will automatically be saved at flush or commit time.
Those of you that have worked with plain Hibernate will probably have grown quite accustomed to using the Session.saveOrUpdate method to save entities. The saveOrUpdate method figures out whether the object is new or has already been saved before. In the first case the entity is saved, in the latter case it is updated.
When switching from Hibernate to JPA a lot of people are dismayed to find that method missing. The closest alternative seems to be the EntityManager.merge method, but there is a big difference that has important implications. The Session.saveOrUpdate method, and its cousin Session.update, attach the passed entity to the persistence context, while the EntityManager.merge method copies the state of the passed object to the persistent entity with the same identifier and then returns a reference to that persistent entity. The object passed is not attached to the persistence context.
That means that after invoking EntityManager.merge, we have to use the entity reference returned from that method in place of the original object passed in. This is unlike the way one can simply invoke EntityManager.persist on an object (even multiple times, as mentioned above!) to save it and continue to use the original object. Hibernate's Session.saveOrUpdate does share that nice behaviour with EntityManager.persist (or rather Session.save) even when updating, but it has one big drawback: if an entity with the same ID as the one we are trying to update, i.e. reattach, is already part of the persistence context, a NonUniqueObjectException is thrown. And figuring out what piece of code persisted (or merged or retrieved) that other entity is harder than figuring out why we get a "detached entity passed to persist" message.
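The reference semantics described above can be illustrated without a JPA provider. The following is a minimal sketch, not JPA code: ToyPersistenceContext and Customer are hypothetical names invented for this illustration, and the toy context only mimics which object ends up managed after persist and merge.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy entity with an identifier and one field (hypothetical, for illustration). */
class Customer {
    Long id;
    String name;

    Customer(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/**
 * Toy stand-in for a persistence context. This is NOT the JPA API; it only
 * mimics the reference semantics of persist and merge.
 */
class ToyPersistenceContext {
    private final Map<Long, Customer> managed = new HashMap<>();

    /** Like persist: the passed object itself becomes the managed instance. */
    void persist(Customer c) {
        managed.put(c.id, c);
    }

    /** Like merge: state is copied to the managed instance, which is returned. */
    Customer merge(Customer detached) {
        Customer target = managed.get(detached.id);
        if (target == null) {
            // Nothing managed under this ID yet: manage a fresh copy instead.
            target = new Customer(detached.id, null);
            managed.put(target.id, target);
        }
        target.name = detached.name;
        return target;
    }

    /** True only if this exact object is the managed instance for its ID. */
    boolean contains(Customer c) {
        return managed.get(c.id) == c;
    }
}

class MergeReferenceDemo {
    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        Customer original = new Customer(1L, "Alice");
        ctx.persist(original); // original itself is now managed

        Customer received = new Customer(1L, "Alicia"); // same ID, e.g. sent by a client
        Customer managed = ctx.merge(received);

        System.out.println(managed == original);    // true
        System.out.println(ctx.contains(received)); // false
    }
}
```

In this toy model, changes made to received after the merge call would never reach the managed instance, which is why the returned reference is the one to keep using.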
So let's examine the three possible cases and what the different methods do:
| Scenario | EntityManager.persist | EntityManager.merge | Session.saveOrUpdate |
|---|---|---|---|
| Object passed was never persisted | 1. Object added to persistence context as new entity 2. New entity inserted into database at flush/commit | 1. State copied to new entity 2. New entity added to persistence context 3. New entity inserted into database at flush/commit 4. New entity returned | 1. Object added to persistence context as new entity 2. New entity inserted into database at flush/commit |
| Object was previously persisted, but not loaded in this persistence context | 1. EntityExistsException thrown (or a PersistenceException at flush/commit) | 1. Existing entity loaded 2. State copied from object to loaded entity 3. Loaded entity updated in database at flush/commit 4. Loaded entity returned | 1. Object added to persistence context 2. Loaded entity updated in database at flush/commit |
| Object was previously persisted and already loaded in this persistence context | 1. EntityExistsException thrown (or a PersistenceException at flush/commit) | 1. State from object copied to loaded entity 2. Loaded entity updated in database at flush/commit 3. Loaded entity returned | 1. NonUniqueObjectException thrown |
Looking at that table one may begin to understand why the saveOrUpdate method never became a part of the JPA specification and why the JSR members instead chose to go with the merge method. BTW, you can find a different angle on the saveOrUpdate vs. merge problem in Stevi Deter's blog about the subject.
Before we continue, we need to discuss one disadvantage of the way EntityManager.merge works; it can easily break bidirectional associations. Consider the example with the Order and OrderLine classes from the previous blog in this series. If an updated OrderLine object is received from a web front end (or from a Hessian client, or a Flex application, etc.) the order field might be set to null. If that object is then merged with an already loaded entity, the order field of that entity is set to null. But it won't be removed from the orderLines set of the Order it used to refer to, thereby breaking the invariant that every element in an Order's orderLines set has its order field set to point back at that Order.
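To make the broken invariant concrete, here is a small sketch in plain Java. It does not use JPA: the Order and OrderLine classes are stripped-down stand-ins for the entities from the previous blog, the description field is made up for the example, and copyState is a hypothetical helper that mimics the field-by-field copy merge performs.

```java
import java.util.HashSet;
import java.util.Set;

/** Stripped-down stand-in for the Order entity (no JPA annotations). */
class Order {
    final Set<OrderLine> orderLines = new HashSet<>();
}

/** Stripped-down stand-in for the OrderLine entity. */
class OrderLine {
    Order order;
    String description; // assumed field, only here to have some state to copy
}

class BrokenAssociationDemo {
    /** Mimics a naive state copy as done by merge; hypothetical helper, not JPA code. */
    static void copyState(OrderLine from, OrderLine to) {
        to.description = from.description;
        to.order = from.order; // copies null if the client did not send the order
    }

    /** The invariant: every line in an order's set points back at that order. */
    static boolean invariantHolds(Order order) {
        for (OrderLine line : order.orderLines) {
            if (line.order != order) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Order order = new Order();
        OrderLine loaded = new OrderLine();
        loaded.order = order;
        loaded.description = "1 x widget";
        order.orderLines.add(loaded);

        OrderLine received = new OrderLine();  // arrives from the front end
        received.description = "2 x widget";   // order field left null

        copyState(received, loaded);
        System.out.println(invariantHolds(order)); // false
    }
}
```

The loaded line is still in the order's orderLines set, but no longer points back at the order, which is exactly the inconsistent state described above.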
In this case, or other cases where the simplistic way EntityManager.merge copies the object state into the loaded entity causes problems, we can fall back to the DIY merge pattern. Instead of invoking EntityManager.merge we invoke EntityManager.find to find the existing entity and copy over the state ourselves. If EntityManager.find returns null we can decide whether to persist the received object or throw an exception. Applied to the Order class this pattern could be implemented like this:
    Order existingOrder = dao.findById(receivedOrder.getId());
    if (existingOrder == null) {
        dao.persist(receivedOrder);
    } else {
        existingOrder.setCustomerName(receivedOrder.getCustomerName());
        existingOrder.setDate(receivedOrder.getDate());
    }
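For readers who want to try the DIY merge pattern without a JPA provider, the following self-contained sketch replays it against an in-memory map. SimpleOrder, InMemoryOrderDao and DiyMerge are hypothetical names introduced for this illustration, and the map-backed DAO is only a stand-in for a real EntityManager-based DAO.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/** Plain stand-in for the Order entity, reduced to the fields copied above. */
class SimpleOrder {
    Long id;
    String customerName;
    LocalDate date;

    SimpleOrder(Long id, String customerName, LocalDate date) {
        this.id = id;
        this.customerName = customerName;
        this.date = date;
    }
}

/** Hypothetical in-memory DAO; a real one would delegate to an EntityManager. */
class InMemoryOrderDao {
    private final Map<Long, SimpleOrder> store = new HashMap<>();

    SimpleOrder findById(Long id) {
        return store.get(id);
    }

    void persist(SimpleOrder order) {
        store.put(order.id, order);
    }
}

class DiyMerge {
    /** Saves the received order and returns the instance the DAO now holds. */
    static SimpleOrder save(InMemoryOrderDao dao, SimpleOrder received) {
        SimpleOrder existing = dao.findById(received.id);
        if (existing == null) {
            dao.persist(received);
            return received;
        }
        // Copy only the fields we choose; anything not listed here is left alone.
        existing.customerName = received.customerName;
        existing.date = received.date;
        return existing;
    }
}
```

Because we decide which fields are copied, an association field that arrived as null can simply be left out, which is the point of doing the merge by hand.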
So where does all this leave us? The rule of thumb I stick to is this:
I hope this blog gives you some pointers on how to save entities and how to work with detached entities. We'll get back to detached entities when we discuss Data Transfer Objects in a later blog. But next week we'll handle a number of common entity retrieval patterns first. In the meantime your feedback is welcome. What are your JPA patterns?
For a list of all the JPA implementation pattern blogs, please refer to the JPA implementation patterns wrap-up.
You should check out JDO’s handling of this issue. JDO 2.x uses the same method, PersistenceManager.makePersistent(Object), for both persisting new objects and merging detached objects. There is no distinction between persisting and attaching. Because the method returns Object, you should always use the one returned; sometimes it will be the same instance given (in the case of a transient object being made newly persistent), and other times it will be a copy of the instance given (if the given instance was detached).
JDO also takes care to ensure that bidi associations are maintained. From section 15.3 of the JDO 2.2 specification:
=====
If two relationships (one on each side of an association) are mapped to the same column, the field
on only one side of the association needs to be explicitly mapped.
The field on the other side of the relationship can be mapped by using the mapped-by attribute identifying
the field on the side that defines the mapping. Regardless of which side changes the relationship,
flush (whether done as part of commit or explicitly by the user) will modify the datastore to
reflect the change and will update the memory model for consistency. There is no further behavior
implied by having both sides of the relationship map to the same database column(s). In particular,
making a change to one side of the relationship does not imply any runtime behavior by the JDO
implementation to change the other side of the relationship in memory prior to flush, and there is no
requirement to load fields affected by the change if they are not already loaded.
=====
Nice stuff. Everyone should check it out.
-matthew
I observe newer data in a database admin tool than I get returned from em.refresh(foo) as well as em.find(foo) and foo = em.merge(foo)
Where could I be in error?
Ah, leave my previous comment out, it was some other error that caused the observed behaviour.
Wouldn’t that rule out optimistic locking? Or at least be very intrusive about its handling (i.e. manual vs. handled by the provider).
Besides this behavior is related to the “Serialization” method you name in the article, as, apparently, these do not honor the Serialization process of Java. Should they (and I know Hessian does not!), uninitialized associations would remain proxies and be handled by the provider accordingly.
I have the honest impression this results more in fighting the framework, than making it work for you…
@Alex: how does the DIY merge rule out optimistic locking? I guess it depends on how the optimistic locking is implemented, but then I still can’t think of the examples.
As for the fact that the serialization methods I use (Hessian, AMF/BlazeDS) apparently do not keep associations intact, I guess the problem here is that I am using domain objects as data transfer objects. That is a practice encouraged by the fact that you get away with it in a lot of cases thanks to stuff like the ModelAndView and the WebDataBinder in Spring Web MVC.
When you use associations (especially ones that would make you walk the entire object graph) you usually do not want them all to be serialized, leaving you with this problem.
In fact I actually write separate DTOs for my more complex domain models, and then you always have to manually copy/merge the received data to your domain object.
I use this:
    JpaDao {
        public void persist(E entity) {
            if (entity.getId() == null) {
                entityManager.persist(entity);
            } else {
                if (!entityManager.contains(entity)) {
                    entityManager.merge(entity);
                }
            }
        }
    }

Thank you for a comprehensive article!
In our scenario the data comes from an external source in XML format (we use JAXB to unmarshal it), and needs to be merged with the one already in the DB. We use entityManager.merge in all cases.
In fact, the major problem is the transfer protocol – the data is coming as a complex deeply nested XML (think of the complete invoice, including information about the customer), so the DIY merge is problematic. We have chosen to specify the meaningful IDs in our entity beans and now it works completely transparently. The major problem was with the compound IDs and complex relationships between entities (and both simultaneously), but those were sorted eventually… Though, at this stage there’s a high risk for application to lose its independence from the JPA provider (for example, ours now works with OpenJPA only).
To speed up the development we successfully use HyperJAXB 3 (http://confluence.highsource.org/display/HJ3/Home), which generates 90% of the code, though we still need to handcraft some JPA annotations anyway.
Hi, I have a big doubt about some key concepts of JPA. What’s the difference between a detached entity and a new entity?
Regards.
@Oscar Calderon: Sorry for not replying earlier. I see that Jeanne Boyarsky has already answered your question over at CodeRanch.
But I’ll answer your question for other readers of this blog: the difference is that a new entity is one that has just been created in Java but has not been persisted in the database yet, while a detached entity exists in Java and in the database but the Java entity is no longer attached to a session so any updates to it will not be reflected in the database.