Why did Hibernate update my database?

Maarten Winkels

Hibernate is a sophisticated ORM framework that manages the state of your persistent data for you. Handing the important but difficult task of managing your application's persistent state over to a framework has numerous advantages, but one of the disadvantages is that you lose some control over what happens where and when. One example of this is the dirty checking feature that Hibernate provides. By dirty checking, Hibernate determines what data needs to be updated in your database. In many cases this feature is quite useful and works without any issues, but sometimes you might find that Hibernate decides to update something that you did not expect. Finding out why this happened can be a rather difficult task.

I was asked to look into an issue with a StaleObjectStateException the other day. StaleObjectStateExceptions are used by Hibernate to signal an optimistic locking conflict: while one user (or process) tries to save a data item, that same item has already been changed in the underlying database since it was last read. Now the problem was that the process throwing the exception was the only process that was supposed to change that data. From a functional point of view, no other user or process could have changed the data in the meantime. So what was going on?

Digging around in the log for some time, we found that the data was updated by another process that was supposed to only read it. Somehow Hibernate decided that the data read by that process had become dirty and should be saved. So now I had to find out why Hibernate thought that data was dirty.

Hibernate can perform dirty checking in several places in an application:

  1. When a transaction is committed or a session is flushed, obviously, because at that point changes made in the transaction or session must be persisted to the database.
  2. When a query is executed. To avoid missing changes that still reside only in memory, Hibernate flushes data that might be queried to the database just before executing the query. It tries to be selective about this and not flush everything all the time, but only the data the query might touch.
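
The second flush point can be sketched with a toy model in plain Java. All names here are illustrative, not Hibernate's actual API, and real Hibernate is far more selective about which entities it flushes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Hibernate's flush-before-query behaviour.
// Illustrative names only; this is not Hibernate's API.
class ToySession {
    private final List<String> dirtyEntities = new ArrayList<>();
    final List<String> issuedSql = new ArrayList<>();

    void markDirty(String entity) {
        dirtyEntities.add(entity);
    }

    // Before running a query, pending in-memory changes are flushed
    // so the query cannot miss them. Real Hibernate only flushes the
    // entities the query might actually touch.
    void query(String hql) {
        for (String entity : dirtyEntities) {
            issuedSql.add("UPDATE " + entity);
        }
        dirtyEntities.clear();
        issuedSql.add("SELECT /* " + hql + " */");
    }
}
```

The point of the sketch is only the ordering: the UPDATE statements for dirty entities go out before the SELECT for the query does.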

It is quite difficult to check all these places to find out where the data is found to be dirty, especially when the process executes several queries.

To find out why Hibernate deems the data to be dirty, we have to dig into the Hibernate internals and start debugging the framework code. The Hibernate architecture is quite complex. There are a number of classes that are involved in dirty checking and updating entities:

  • The DefaultFlushEntityEventListener determines what fields are dirty. The internals of this class work on the list of properties of an entity and two lists of values: the values as loaded from the database and the values as currently known to the session. It delegates determining the 'dirtiness' of a field to the registered Interceptor and to the types of the properties.
  • The EntityUpdateAction is responsible for doing the update itself. An object of this type will be added to an ActionQueue to be executed when a session is flushed.

These classes show some of the patterns used in the internals of Hibernate: eventing and action queuing. These patterns make the architecture of the framework very clear, but they can also make it very hard to follow what is going on...

As explained above, flushing happens quite often, so setting a breakpoint in the DefaultFlushEntityEventListener is usually not a good idea: it will get hit very often. An EntityUpdateAction, however, is only created when an update will actually be issued to the underlying database. So to find out what the problem was, I set a breakpoint in its constructor and backtracked from there. It turned out Hibernate could not determine the dirty state of the object and therefore decided to update the entity just to be safe.

As mentioned earlier, Hibernate uses the "loaded state" to determine whether an object is dirty. This is the state of the object (the values of its properties) when it was loaded from the database. Hibernate stores this information in its persistence context. When dirty checking, Hibernate compares these values to the current values. When the "loaded state" is not available, Hibernate effectively cannot do dirty checking and deems the object dirty. The only scenario in which the loaded state is unavailable, however, is when the object has been re-attached to the session and thus not loaded from the database. The process I was looking into, however, did not work with detached data.
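
Conceptually, the comparison is simple. A minimal sketch in plain Java (illustrative names, not Hibernate's internals): dirty checking is a field-by-field comparison of the loaded snapshot against the current values, and without a snapshot there is nothing to compare against:

```java
import java.util.Arrays;

// Illustrative sketch of Hibernate's snapshot comparison;
// names are made up, not Hibernate's internals.
class DirtyCheckSketch {
    static boolean isDirty(Object[] loadedState, Object[] currentState) {
        // No snapshot available: dirty checking is impossible,
        // so the entity is treated as dirty to be safe.
        if (loadedState == null) {
            return true;
        }
        // Field-by-field comparison of loaded vs. current values.
        return !Arrays.equals(loadedState, currentState);
    }
}
```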

There is one other scenario in which Hibernate will lose the "loaded state" of the data: when the session is cleared. This operation discards all state in the persistence context completely. It is quite a dangerous operation to use in your application code and it should only be invoked if you are very sure of what you're doing. In our situation, the session was being flushed and cleared at some point, leading to the unwanted updates and eventually the StaleObjectStateExceptions. An unwanted situation indeed. After removing the clear, the updates were gone and the bug was fixed.
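
The effect of clearing the session can be sketched with a toy persistence context (illustrative names, not Hibernate's internals): after the snapshots are discarded, the next flush has no loaded state to compare against and writes the entity anyway:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy persistence context showing why clearing the session leads to
// spurious updates; names are illustrative, not Hibernate's internals.
class ToyPersistenceContext {
    private final Map<Object, Object[]> loadedState = new HashMap<>();

    // Recorded when an entity is loaded from the database.
    void recordLoad(Object entity, Object[] snapshot) {
        loadedState.put(entity, snapshot);
    }

    // Discards every snapshot, as Session.clear() does.
    void clear() {
        loadedState.clear();
    }

    // Mirrors the dirty check: with no snapshot, comparison is
    // impossible and the entity is updated to be safe.
    boolean mustUpdate(Object entity, Object[] currentState) {
        Object[] snapshot = loadedState.get(entity);
        if (snapshot == null) {
            return true; // cannot dirty check
        }
        return !Arrays.equals(snapshot, currentState);
    }
}
```

In this toy model, an unchanged entity is left alone before clear() and written unconditionally after it, which is exactly the unwanted update described above.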

Using Hibernate can save a developer a lot of time when things are running smoothly. When a problem is encountered, a lot of specialized Hibernate knowledge and a considerable amount of time is often needed to diagnose and solve it.

Comments (9)

  1. Nico Mommaerts - Reply

    April 6, 2009 at 3:25 pm

    Omg what a coincidence, we had exactly the same problem today!

  2. John - Reply

    April 6, 2009 at 3:35 pm

    > Using Hibernate can save a developer a lot of time,
    > when things are running smoothly. When a problem is
    > encountered, a lot of specialized Hibernate knowledge
    > and a considerable amount of time is often needed to
    > diagnose and solve it.

    True. True.

  3. Pawel Kaczor - Reply

    April 6, 2009 at 9:07 pm

    > As mentioned earlier, Hibernate uses the "loaded state" to determine whether an object is dirty. This is the state of the object (the values of its properties) when it was loaded from the database. Hibernate stores this information in its persistence context. When dirty checking, Hibernate compares these values to the current values. When the "loaded state" is not available, Hibernate effectively cannot do dirty checking and deems the object dirty. The only scenario in which the loaded state is unavailable, however, is when the object has been re-attached to the session and thus not loaded from the database. The process I was looking into, however, did not work with detached data.

    I don't know what you mean by "loaded state", but whatever it is, you are wrong in saying that dirty checking doesn't work when an entity is reattached. It works; otherwise the use of this feature would be very limited.

  4. Maarten Winkels - Reply

    April 7, 2009 at 4:04 am

    @Pawel: By "loaded state" I mean "the state of the object (the values of its properties) when loaded from the database", as mentioned in the second line you quoted. It refers to the field org.hibernate.engine.EntityEntry.loadedState, which contains these values.

    The org.hibernate.event.def.DefaultFlushEntityEventListener.dirtyCheck(FlushEntityEvent) method has a line that reads:
    cannotDirtyCheck = loadedState==null;
    (It is amazing what reading code can teach you sometimes...)

    Now when you reattach an object, the object is not loaded from the database and Hibernate has no way of knowing whether it has changed in the meantime or not (other than querying the database, which you can configure it to do). There are several ways to reattach: using .update() or .lock(), or cascading either of these methods. If you .update(), obviously Hibernate will update the database. If you .lock(), it depends on your LockMode.

  5. Pawel Kaczor - Reply

    April 7, 2009 at 7:45 am

    Thanks for the explanation. Yes, you are right. An additional select is necessary to enable dirty checking for detached entities. This select is generated automatically if select-before-update=true.

  6. Geir Hedemar - Reply

    April 7, 2009 at 11:13 am

    I have had the same problem.

    A nice workaround, available if you also use Spring and annotation-driven transactions, is to declare your read-only service calls to be just that. This will flush out any unintended write-throughs early on. The resulting code will look something like

    @Transactional(readOnly=true) public void serviceCall(...)

    which I also find to be concise and readable.

  7. Alan Bond - Reply

    January 27, 2010 at 9:53 pm

    I know I am responding to an old blog post, but I ran into the same issue around January 2009 as well. It turned out that an attempt to improve efficiency by setting default values in the DTO classes that are mapped to tables caused an immediate update of each selected record before it was returned. When log records were part of the return, the saved log data bloated exponentially!

  8. Steve Hiller - Reply

    April 16, 2010 at 3:00 pm

    Here's a good one:

    I had to use a stored procedure to read data from a database. Some of the fields are of fixed length, of type CHAR, and some of the data in these fields was padded with spaces to fill the fixed length. I have an object mapped to these fields, each property being of type String. In each setter, I was using the trim method to remove the padding from the input field. Using trim caused the object to be marked as dirty, thus causing an unwanted update. Really odd at first, since there was no corresponding table to update.

  9. Sridhar Sreenivasan - Reply

    February 1, 2012 at 11:06 pm

    Apologies for responding to this old blog post, but I am running into a weird scenario and came across this article. Possibly I am facing this issue. The overview of my issue is:
    ParentA class has a bi-directional inverse="true" with a child Map.
    Two threads A, and B process the same ParentA instance, and creates a new Child object instance.
    ThreadA-> ParentA1 -> ChildA1(key1, value1)
    ThreadB-> ParentA1 -> ChildA1(key1, value2)

    The logic in ParentA is such that if the key already exists, the value should be updated; otherwise the new Child object is persisted. We retry on StaleObjectState by restarting the session and reloading the objects. So when trying this in my dev environment, I process the same ParentA object across multiple threads. Processing in ThreadA inserts ChildA1; processing in ThreadB throws a StaleObject exception during flush. The retry logic appropriately updates ChildA1 to value2.
    But in our production environment ThreadB creates a new ChildA1 object (thus resulting in duplicates). The logs do indicate a StaleObjectState. Despite the exception, the child object is persisted. Is that related?

Add a Comment