For our project we use Hibernate. The application we are building reads work items from the database, processes (validates) them and writes the results back to the database: a typical data processing application. Ideally the process would be streaming: one gigantic select would fetch millions of rows, and each row would be processed in its own transaction (processing a row results in several DML statements). There are technical obstacles to implementing the application in this fashion, though. The RDBMS would have to process millions of short transactions while keeping the long transaction that reads the rows alive. Oracle cannot handle this, due to its read consistency functionality: the long-running read needs undo data to reconstruct the rows as they were when the query started, and the many short transactions eventually cause that data to be overwritten. After quite some time an ORA-01555: snapshot too old (rollback segment too small) will inevitably crash the long-running transaction. Our implementation divides the gigantic select into smaller chunks, to prevent the "snapshots" from getting "too old".
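To make the chunking idea concrete, here is a minimal sketch of one way to do it. It is not our actual code: it assumes every work item has a numeric id that can be sorted on, and the names ChunkedReader, ChunkSource and fetchAfter are made up for this illustration. Each call to fetchAfter stands for one short query of its own (something along the lines of "where id > :lastId order by id", limited to the chunk size), so no single read stays open long enough for its snapshot to get too old.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Hypothetical helper, for illustration only.
final class ChunkedReader {

    // Assumption: the implementation runs one short query per call and
    // returns at most chunkSize items with id > lastId, ordered by id.
    interface ChunkSource<T> {
        List<T> fetchAfter(long lastId, int chunkSize);
    }

    // Processes all items chunk by chunk and returns how many were processed.
    static <T> long processInChunks(ChunkSource<T> source,
                                    ToLongFunction<? super T> idOf,
                                    int chunkSize,
                                    Consumer<? super T> processor) {
        long lastId = Long.MIN_VALUE; // start before the smallest possible id
        long processed = 0;
        while (true) {
            List<T> chunk = source.fetchAfter(lastId, chunkSize);
            if (chunk.isEmpty()) {
                return processed;     // nothing left to read
            }
            for (T item : chunk) {
                processor.accept(item);            // e.g. validate and write back
                lastId = idOf.applyAsLong(item);   // remember where to resume
                processed++;
            }
        }
    }
}
```

Resuming from the last id seen, instead of from a row offset, means the next chunk starts in the right place even if the processing changes the rows that were already handled.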
Pfff, the first obstacle was out of the way. Next problem: Hibernate. We chose Hibernate as our ORM solution, because... because... we were all already familiar with it, which is the lamest excuse in the world and will mostly lead to solving the wrong problem with the wrong tool. The problem with Hibernate in this situation is at the same time one of its main features. To support transactional write-behind, Hibernate keeps track of all objects loaded into its Session. During the batch processing all work items get loaded into the session. The memory associated with this isn't the biggest problem. Whenever a Session is flushed, Hibernate inspects each associated object to look for changes and writes these to the database. As the session grows, flushing it takes more and more time, even if there are no changes, as is the case with this long read transaction. The solution is to evict the objects from the session as soon as possible. Since we read the objects from a ScrollableResults, each result is loaded separately. We wrap this results object in an EvictingIterator that evicts every work item from the session. When using this approach, one has to be very careful to evict all objects, including the objects that are loaded by cascading. Luckily, 'evict' is a cascade option in Hibernate, so in the mapping files specify cascade='evict' on all associations that are loaded and... presto!
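For the curious, an EvictingIterator could look roughly like the sketch below. This is a simplified stand-in, not our real class: it wraps a plain java.util.Iterator instead of a ScrollableResults, and the eviction itself is passed in as a callback so the example has no Hibernate dependency. With Hibernate the callback would presumably be session::evict.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Simplified illustration: hands out items one at a time and evicts each
// item as soon as the caller asks for the next one (or for hasNext()).
final class EvictingIterator<T> implements Iterator<T> {

    private final Iterator<T> delegate;
    private final Consumer<? super T> evictor; // assumption: e.g. session::evict
    private T previous;                        // last item handed out, not yet evicted

    EvictingIterator(Iterator<T> delegate, Consumer<? super T> evictor) {
        this.delegate = delegate;
        this.evictor = evictor;
    }

    @Override
    public boolean hasNext() {
        // In a for-each style loop this is called after the loop body,
        // so the caller is done with the previous item by now.
        evictPrevious();
        return delegate.hasNext();
    }

    @Override
    public T next() {
        evictPrevious();
        previous = delegate.next();
        return previous;
    }

    private void evictPrevious() {
        if (previous != null) {
            evictor.accept(previous);
            previous = null;
        }
    }
}
```

The catch is that an item must not be used after hasNext() or next() has been called again, because by then it has been detached. Also note that this only evicts the item itself; anything reached through associations still depends on the cascade setting in the mapping.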
Now let's take a step back: what problem have we solved here? Using Hibernate to do the gigantic select to fetch the objects from the database has not helped us one bit. Quite the opposite; we have to work around Hibernate's Session to make it work! This is exactly how a Golden Hammer can lead you astray. Instead of solving your problems, it creates new problems that are specific to the solution. So after making it all work with Hibernate, we decided to invest some time in trying to find another solution. Our first attempt was Ibatis.
We decided to invest a week in the Ibatis PoC. We ended up spending most of that time cleaning up our configuration, code and tests, but finally we got to the actual Ibatis code. This is what we found: