With all the big-data postings, now something about traditional SQL, running at DB2 9. We had created a web-application, that was basically a view for a database. The application displayed content of a database after the user would enter search parameters, or everything when no parameter was entered. The database contained a few hundred-thousands records. All in all very simple.
The software worked fine at our test-environment. But when we deployed the software into production, it would hardly ever display data. In the logs we found frequent locking errors as follow:
15:23 ERROR (org.hibernate.engine.jdbc.spi.SqlExceptionHelper) - DB2 SQL Error: SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68, DRIVER=4.1
Searching the web revealed what that error meant: a deadlock or timeout has occurred. We knew the same database was also accessed by another application, Activiti running processes written in BPMN. The processes in the Activiti-engine only inserted a few records per minute, so we didn't expect that application having a big influence. Yet how can a few inserts per minute have such a big impact?
After some talking to a DBA'er and doing our own experimentation we found out about the problem, which will describe here.
As said the other application was an Activiti that executed processes (to be exact: process-definitions in BPMN). At several steps in the process, Java code was executed by the Activiti's delegate mechanism. That java code would insert data into our database.
A DB2-DBA'er told us, when you'd insert a new record during a database-transaction, that record was locked until you would release the transaction by issuing a commit or rollback. Of course we already knew that. However, he also told use that other transactions could still be influenced by that lock: a SELECT without or with broad WHERE clause would have to wait until the first transaction finished. The reason for that: DB2 not just locks the record, but a part of the tablespace where that record happens to be inserted into.
While inserting that data would take very little time, the execution of the whole process-execution could take 5 to 20 seconds. The reason for the long duration: part of the steps involved invoking other webservices, that were not always quick to respond. Since only at the end of the process a commit was performed, a lock existed for part of the database for up to 20 seconds. So this means our console couldn't return any data if the other process was still busy, as shown below:
|Time||Process a(ctiviti)||Process c(onsole)|
|1||INSERT something INTO mytable||Process a is good to go|
|2||SELECT * FROM mytable||Process c has to wait for process a|
|3||INSERT something INTO myothertable||Process c still has to wait|
|4||commit||Process a commits, finally process c can continue|
|5||commit||Process c is now also finished|
We tried out by searching in our console on primary keys. Then our console would respond immediately, as expected: because then only a lock of a small part of the table is needed.
To get around the problem, we decided to commit from within each delegate, basically after each step in the process. We checked of course with the product-owner: for consistency having some data visible during the execution of the process wasn't a problem.
DB2 doesn't support nested transaction, so in our delegate a new dbsession had to be opened (and closed):
EntityManagerFactoryImpl entityManagerFactory = (EntityManagerFactoryImpl) Persistence. createEntityManagerFactory(properties.getProperty("myunitname"); sessionFactory = entityManagerFactory.getSessionFactory(); Session session = sessionFactory.openSession(); session.getTransaction().begin(); ... session.getTransaction().commit(); session.close();
Note: if you use Spring-transactions, Spring will do the above boilerplate for you.
The behavior of the process as listed before was in fact desired in the first place. After all writes, a flush statement existed in the original code. On DB2 that statement would indeed cause the data to be written to disk, but the data wouldn't be visible until the commit was performed.