EJAPP Top 10 countdown: #1 - Incorrect database usage
I'll keep you in suspense no longer. 😉 It's time for numero uno of the EJAPP Top 10 countdown!
Somewhat unexpectedly for an Enterprise Java Application Performance Problems Top 10, the #1 issue is the incorrect usage of databases.
In a lot of Enterprise Java projects, the database is treated as a simple bit bucket and the DBA as just the guy that executes the SQL statements on the production machines. This disdain for the database and its experts leads to badly performing applications such as:
- A database that had a table with two columns: KEY and VALUE. Java objects were stored in here by storing property names in the KEY column and their values in the VALUE column. The motivation for this design was extensibility: no changes to the database were needed when the Java objects changed. Of course, there was a lot of code dealing with converting Java objects and their datatypes (int, boolean, etc.) into VARCHARs, that (a) did not perform, (b) had to be changed for every Java object modification, and (c) made it impossible to create proper indexes.
- An front-end application that read its data from views. The tables behind those views had indexes, but not on the columns used by the queries for the views!
Instead of pointing the finger at one another, Java developers and DBA's should work together in all stages of a project to make sure the database and the Java application server deliver the performance needed by the application.
- Think about what data you actually need to persist and when and who you are going to persist it. Also, have a look at the EJAPP Top 10 entry on improper caching on what to do (and not to do!) when dealing with caches.
- Model your data. Don't resort to the key/value pair design mentioned in the example above.
- Normalize your database to at least the third normal form. Anything past that is probably not worth the trouble.
- Choose your primary key wisely. Your choice will be around for a long time.
- Learn about the data types your database offers and their performance impact (Oracle).
- Use the database for what's it good at: referential integrity checking, validation, manipulation, selection and sorting of large data sets. Don't do this stuff in your Java code. For example, don't write two queries and then merge the results in Java: use a join. Seems obvious, but merging in Java still happens a lot. Think EJB n+1 problem!
- Finally, consider using stored procedures when manipulating large data sets instead of retrieving the data, manipulating it in Java and then writing it all back.
- Create indexes where needed. At least on the primary key columns (most databases do this automatically) but also consider creating indexes for foreign key columns to speed up joins. Don't go overboard or the database will spend a lot of time updating the indexes. And you will spend a lot of time administering them.
- Don't retrieve all columns with every query (SELECT * FROM ...), but just the ones you need. You can also do this in HQL and EJBQL.
- Know how your database handles transactions and concurrency (Oracle, DB2, MySQL) and use that do determine the transaction attributes and isolation levels you'll be using. See the EJAPP Top 10 entry on incorrectly implemented concurrency for more info.
- Don't be afraid to use the features offered by the database you're using. Database choices are usually even more rigid than application server choices, so it pays to learn all the tricks your database offers and exploit them to get maximal performance.
- Tune the database for the application, just like the application server needs to be tuned. Check the performance tuning guides for Oracle, DB2, and MySQL. I really liked O'Reilly Oracle Performance Tuning book, but unfortunately the last edition is from 1997. Still, it is an inspiring read if just for reading how these professionals approach performance problems and their projects in general.
- Specifically, examine the query execution plan (a.k.a. the explain plan) for the queries your application uses. Check the cost of any full table scans and add indexes where needed. Oracle, DB2, and MySQL all offer tools to do this.
- Learn how the query optimizer of your database (Oracle, DB2, MySQL) works. For example, Oracle 9i executes queries a LOT faster when the tables in both the FROM and the WHERE clauses are ordered from large to small. Changing the order of the tables in the FROM and WHERE clauses can make the execution time of a query drop from over 60 seconds to less than 1 second.
- Profile your JDBC calls. The open source tool P6Spy gives you insight in performance of your JDBC calls. The commercial tool Quest Performasure does that and a whole lot more.
To conclude, Enterprise Java developers will need to understand their database and cooperate with their DBA to get maximum performance out of their application!
Thanks to Barre Dijkstra and Wouter van Reeven for providing valuable input for this blog.