I'll keep you in suspense no longer.
It's time for numero uno of the EJAPP Top 10 countdown!
Somewhat unexpectedly for an Enterprise Java Application Performance Problems Top 10, the #1 issue is the incorrect usage of databases.
In a lot of Enterprise Java projects, the database is treated as a simple bit bucket and the DBA as just the guy that executes the SQL statements on the production machines. This disdain for the database and its experts leads to badly performing applications such as:
Instead of pointing the finger at one another, Java developers and DBA's should work together in all stages of a project to make sure the database and the Java application server deliver the performance needed by the application.
Design
Implementation
Deployment
To conclude, Enterprise Java developers will need to understand their database and cooperate with their DBA to get maximum performance out of their application!
Thanks to Barre Dijkstra and Wouter van Reeven for providing valuable input for this blog.
Filed under Java, Oracle, Performance | 3 Comments »
The point about “Changing the order of the tables in the FROM and WHERE clauses” for oracle 9i can also apply to 8i and 10g when the rule-based optimizer (RBO) (RULE mode) is active. In general the point doesn’t apply when the cost-base optimizer (CBO) is active.
This gets back to 2 of your other points: 1) configure your DB properly; and 2) understand the query plans that are being used.
There’s a LOT of outdated or just plain wrong information available on the internet. I’ve learned the hard way regarding Oracle. The bottom line is to verify that the advice you follow actually does improve performance for your app. Bonus points for understanding why the tip worked for you.
-james.
Our dba took exception to this same point about the ordering of tables:
“Oracle 9i executes queries a LOT faster when the tables in both the FROM and the WHERE clauses are ordered from large to small.”
Well, it depends what he means by this, but this is pretty much wrong. Let’s say that I’m joining parent table A to a couple of it’s child lookup tables:
select *
from A
inner join B on A.b=B.id
inner join C on A.c=C.id
inner join D on A.d=D.id
In this example, it doesn’t matter *at all* what order I list the 3 joins in. Since we’re inner joining, the contents of B, C, and D each act as filters on A; we also want to do as little IO as possible to get the correct answer as quickly as possible. Thus, the fastest way to do this operation (imagine you had to do it by hand in Excel, if that helps) is to start with the smallest child table, use it’s index on “id” (hopefully) to quickly match a row in A to a value in the child table, and then remove everything that doesn’t match from our buffered copy of A[1]. That means that when we then compare to the remaining 2 child tables, we’ve got the smallest possible set of rows to join to the child’s index. We repeat the procedure by picking the smaller (ie, the better filter) of the 2 remaining child tables first, and we do the “loosest” join last.
That’s a simple example of finding an optimum execution plan. In order to find it, the 2 pieces of information I needed were (1) the sizes of the tables involved, and (2) whether or not the join columns were indexed. If you wanted to be more precise while trying to decide which join-filter to apply first, you might also take into consideration things like how easy index was to use (if it’s got a lot of columns you don’t need in it, the IO in the index can be significant) and the cardinality (selectiveness) of your join condition in the child table, if that condition isn’t unique.
These are all things the optimizer looks at, guaranteed. Sizes are gathered by statistical sampling; you can see those stats in the system view all_tables, for example. Indexes are obviously in the data dictionary as well, and the optimizer has stats to refer to for those that tell it about cardinality and fragmentation.
–Can you explain more what you mean?
AFAIK, the only case where join order matters in something like this is when size and cardinality are the same, which is really an edge case…
Hi Ed,
This remark originated from me. In your example, let’s say we would like to execute a query like this
select * from A, B, C, D where A.b = B.id and B.c = C.id and A.d = D.id
Please note that A has a foreign key to both B and D. I noticed in several cases that the query execution time decreases a lot if size(A) > size(B) > size(D). If e.g. size(A) > size(D) > size(B) the query should be
select * from A, D, B, C where A.d = D.id and A.b = B.id and B.c = C.id
I do agree that this is only the case when one table has foreign keys to more than one other table and if one of those other tables contains much more rows than the others.
Greets, Wouter van Reeven