Lately I ran into an annoying problem with Hibernate. I tried to do pagination on a query result which was doing an SQL-JOIN under the hood. The query before paging returned about 100 results. When I turned on paging (with 20 results per page) all the pages had less than 20 results!
The reason for this is that with a JOIN there can be duplicate results and those results are filtered out after pagination is done. In this blog I will explain how to solve those problems and it also a cleaner way to build your Criteria queries.

Let me explain this with some examples.
Our data model for today consists of doors and colors. A door can have multiple colors.
The data set to start with:
Door A:
-id: 1
-size: large
-colors: red and green

Door B:
-id: 2
-size: small
-color: blue and green

Now lets select all red, green or blue doors, that will be two doors right?

Criteria crit = session.createCriteria(Door.class);
crit.createAlias("colors", "c");
crit.add(Restrictions.in("c.name", new Object[]{ "red" , "green" , "blue" }));

Because of the colors alias an INNER JOIN is used and the result might not be what you expect.
The size of the list is 4, 4 Door objects are returned! That is very strange. When you have a look at the SQL you will see the following query:
select this_.id as id2_1_, this_.size as size2_1_, colors3_.DOOR_ID as DOOR1_, c1_.id as COLOR2_, c1_.id as id3_0_, c1_.name as name3_0_
from doors this_ inner join DOORS_COLORS colors3_ on this_.id=colors3_.DOOR_ID inner join colors c1_ on colors3_.COLOR_ID=c1_.id
where c1_.name in (?, ?, ?)

Translated to readable SQL:
SELECT * FROM doors d
INNER JOIN doors_colors dc on d.id=dc.door_id
INNER JOIN colors c ON c.id=dc.color_id
WHERE c.name in (?, ?, ?)

This query will indeed return 4 results, but we're selecting Door objects, not SQL-rows. Looking at the java code I expected Door object, not query result rows wrapped in a Door object
To solve this problem you can add a ResultTransformer to the crit-object:

crit.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);

Now only two doors are selected, problem solved.
Well, not so fast, let's see what happens with many doors. Our database now has 16 doors. When we select the red, green and blue doors again, 11 results appears. Suppose we want to add paging to the results, because it's just too many doors to handle at a time. 5 doors per page is enough.
We can achieve it by adding the following two lines:

crit.setFirstResult(0);
crit.setMaxResults(5);

11 doors found, 5 results per page and display the results starting from the first result (which is 0-indexed).
How many results do you expect on the first page?
Not the 3 results returned probably. The ResultTransformer is still active by the way. What happens is that the paging is applied before the duplicate entities are filtered out.

When you start thinking in SQL you might come up with the idea to use a subquery. This is exactly what we're are going to do in Hibernate.
A subquery in pseudocode:
select * from doors d where d.id in (select id from doors, colors where colors.name in ('red','green','blue'))
As you can see the subquery only selects the id column and in my previous example a complete Door object was selected.
To select other columns than the columns in the Entity object you can use Projections. With multiple columns use the ProjectionList object. For now we need an id-projection, which has is own method:

crit.setProjection(Projections.id());

Now a list with int's is selected. This result is the input for the next query. You can create Criteria objects without a Hibernate Session for later use. These objects are called DetachedCriteria.

When we rename the crit to dc and create it as a DetachedCriteria you will have the following lines of code:

DetachedCriteria dc = DetachedCriteria.forClass(Door.class);
dc.createAlias( "colors" , "c");
dc.add(Restrictions.in( "c.name" , new Object[]{ "red" , "green" , "blue" }));
dc.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
dc.setProjection(Projections.id());

The next step is creating an outer query around the previous query.

Criteria outer = session.createCriteria(Door.class);
outer.add(Subqueries.propertyIn("id", dc));
outer.setFirstResult(0);
outer.setMaxResults(5);

With Subqueries.propertyIn you can add the previous query. When you add paging and sorting to the new outer query object all your problems are solved. You can even get rid of the ResultTransformer and get the same result. When you're tuning performance it might be a nice test to see whether it makes any difference to do the DISTINCT before you pass the results to the outer query.

How you can solve DISTINCT problemns is explained reasonably in the Hibernate FAQ:
Hibernate does not return distinct results for a query with outer join fetching enabled for a collection (even if I use the distinct keyword)?

The lesson I learned with Hibernate is that you still have to know SQL and don't trust the results blindly. If you don't know why you have to use DISTINCT and how a JOIN presents its results you'll get lost very soon.

With the DetachedCriteria object you can split your queries in a very readable way and when using projections, the SQL under the hood only selects the needed columns, which improves performance.

sources

http://floledermann.blogspot.com/2007/10/solving-hibernate-criterias-distinct.html
http://www.hibernate.org/117.html#A12