Sorting and pagination with Hibernate Criteria - How it can go wrong with joins

Jeroen van Wilgenburg

Lately I ran into an annoying problem with Hibernate. I tried to do pagination on a query result which was doing an SQL-JOIN under the hood. The query before paging returned about 100 results. When I turned on paging (with 20 results per page) all the pages had less than 20 results!
The reason for this is that with a JOIN there can be duplicate results and those results are filtered out after pagination is done. In this blog I will explain how to solve those problems and it also a cleaner way to build your Criteria queries.

Let me explain this with some examples.
Our data model for today consists of doors and colors. A door can have multiple colors.
The data set to start with:
Door A:
-id: 1
-size: large
-colors: red and green

Door B:
-id: 2
-size: small
-color: blue and green

Now lets select all red, green or blue doors, that will be two doors right?

Criteria crit = session.createCriteria(Door.class);
crit.createAlias("colors", "c");
crit.add(Restrictions.in("c.name", new Object[]{ "red" , "green" , "blue" }));

Because of the colors alias an INNER JOIN is used and the result might not be what you expect.
The size of the list is 4, 4 Door objects are returned! That is very strange. When you have a look at the SQL you will see the following query:
select this_.id as id2_1_, this_.size as size2_1_, colors3_.DOOR_ID as DOOR1_, c1_.id as COLOR2_, c1_.id as id3_0_, c1_.name as name3_0_
from doors this_ inner join DOORS_COLORS colors3_ on this_.id=colors3_.DOOR_ID inner join colors c1_ on colors3_.COLOR_ID=c1_.id
where c1_.name in (?, ?, ?)

Translated to readable SQL:
SELECT * FROM doors d
INNER JOIN doors_colors dc on d.id=dc.door_id
INNER JOIN colors c ON c.id=dc.color_id
WHERE c.name in (?, ?, ?)

This query will indeed return 4 results, but we're selecting Door objects, not SQL-rows. Looking at the java code I expected Door object, not query result rows wrapped in a Door object
To solve this problem you can add a ResultTransformer to the crit-object:

crit.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);

Now only two doors are selected, problem solved.
Well, not so fast, let's see what happens with many doors. Our database now has 16 doors. When we select the red, green and blue doors again, 11 results appears. Suppose we want to add paging to the results, because it's just too many doors to handle at a time. 5 doors per page is enough.
We can achieve it by adding the following two lines:

crit.setFirstResult(0);
crit.setMaxResults(5);

11 doors found, 5 results per page and display the results starting from the first result (which is 0-indexed).
How many results do you expect on the first page?
Not the 3 results returned probably. The ResultTransformer is still active by the way. What happens is that the paging is applied before the duplicate entities are filtered out.

When you start thinking in SQL you might come up with the idea to use a subquery. This is exactly what we're are going to do in Hibernate.
A subquery in pseudocode:
select * from doors d where d.id in (select id from doors, colors where colors.name in ('red','green','blue'))
As you can see the subquery only selects the id column and in my previous example a complete Door object was selected.
To select other columns than the columns in the Entity object you can use Projections. With multiple columns use the ProjectionList object. For now we need an id-projection, which has is own method:

crit.setProjection(Projections.id());

Now a list with int's is selected. This result is the input for the next query. You can create Criteria objects without a Hibernate Session for later use. These objects are called DetachedCriteria.

When we rename the crit to dc and create it as a DetachedCriteria you will have the following lines of code:

DetachedCriteria dc = DetachedCriteria.forClass(Door.class);
dc.createAlias( "colors" , "c");
dc.add(Restrictions.in( "c.name" , new Object[]{ "red" , "green" , "blue" }));
dc.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
dc.setProjection(Projections.id());

The next step is creating an outer query around the previous query.

Criteria outer = session.createCriteria(Door.class);
outer.add(Subqueries.propertyIn("id", dc));
outer.setFirstResult(0);
outer.setMaxResults(5);

With Subqueries.propertyIn you can add the previous query. When you add paging and sorting to the new outer query object all your problems are solved. You can even get rid of the ResultTransformer and get the same result. When you're tuning performance it might be a nice test to see whether it makes any difference to do the DISTINCT before you pass the results to the outer query.

How you can solve DISTINCT problemns is explained reasonably in the Hibernate FAQ:
Hibernate does not return distinct results for a query with outer join fetching enabled for a collection (even if I use the distinct keyword)?

The lesson I learned with Hibernate is that you still have to know SQL and don't trust the results blindly. If you don't know why you have to use DISTINCT and how a JOIN presents its results you'll get lost very soon.

With the DetachedCriteria object you can split your queries in a very readable way and when using projections, the SQL under the hood only selects the needed columns, which improves performance.

sources

http://floledermann.blogspot.com/2007/10/solving-hibernate-criterias-distinct.html
http://www.hibernate.org/117.html#A12

Comments (20)

  1. Alex - Reply

    December 14, 2008 at 11:45 am

    I think it's kind of sad that you need to compromise performance (the joins will be much faster on for instance mysql) because Hibernate won't translate the SQL resultset to objects correctly when collections are involved.

    Hibernate should really have a configuration parameter which automatically does this for you.

  2. Maarten Winkels - Reply

    December 21, 2008 at 5:46 pm

    @Alex: It is a bit strange to blame Hibernate for the fact that MySQL cannot optimize the subselect in the same way as it does the inner join: The DB logic is in both cases exactly the same. Furthermore, it has nothing to do with how Hibernate 'translates the SQL resultset to objects' or how this is incorrect. The fact is that you cannot write an SQL statement which involves an (inner) join on a table that might have multiple entries for the base table with a LIMIT (for MySQL) to return a predictable number of unique rows from the base table. You cannot request Hibernate to solve this for you.

    What you could do is look into the query.scroll() method and do pagination yourself. This will be a bit trickier, because you will have to do the UNIQUE and the pagination yourself, but you'll be able to improve performance.

  3. Muzaffer - Reply

    December 26, 2008 at 12:58 pm

    Hi Jeroen van Wilgenburg,

    Thanks for your explanation. I am using Criteria obj as follows
    1. Set the order by [column1]
    2. Get the resultset
    3. Here I want to change order by to [column2]. If I simply addOrder [column2] to the Criteria obj it will give me result in the order [column1],[column2]. Here I want result to be sorted only based on [column2]. Thanks.

  4. Bhim Bomma - Reply

    January 7, 2009 at 1:56 am

    Thanks Jeroen. This helped me to solve the pagination problem in my application.

  5. AGO - Reply

    January 8, 2009 at 11:09 pm

    Thank you very much for such nice article!
    It is short, with good example and without unnecessary fireworks - simply great.
    Regards, Albert

  6. ctrlspace » Something to think about - Reply

    April 20, 2009 at 6:46 pm

    [...] did not realize that multiple joins are not possible using Hibernate’s criteria api.  This article is not related but solves a pagination problem I’ve had in the [...]

  7. murali - Reply

    June 22, 2009 at 3:34 pm

    Good Article , Just one note : this approach would not work if the entities has EAGER Join on Collections

  8. ptb - Reply

    June 26, 2009 at 2:53 pm

    I tried this solution but I couldn't get it to work in cases where you need to add multiple properties to the projection list. Apparently Sql doesn't allow multiple select items in a subselect.

  9. Filme Noi Cinema - Reply

    November 30, 2009 at 3:10 pm

    Actually SQL allows multiple select items in subselect (or at least Oracle SQL does) like this:

    select * from person where (name, age) in (select name, age from ....... )

    But I guess that Hibernate hasn't implemented this yet.

  10. Markos Fragkakis - Reply

    November 24, 2010 at 4:26 pm

    Thanks for this, very well explained.

  11. Tim Dennison - Reply

    September 16, 2011 at 3:48 pm

    Great explanation of this problem!

    I keep thinking about an alternative solution and wanted feedback. I created a custom ResultTransformer that delegates to the DistinctRootEntityResultTransformer for transformTuple() and transformList() operations. However, the transformList() operation also uses a couple of instance variables (number of objects desired, page index) to produce a "paged" List of objects to return. The effect is that paging is accomplished via "entities" instead of rows.

  12. Hibernate – distinct results with pagination - Reply

    February 23, 2012 at 8:53 pm

    [...] This seems to be a well known problem for years as can be read here: http://blog.xebia.com/2008/12/11/sorting-and-pagination-with-hibernate-criteria-how-it-can-go-wrong-... [...]

  13. Cyril Mathew - Reply

    October 3, 2012 at 1:20 pm

    This is a great article which solved my pagination problem

  14. Breton - Reply

    February 5, 2013 at 2:42 am

    That may save some time. I ll follow your explanations. Thks

  15. umarani - Reply

    April 2, 2013 at 7:18 am

    I want to display 10 records

  16. umarani - Reply

    April 2, 2013 at 7:18 am

    I want to display ten records per page using critria query how to set pagenum,pagesize using that....thanks in advance

  17. fdm - Reply

    June 12, 2013 at 11:33 am

    nice,
    it does not solve the problem for sorting.
    how would it be done if yoiu want to sort the doors by their color?
    Subqueries do not support sorting.

  18. fdm - Reply

    June 12, 2013 at 11:35 am

    Sorry... the 'nice' do not sounds quite correct on my previus message.
    It is in complete ... it was meant to say: 'Nice article, but it does not solve the problem for sorting'

    thanks for the article.

  19. Ruba - Reply

    October 9, 2013 at 4:54 pm

    Thanks a lot! you rescued me! thanks for the great tips.

  20. Aaron - Reply

    December 16, 2013 at 10:40 pm

    Thank you! Ran into this exact issue and your blog perfectly described the problem and workaround, hope others find this as helpful as I did.

    -Aaron

Add a Comment