JPA implementation patterns: Mapping inheritance hierarchies

Vincent Partington

Last week I discussed the relative merits of field access versus property access in the ongoing JPA implementation patterns blog series. This week I will dwell on the choices offered when mapping inheritance hierarchies in JPA.

JPA provides three ways to map Java inheritance hierarchies to database tables:

  1. InheritanceType.SINGLE_TABLE - The whole inheritance hierarchy is mapped to one table. An object is stored in exactly one row in that table and the discriminator value stored in the discriminator column specifies the type of the object. Any fields not used in a superclass or a different branch of the hierarchy are set to NULL. This is the default inheritance mapping strategy used by JPA.
  2. InheritanceType.TABLE_PER_CLASS - Every concrete entity class in the hierarchy is mapped to a separate table. An object is stored in exactly one row in the specific table for its type. That specific table contains column for all the fields of the concrete class, including any inherited fields. This means that siblings in an inheritance hierarchy will each have their own copy of the fields they inherit from their superclass. A UNION of the separate tables is performed when querying on the superclass.
  3. InheritanceType.JOINED - Every class in the hierarchy is represented as a separate table, causing no field duplication to occur. An object is stored spread out over multiple tables; one row in each of the tables that make up its class inheritance hierarchy. The is-a relation between a subclass and its superclass is represented as a foreign key relation from the "subtable" to the "supertable" and the mapped tables are JOINed to load all the fields of an entity.

A nice comparison of the JPA inheritance mapping options with pictures, and including a description of the @MappedSuperclass option, can be found in the DataNucleus documentation.

Now the interesting question is: which method works best in what circumstances?

SINGLE_TABLE - Single table per class hierarchy

The SINGLE_TABLE strategy has the advantage of being simple. Loading entities requires querying only one table, with the discriminator column being used to determine the type of the entity. This simplicity also helps when manually inspecting or modifying the entities stored in the database.

A disadvantage of this strategy is that the single table becomes very large when there are a lot of classes in the hierarchy. Also, columns that are mapped to a subclass in the hierarchy should be nullable, which is especially annoying with large inheritance hierarchies. Finally, a change to any one class in the hierarchy requires the single table to be altered, making the SINGLE_TABLE strategy only suitable for small inheritance hierarchies.

TABLE_PER_CLASS - Table per concrete class

The TABLE_PER_CLASS strategy does not require columns to be made nullable, and results in a database schema that is relatively simple to understand. As a result it is also easy to inspect or modify manually.

A downside is that polymorphically loading entities requires a UNION of all the mapped tables, which may impact performance. Finally, the duplication of column corresponding to superclass fields causes the database design to not be normalized. This makes it hard to perform aggregate (SQL) queries on the duplicated columns. As such this strategy is best suited to wide, but not deep, inheritance hierarchies in which the superclass fields are not ones you want to query on.

JOINED - Table per class

The JOINED strategy gives you a nicely normalized database schema without any duplicate columns or unwanted nullable columns. As such it is best suited to large inheritance hierarchies, be the deep or wide.

This strategy does make the data harder to inspect or modify manually. Also, the JOIN operation needed to load entities can become a performance problem or a downright barrier to the size of your inheritance strategy. Also note that Hibernate does not correctly handle discriminator columns when using the JOINED strategy.

BTW, when using Hibernate proxies, be aware that lazily loading a class mapped with any of the three strategies above always returns a proxy that is an instanceof the superclass.

Are those all the options?

So to summarize you could say the following rules apply when choosing from JPA's standard inheritance mapping options:

  • Small inheritance hierarchy -> SINGLE_TABLE.
  • Wide inheritance hierarchy -> TABLE_PER_CLASS.
  • Deep inheritance hierarchy -> JOINED.

But what if your inheritance hierarchy is very wide or very deep? And what if the classes in your system are modified often? As we found while building a persisted command framework and a flexible CMDB for our Java EE deployment automation product Deployit, the concrete classes at the bottom of a large inheritance hierarchy can change often. So these two questions often get a positive answer at the same time. Luckily there is one solution to both problems!

Using blobs

The first thing to note is that inheritance is a very large component of the object-relational impedance mismatch. And then question we should ask ourselves is: why are we even mapping all those often changing concrete classes to database tables? If object databases had really broken through, we might be better off storing those classes in such a database. As it is, relational database have inherited the earth so that is out of the question. It might also be that for a part of your object model the relational model actually makes sense because you want to perform queries and have the database manage the (foreign key) relations. But for some parts you are actually only interested in simple persistence of objects.

A nice example is the "persisted command framework" I mentioned above. The framework needs to store generic information about each command such as a reference to the "change plan" (a kind of execution context) it belongs to, start and end times, log output, etc. But it also needs to store a command object that represents the actual work to be done (an invocation of wsadmin or wlst or something similar in our case).

For the first part the hierarchical model is best suited. For the second part simple serialization will do. So we first define a simple interface that is implemented by the different command objects in our system:

public interface Command {
	void execute();
}

And then we create the entity that stores both the metadata (the data we want to store in a relational model) and the serialized command object:

@Entity
public class CommandMetaData {
	@Id
	@GeneratedValue(strategy = GenerationType.AUTO)
	private int id;

	@ManyToOne
	private ChangePlan changePlan;

	private Date startOfExecution;

	private Date endOfExecution;

	@Lob
	private String log;

	@Lob
	@Column(name = "COMMAND", updatable = false)
	private byte[] serializedCommand;

	@Transient
	private Command command;

	public CommandMetaData(Command details) {
		serializedCommand = serializeCommand(details);
	}

	public Command getCommand() {
		if (command != null) {
			command = deserializeCommand(serializedCommand);
		}
		return command;
	}

	[... rest omitted ...]
}

The serializedCommand field is a byte array that is stored as a blob in the database because of the @Lob annotation. The column name is explicitly set to "COMMAND" to prevent the default column name of "SERIALIZEDCOMMAND" from appearing in the database schema.

The command field is marked as @Transient to prevent it from being stored in the database.

When a CommandMetaData object is created, a Command object is passed in. The constructor serializes the command object and stores the results in the serializedCommand field. After that the command cannot be changed (there is no setCommand() method), so the serializedCommand can be marked as not updatable. This prevents that pretty big blob field from being written to the database every time another field of the CommandMetaData (such as the log field) is updated.

Every time the getCommand method is invoked, the command is deserialized if needed and then it is returned. The getCommand could be marked synchronized if this object were used in multiple concurrent threads.

Some things to note about this approach are:

  • The serialization method used influences the flexibility of this approach. Standard Java serialization is simple but does not handle changing classes well. XML can be an alternative but that brings its own versioning problems. Picking the right serialization mechanism is left as an exercise for the reader.
  • Although blobs have been around for a while, some databases still struggle with them. For example, using blobs with Hibernate and Oracle can be tricky.
  • In the approach presented above, any changes made to the Command object after it has been serialized will not be stored. Clever use of the @PrePersist and @PreUpdate lifecycle hooks could solve this problem.

This semi-object database/semi-relational database approach to persistence worked out quite well for us. I am interested to hear whether other people have tried the same approach and how they fared. Or did you think of another solution to these problems?

For a list of all the JPA implementation pattern blogs, please refer to the JPA implementation patterns wrap-up.

Comments (5)

  1. Andrew Phillips - Reply

    June 21, 2009 at 9:23 pm

    Is it desirable to have to the serializedCommand byte array as a member of your entity class? The transient command object would appear to be the one that is the "natural" member of your domain model - the fact that it happens to be persisted as a byte array is purely an implementation detail, and probably not something you would want to leak into your API in this way.

    Would a Hibernate user type not be more appropriate in this case - I think there might even be a "Serializable2LobUserType" out there already.

  2. Maarten Winkels - Reply

    June 22, 2009 at 3:00 am

    @Andrew: The default for JPA is to store a complex, non-entity property in a BLOB. It uses standard Java Serialization of course. Due to its limitations Vincent mentions in the blog, the serialization/deserialization is performed in the Entity class, I suppose.

    +1 for using a UserType for this kind of transformation. But I think UserTypes are Hibernate specific and not JPA-ish.

  3. Vincent Partington - Reply

    June 22, 2009 at 8:27 am

    @Andrew, @Maarten: As Maarten already guessed I perform the serialization/deserialization in the entity class to have more control over it. This way I can use different serialization mechanisms, set the blob field to be updatable, catch and handle ClassNotFoundExceptions and other serialization errors. Using a Hibernate UserType would have been nicer but does not work for all JPA providers.

    The interesting thing with this and with other JPA implementation patterns (bidirectional associations, using @PreRemove, bidirectional associations vs. lazy loading, using UUIDs as primary keys) is that they all "pollute" the entity class with their "special JPA sauce". It would be nice if one could define an external "persistence handler" that takes care of all this nasty stuff. Something like a UserType but with more abilities. Instead of the @Pre/@Post lifecycle hooks with only invite you to do this kind of nasty stuff.

    I guess this matches nicely with my comment at the beginning of the series that the JPA abstraction, as it is today, is pretty leaky.

  4. Simon Massey - Reply

    September 18, 2009 at 12:56 am

    We use the "clob/blob overflow column" pattern extensively in some framework code. In our case it is an XML serialization of the subclass fields. The proprietary framework in question is a few years old now so it is hibernate not jpa with interceptors not @pre/@Post hooks but doing exactly the same thing.

    The framework allows for tens of thousands of custom types going into the same smallish set of entity tables. Our business users design new types on the fly from within the application and we use a custom class load to generate all the pojo classes at runtime on first use. So it has mega wide and mega deep hierarchies all generated at runtime.

    The generated subclass types have unmapped properties fields with setters and getters. Within hibernate interceptor life cycle hooks we pack those unmapped subclass fields into the mapped xml base class blob to dehydrate the object to store it, then on rehydration unpack them from the xml to activate it. Once again you could do that with @pre/@post handlers.

    It has been a few years since the first version of our framework. Your suggestion here on this blog is the first time I have head of anyone else doing the same thing.

  5. Vincent Partington - Reply

    September 18, 2009 at 11:10 am

    @Simon Massey: Using the @Pre/@Post handlers is an even neater way to solve this than doing it in the getters/setters as I propose. Basically, you actually implemented the "Clever use of the @PrePersist and @PreUpdate lifecycle hooks" I mention. ;-)

Add a Comment