Erik Lumme.dev

Hibernate's table generator optimizers

Posted 03 Apr 2022 by Erik Lumme

Using a database table to keep track of the next identifiers to assign works in all SQL implementations, but comes at a performance cost. Here we take a look at what Hibernate does to alleviate this issue.

If you want the Java Persistence API (JPA) to generate identifiers for you when you insert a new entity into your database, there are three strategies to choose from: table, sequence, and identity (the AUTO setting leaves the choice to the persistence provider). The table strategy is the only one whose implementation is database agnostic.

The table strategy uses a separate database table to store the next identifier to use, with one column for the table name and one for the identifier. When inserting a new entity, you simply increment the identifier for that table, read the new value, and use it for your entity.

Getting the next identifier to use by reading the current value, incrementing it, and writing it back.

The performance cost comes from the multiple queries that must be executed just to get the next identifier to use, and the fact that the row must be locked while it is read and incremented. Vlad Mihalcea expands on this further in his blog post.

Hibernate implements a set of optimizers to alleviate this performance cost. These optimizers work by retrieving the identifiers to assign in bulk, and keeping them in memory. Let’s first look at how it would work without optimization.

The standard case of incrementing and assigning one identifier at a time.

In this example, the sequence value starts at 7, so the next sequence value is 8. By incrementing the sequence value, we can use 8 as our identifier. Every time a new identifier is needed, the sequence value is read and incremented.
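The unoptimized read-increment-write cycle can be sketched in plain Java. The class below is a hypothetical in-memory stand-in for the sequence table, not part of JPA or Hibernate: a map holds one row per key (here a table name), and the `synchronized` method plays the role of the row lock, so no other caller can read the row between our read and our write. Against a real database, the same cycle would typically be a locking read followed by an update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a sequence table: one row per key,
// holding the current sequence value for that key.
final class SequenceTable {
    private final Map<String, Long> rows = new HashMap<>();

    // Creates the row for a key, e.g. the name of the entity's table.
    synchronized void insertRow(String key, long initialValue) {
        rows.put(key, initialValue);
    }

    // Reads the current value, writes back value + increment, and returns the
    // new value. Holding the monitor for the whole read-modify-write mimics
    // the row lock the database would take.
    synchronized long incrementAndGet(String key, long increment) {
        Long current = rows.get(key);
        if (current == null) {
            throw new IllegalArgumentException("No sequence row for " + key);
        }
        long updated = current + increment;
        rows.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        SequenceTable table = new SequenceTable();
        table.insertRow("customer", 7);
        // Unoptimized: one locked read-increment-write per identifier.
        for (int i = 0; i < 3; i++) {
            System.out.println(table.incrementAndGet("customer", 1));
        }
    }
}
```

With the row starting at 7, as in the example above, the loop prints 8, 9 and 10, paying one locked round trip per identifier.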

To use a table generator in JPA, you must first define it, for example with the @TableGenerator annotation. There you give the generator a name so that you can reference it later, and you can also set the name of the sequence table, the names of its columns, the value identifying the row to use, and an allocation size. You then use the generator by referencing its name in the @GeneratedValue annotation.
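As an illustration, a mapping along these lines defines a table generator and uses it for the identifier. It assumes the Jakarta Persistence API (`jakarta.persistence`, used by Hibernate 6); older JPA 2.x setups import the same annotations from `javax.persistence`. The entity, table, column and row names are hypothetical, and the allocation size of 3 matches the examples that follow.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.TableGenerator;

@Entity
public class Customer {

    @Id
    @TableGenerator(
            name = "customer_gen",          // name referenced by @GeneratedValue
            table = "id_sequences",         // the sequence table
            pkColumnName = "sequence_name", // column holding the row key
            valueColumnName = "next_val",   // column holding the sequence value
            pkColumnValue = "customer",     // the row this generator uses
            allocationSize = 3)             // pool size used by Hibernate's optimizers
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "customer_gen")
    private Long id;

    public Long getId() {
        return id;
    }
}
```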

When you define a TableGenerator in Hibernate, it uses the PooledOptimizer by default. It considers the next sequence value (current sequence value + 1) to be the upper bound of the pool of identifiers it is allowed to assign. The pool size is determined by the allocation size of the table generator, by default 50. Because of this, it also increments the sequence value by the allocation size.
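The pooled interpretation can be sketched as follows. This is a simplified, hypothetical class, not Hibernate's `PooledOptimizer`: the `LongSupplier` stands in for one trip to the sequence table and returns the next sequence value `v`, which the sketch treats as the upper bound of the pool `v - allocationSize + 1` to `v`. It leaves out edge cases such as Hibernate's handling of the table's initial value.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical pooled sketch: each sequence value v is the upper bound of a
// pool of allocationSize identifiers, v - allocationSize + 1 .. v.
final class PooledSketch {
    private final LongSupplier nextSequenceValue; // one trip to the sequence table
    private final int allocationSize;
    private long next = 1;       // next identifier to hand out
    private long upperBound = 0; // last identifier in the pool; 0 forces a first fetch

    PooledSketch(LongSupplier nextSequenceValue, int allocationSize) {
        this.nextSequenceValue = nextSequenceValue;
        this.allocationSize = allocationSize;
    }

    // Synchronized so that two threads never receive the same identifier.
    synchronized long generate() {
        if (next > upperBound) {
            upperBound = nextSequenceValue.getAsLong();
            next = upperBound - allocationSize + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Stand-in table that yields 10, 13, 16, ... (advancing by the allocation size).
        LongSupplier table = LongStream.iterate(10, v -> v + 3).iterator()::nextLong;
        PooledSketch optimizer = new PooledSketch(table, 3);
        for (int i = 0; i < 6; i++) {
            System.out.print(optimizer.generate() + " ");
        }
    }
}
```

Two trips to the stand-in table yield six identifiers, 8 to 13, instead of one trip per identifier.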

All examples below use an allocation size of 3 and an initial sequence value of 7. For the pooled optimizer, the sequence value is incremented by the allocation size, here from 7 to 10.

The pooled optimizer interprets the next sequence value as the upper bound for its pool of identifiers.

The PooledOptimizer assigns identifiers that are smaller than the sequence value it read. This is different from our initial example, where sequence values smaller than or equal to the current are assumed to already have been assigned. Using both these methods together on the same sequence table row will lead to identifier clashes.

The PooledLoOptimizer works in a very similar manner, except it considers the next sequence value to be the lower bound of its pool of identifiers.
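A hypothetical sketch of the pooled-lo interpretation, in which the value `v` obtained from the sequence table (here a `LongSupplier` stand-in) is the lower bound of the pool `v` to `v + allocationSize - 1`. It illustrates the arithmetic only and is not Hibernate's `PooledLoOptimizer`.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical pooled-lo sketch: each sequence value v is the lower bound of a
// pool of allocationSize identifiers, v .. v + allocationSize - 1.
final class PooledLoSketch {
    private final LongSupplier nextSequenceValue; // one trip to the sequence table
    private final int allocationSize;
    private long next = 1;       // next identifier to hand out
    private long upperBound = 0; // last identifier in the pool; 0 forces a first fetch

    PooledLoSketch(LongSupplier nextSequenceValue, int allocationSize) {
        this.nextSequenceValue = nextSequenceValue;
        this.allocationSize = allocationSize;
    }

    // Synchronized so that two threads never receive the same identifier.
    synchronized long generate() {
        if (next > upperBound) {
            long lowerBound = nextSequenceValue.getAsLong();
            next = lowerBound;
            upperBound = lowerBound + allocationSize - 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Stand-in table that yields 8, 11, 14, ...
        LongSupplier table = LongStream.iterate(8, v -> v + 3).iterator()::nextLong;
        PooledLoSketch optimizer = new PooledLoSketch(table, 3);
        for (int i = 0; i < 6; i++) {
            System.out.print(optimizer.generate() + " ");
        }
    }
}
```

Here the first value obtained, 8, is handed out first, followed by 9 and 10 before the next trip returns 11.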

The pooled-lo optimizer interprets the next sequence value as the lower bound for its pool of identifiers.

The HiLoOptimizer only increments the sequence value by 1 at a time. However, it considers each sequence value as representing a pool of identifiers with the upper bound being (next sequence value ⋅ allocation size). If the next sequence value is 1, the upper bound would in our case be 1 ⋅ 3 = 3, and cover the identifiers 1, 2, and 3. If the next sequence value is 2, the upper bound would be 2 ⋅ 3 = 6, and cover the identifiers 4, 5, and 6.
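The hi-lo arithmetic can be checked with a small hypothetical sketch (not Hibernate's `HiLoOptimizer`): each value `v` from the sequence table, which advances by 1, is expanded into the identifiers `(v - 1) * allocationSize + 1` to `v * allocationSize`.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical hi-lo sketch: each sequence value v ("hi") covers the
// identifiers (v - 1) * allocationSize + 1 .. v * allocationSize.
final class HiLoSketch {
    private final LongSupplier nextSequenceValue; // the table advances by 1 per trip
    private final int allocationSize;
    private long next = 1;       // next identifier to hand out
    private long upperBound = 0; // last identifier in the pool; 0 forces a first fetch

    HiLoSketch(LongSupplier nextSequenceValue, int allocationSize) {
        this.nextSequenceValue = nextSequenceValue;
        this.allocationSize = allocationSize;
    }

    synchronized long generate() {
        if (next > upperBound) {
            long hi = nextSequenceValue.getAsLong();
            upperBound = hi * allocationSize;
            next = upperBound - allocationSize + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Sequence values 1, 2, ... give the pools 1-3, 4-6, ...
        HiLoSketch fromOne = new HiLoSketch(LongStream.iterate(1, v -> v + 1).iterator()::nextLong, 3);
        for (int i = 0; i < 6; i++) {
            System.out.print(fromOne.generate() + " ");
        }
        System.out.println();
        // A sequence value of 8 gives the pool 22-24.
        HiLoSketch fromEight = new HiLoSketch(() -> 8L, 3);
        System.out.println(fromEight.generate());
    }
}
```

The first line prints 1 to 6, matching the pools for sequence values 1 and 2, and the second prints 22, the first identifier below the upper bound of 24.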

The hi-lo optimizer interprets each sequence value as representing a pool of identifiers with an upper bound of (next sequence value ⋅ allocation size). If the next sequence value is 8, the upper bound is (8 ⋅ 3 = 24) given an allocation size of 3.

There is also a LegacyHiLoAlgorithmOptimizer that for legacy reasons works in mysterious ways.

The legacy hi-lo optimizer works in mysterious ways.

Finally, the NoopOptimizer performs no optimization: with an allocation size of 1, it works like our first example. With the default allocation size of 50, however, it still increments the sequence value by the allocation size while using only one value per increment, so the values in between are skipped.
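A hypothetical no-op sketch makes the cost and the gaps visible: every identifier is one trip to the sequence table, and because the table still advances by the allocation size, the values in between are never used. The stand-in table simply yields values 3 apart; this illustrates the behaviour and is not Hibernate's `NoopOptimizer`.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical no-op sketch: every identifier is a fresh sequence value, so
// each call is a trip to the sequence table and nothing is pooled in memory.
final class NoopSketch {
    private final LongSupplier nextSequenceValue;
    private int trips = 0; // how many times the sequence table was accessed

    NoopSketch(LongSupplier nextSequenceValue) {
        this.nextSequenceValue = nextSequenceValue;
    }

    synchronized long generate() {
        trips++;
        return nextSequenceValue.getAsLong();
    }

    synchronized int tableTrips() {
        return trips;
    }

    public static void main(String[] args) {
        // Stand-in table advancing by an allocation size of 3: yields 8, 11, 14, ...
        NoopSketch optimizer = new NoopSketch(LongStream.iterate(8, v -> v + 3).iterator()::nextLong);
        for (int i = 0; i < 3; i++) {
            System.out.print(optimizer.generate() + " ");
        }
        System.out.println("(" + optimizer.tableTrips() + " table trips)");
    }
}
```

This prints `8 11 14 (3 table trips)`: three identifiers cost three trips, and 9, 10, 12 and 13 are skipped.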

The no-op optimizer performs no optimization, but still increases the sequence value by the allocation size.

There is a PooledLoThreadLocalOptimizer variant of the PooledLoOptimizer. For thread safety, all other optimizers read the next sequence value and initialize their pools of identifiers in a synchronized method. The PooledLoThreadLocalOptimizer has a separate pool of identifiers per thread, and can therefore avoid the use of synchronization and its possible performance costs.
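The per-thread idea can be sketched with a `ThreadLocal` holding pooled-lo state. This hypothetical class is not Hibernate's `PooledLoThreadLocalOptimizer`: handing out identifiers needs no lock because each pool belongs to one thread, and only fetching a new lower bound from the shared `LongSupplier` stand-in is serialized.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical thread-local pooled-lo sketch: each thread keeps its own pool,
// so handing out identifiers needs no lock.
final class ThreadLocalPooledLoSketch {
    // Per-thread pool state; only the owning thread ever touches it.
    private static final class Pool {
        long next = 1;       // next identifier to hand out
        long upperBound = 0; // last identifier in the pool; 0 forces a first fetch
    }

    private final LongSupplier nextSequenceValue;
    private final int allocationSize;
    private final ThreadLocal<Pool> pools = ThreadLocal.withInitial(Pool::new);

    ThreadLocalPooledLoSketch(LongSupplier nextSequenceValue, int allocationSize) {
        this.nextSequenceValue = nextSequenceValue;
        this.allocationSize = allocationSize;
    }

    // Not synchronized: the pool belongs to the calling thread.
    long generate() {
        Pool pool = pools.get();
        if (pool.next > pool.upperBound) {
            long lowerBound = fetchLowerBound();
            pool.next = lowerBound;
            pool.upperBound = lowerBound + allocationSize - 1;
        }
        return pool.next++;
    }

    // The in-memory stand-in is not thread-safe, so access to it is serialized,
    // much as a row lock serializes access to a real sequence table.
    private synchronized long fetchLowerBound() {
        return nextSequenceValue.getAsLong();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalPooledLoSketch optimizer =
                new ThreadLocalPooledLoSketch(LongStream.iterate(1, v -> v + 3).iterator()::nextLong, 3);
        System.out.println("main: " + optimizer.generate()); // fetches the pool 1-3
        Thread worker = new Thread(() ->
                System.out.println("worker: " + optimizer.generate())); // fetches the pool 4-6
        worker.start();
        worker.join();
        System.out.println("main: " + optimizer.generate()); // continues its own pool
    }
}
```

The worker thread gets its own pool starting at 4, while the main thread keeps handing out 2 from the pool it already holds.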

Configuring the optimizer to use

The optimizer to use can be set through the hibernate.id.optimizer.pooled.preferred property. The Spring equivalent is spring.jpa.properties.hibernate.id.optimizer.pooled.preferred.

The possible values are none for NoopOptimizer, hilo for HiLoOptimizer, legacy-hilo for LegacyHiLoAlgorithmOptimizer, pooled for PooledOptimizer (default), pooled-lo for PooledLoOptimizer, and pooled-lotl for PooledLoThreadLocalOptimizer.
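When bootstrapping JPA programmatically, the property can be supplied in the map passed to `Persistence.createEntityManagerFactory(String, Map)`; in a Spring Boot `application.properties` file the equivalent is the line `spring.jpa.properties.hibernate.id.optimizer.pooled.preferred=pooled-lo`. The helper below is a hypothetical sketch that builds such a `Properties` object and rejects names outside the list above.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for building Hibernate settings with a preferred optimizer.
final class OptimizerSettings {
    // Property name read by Hibernate.
    static final String PREFERRED_OPTIMIZER = "hibernate.id.optimizer.pooled.preferred";

    // Accepted values and the optimizer each one selects.
    static final Map<String, String> OPTIMIZERS = Map.of(
            "none", "NoopOptimizer",
            "hilo", "HiLoOptimizer",
            "legacy-hilo", "LegacyHiLoAlgorithmOptimizer",
            "pooled", "PooledOptimizer",
            "pooled-lo", "PooledLoOptimizer",
            "pooled-lotl", "PooledLoThreadLocalOptimizer");

    // Builds properties that could be handed to the JPA bootstrap call.
    static Properties withPreferredOptimizer(String name) {
        if (!OPTIMIZERS.containsKey(name)) {
            throw new IllegalArgumentException("Unknown optimizer: " + name);
        }
        Properties props = new Properties();
        props.setProperty(PREFERRED_OPTIMIZER, name);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withPreferredOptimizer("pooled-lo");
        System.out.println(props.getProperty(PREFERRED_OPTIMIZER)); // pooled-lo
    }
}
```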

The end.