
Why is Hibernate splitting my batch insert into 3 queries

I'm currently trying to implement a batch insert using Hibernate. Here is what I have implemented:

1. Entity

@Entity
@Table(name = "my_bean_table")
@Data
public class MyBean {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
    @SequenceGenerator(name = "seqGen", sequenceName = "my_bean_seq", allocationSize = 50)
    @Column(name = "my_bean_id")
    private Long id;

    @Column(name = "my_bean_name")
    private String name;

    @Column(name = "my_bean_age")
    private int age;

    // no-arg constructor required by JPA; Lombok's @Data does not generate
    // one once another constructor is declared
    public MyBean() {
    }

    public MyBean(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

2. application.properties

Hibernate and the datasource are configured this way:

spring.datasource.url=jdbc:postgresql://{ip}:{port}/${db}?reWriteBatchedInserts=true&loggerLevel=TRACE&loggerFile=pgjdbc.log
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true

NB: &loggerLevel=TRACE&loggerFile=pgjdbc.log is only there for debugging purposes.
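One extra property that may help while debugging (my addition, not part of the setup above) is Hibernate's statistics flag, which logs how many JDBC statements and batches were actually executed:

spring.jpa.properties.hibernate.generate_statistics=true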

3. Objects in my PostgreSQL database

CREATE TABLE my_bean_table
(
    my_bean_id bigint NOT NULL DEFAULT nextval('my_bean_seq'::regclass),
    my_bean_name "char(100)" NOT NULL,
    my_bean_age smallint NOT NULL,
    CONSTRAINT my_bean_table_pkey PRIMARY KEY (my_bean_id)
)

CREATE SEQUENCE my_bean_seq
    INCREMENT 50
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;

EDIT: Added ItemWriter

import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;

public class MyBeanWriter implements ItemWriter<MyBean> {

    private final Logger logger = LoggerFactory.getLogger(MyBeanWriter.class);

    @Autowired
    private MyBeanRepository repository;

    @Override
    public void write(List<? extends MyBean> items) throws Exception {
        repository.saveAll(items);
    }
}

The commit-interval is set to 50 as well.
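For context, here is roughly how the writer is wired into a chunk-oriented step; the bean method and reader name are placeholders rather than my actual job configuration:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;

@Bean
public Step myBeanStep(StepBuilderFactory steps,
                       ItemReader<MyBean> myBeanReader,
                       MyBeanWriter myBeanWriter) {
    return steps.get("myBeanStep")
            // chunk(50) is the commit-interval: items are accumulated and
            // handed to the writer in groups of 50, one transaction per chunk
            .<MyBean, MyBean>chunk(50)
            .reader(myBeanReader)
            .writer(myBeanWriter)
            .build();
}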

In the log file produced by the JDBC driver, I get the following lines:

avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl execute
FINEST:   batch execute 3 queries, handler=org.postgresql.jdbc.BatchResultHandler@1317ac2c, maxRows=0, fetchSize=0, flags=5
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST:  FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48),($49, $50, $51),($52, $53, $54),($55, $56, $57),($58, $59, $60),($61, $62, $63),($64, $65, $66),($67, $68, $69),($70, $71, $72),($73, $74, $75),($76, $77, $78),($79, $80, $81),($82, $83, $84),($85, $86, $87),($88, $89, $90),($91, $92, $93),($94, $95, $96)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
FINEST:  FE=> Execute(portal=null,limit=1)
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST:  FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST:  FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6)",oids={23,1043,20,23,1043,20})

Here is my question: why is the batch insert split into 3 queries:

  • first query: 32 elements
  • second query: 16 elements
  • third query: 2 elements

NB: I tried to set the batch size to 100 and 200, and I still got 3 different queries.

I found the answer while debugging the PgPreparedStatement class and its transformQueriesAndParameters() method:

@Override
protected void transformQueriesAndParameters() throws SQLException {
    ...
    BatchedQuery originalQuery = (BatchedQuery) preparedQuery.query;
    // Single query cannot have more than {@link Short#MAX_VALUE} binds, thus
    // the number of multi-values blocks should be capped.
    // Typically, it does not make much sense to batch more than 128 rows: performance
    // does not improve much after updating 128 statements with 1 multi-valued one, thus
    // we cap maximum batch size and split there.
    ...
    final int highestBlockCount = 128;
    final int maxValueBlocks = bindCount == 0 ? 1024 /* if no binds, use 1024 rows */
        : Integer.highestOneBit( // deriveForMultiBatch supports powers of two only
            Math.min(Math.max(1, (Short.MAX_VALUE - 1) / bindCount), highestBlockCount));
    ...
}
  • a single multi-valued INSERT can contain at most 128 rows
  • each sub-batch holds a power-of-two number of rows, so any other batch size is split into power-of-two chunks (see the sketch below)
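As far as I can tell, the driver then covers a batch greedily with the largest power-of-two chunk that still fits under the cap. Here is a minimal sketch of that decomposition (my own reconstruction for illustration, not the driver's actual code):

import java.util.ArrayList;
import java.util.List;

public class BatchSplitDemo {

    // Cap on rows per rewritten INSERT, mirroring the pgjdbc logic quoted above
    // (bindCount = number of bind parameters per row).
    static int maxValueBlocks(int bindCount) {
        final int highestBlockCount = 128;
        return bindCount == 0 ? 1024
            : Integer.highestOneBit(
                Math.min(Math.max(1, (Short.MAX_VALUE - 1) / bindCount), highestBlockCount));
    }

    // Greedy power-of-two decomposition of `rows` batched rows.
    static List<Integer> split(int rows, int bindCount) {
        final int cap = maxValueBlocks(bindCount);
        List<Integer> chunks = new ArrayList<>();
        while (rows > 0) {
            int chunk = Integer.highestOneBit(Math.min(rows, cap));
            chunks.add(chunk);
            rows -= chunk;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 3 binds per row (my_bean_age, my_bean_name, my_bean_id), as in the logged INSERT
        System.out.println(split(50, 3));   // [32, 16, 2]  -> the three queries in the log
        System.out.println(split(100, 3));  // [64, 32, 4]  -> still three queries
        System.out.println(split(200, 3));  // [128, 64, 8] -> still three queries
        System.out.println(split(128, 3));  // [128]        -> a single query
    }
}

This would also explain why setting the batch size to 100 or 200 still produced three queries: 100 and 200 decompose into 64 + 32 + 4 and 128 + 64 + 8 respectively, while 128, being a power of two, goes out as a single query.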

I'm now using 128 both as the sequence increment in the database and as the batch-size parameter on the client side, and it works like a charm.

I don't have a conclusive answer, but this behaviour looks very similar to, and probably happens for the same reason as, batch fetching.

It uses different statements with the number of parameter sets equal to powers of two in order to minimise the number of distinct statements executed. Databases need to parse statements and use caches to hold the parsed statements; if a client executes tons of statements that do essentially the same thing but differ in the number of parameter sets, that renders the cache useless. With power-of-two chunks capped at 128 rows, for example, at most eight distinct statement shapes (1, 2, 4, ..., 128 rows) can ever occur.

On the other hand, I haven't seen this with batch inserts, only with bulk fetch operations. I have a couple of guesses why it might be happening here:

  1. Your ids get generated by the database, so before the data can be written, the ids need to be queried from the database sequence. Maybe that select behaviour then leaks through to the inserts.

  2. It could be an optimisation done by the JDBC driver that rewrites this kind of statement.

  3. Hibernate does this all the time and I just missed it, although I find it weird to do when the number of parameter sets is equal to the batch size.
