
Understanding JDBC batch operations

I use Hibernate ORM and PostgreSQL in my application, and sometimes I use batch operations. At first I didn't understand why, with a batch size of 25, the logs show 25 generated queries, and I thought batching was not working correctly. Then I looked at the source code of the pg driver and found the following lines in the PgStatement class:

 public int[] executeBatch() throws SQLException {
        this.checkClosed();
        this.closeForNextExecution();
        if (this.batchStatements != null && !this.batchStatements.isEmpty()) {
            this.transformQueriesAndParameters();
            // The next line confuses me, because we end up with an array of identical queries
            Query[] queries = (Query[])this.batchStatements.toArray(new Query[0]);
            ParameterList[] parameterLists = (ParameterList[])this.batchParameters.toArray(new ParameterList[0]);
            this.batchStatements.clear();
            this.batchParameters.clear();

and in the PgPreparedStatement class:

    public void addBatch() throws SQLException {
        checkClosed();
        if (batchStatements == null) {
          batchStatements = new ArrayList<Query>();
          batchParameters = new ArrayList<ParameterList>();
        }

        batchParameters.add(preparedParameters.copy());
        Query query = preparedQuery.query;
        // The next line confuses me
        if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {
          batchStatements.add(query);
        }
      }

I noticed that with a batch size of 25, 25 queries are sent, each with its own parameters attached.

The database logs confirm this, for example:

2017-12-06 01:22:08.023 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_3: BEGIN
2017-12-06 01:22:08.024 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_4: select nextval ('tests_id_seq')
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)     
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '1'
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '2'
...
x23 queries with parameters 
...
2017-12-06 01:22:08.063 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_5: COMMIT

But I thought a single query would be executed with an array of 25 parameter sets. Or do I misunderstand how batch inserts work with a prepared statement? Why duplicate one query n times?

Finally, I tried to debug my queries at this line:

if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {

and noticed that my queries are always instances of SimpleQuery rather than BatchedQuery. Maybe this is the key to the problem? I couldn't find any information about BatchedQuery.

There might be various kinds of batching involved, and I will cover the PostgreSQL JDBC driver (pgjdbc) part of it.

TL;DR: pgjdbc does use fewer network round trips when the batch API is used. BatchedQuery is used only if reWriteBatchedInserts=true is passed in the pgjdbc connection settings.
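As a rough illustration of both points, here is a minimal sketch that sets the reWriteBatchedInserts connection property and queues rows through the standard JDBC batch API. The class and method names are made up for this example, the table is the `tests (name, id)` table from the question, and the ids are passed in explicitly as an assumption (in the question they come from a sequence).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, not part of pgjdbc or Hibernate.
class BatchInsertSketch {

    // Builds connection properties for DriverManager.getConnection(url, props).
    // "reWriteBatchedInserts" is the pgjdbc property discussed in this answer.
    static Properties connectionProperties(String user, String password,
                                           boolean rewriteBatchedInserts) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("reWriteBatchedInserts", Boolean.toString(rewriteBatchedInserts));
        return props;
    }

    // Queues one execution per name with addBatch() and hands them all to the
    // driver with a single executeBatch() call.
    // Assumption: ids are assigned by the caller, starting at firstId.
    static int[] insertNames(Connection connection, List<String> names, long firstId)
            throws SQLException {
        String sql = "insert into tests (name, id) values (?, ?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            long id = firstId;
            for (String name : names) {
                ps.setString(1, name);
                ps.setLong(2, id++);
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

To try it, open a connection with `DriverManager.getConnection(url, BatchInsertSketch.connectionProperties(user, password, true))`, where the URL and credentials are whatever your environment uses, and pass it to `insertNames`. With the flag set to false the server log should still show one execution per row, as in the question.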

You might find https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance relevant (slide 44,...)

When it comes to query execution, network latency is often a significant part of the elapsed time.

Suppose the task is to insert 10 rows.

  1. No batching (e.g. just PreparedStatement#execute in a loop). The driver would perform the following:

        execute query
        sync <-- wait for the response from the DB
        execute query
        sync <-- wait for the response from the DB
        execute query
        sync <-- wait for the response from the DB
        ...

    A notable amount of time would be spent "waiting for the DB".

  2. JDBC batch API. PreparedStatement#addBatch() enables the driver to send multiple "query executions" in a single network round trip. The current implementation, however, still splits large batches into smaller ones to avoid a TCP deadlock.

    The sequence of actions is much better:

        execute query
        ...
        execute query
        execute query
        execute query
        sync <-- wait for the response from the DB

  3. Note that even with #addBatch, there is still the overhead of the individual "execute query" commands. It takes the server a notable amount of time to process each message individually.

    One way to reduce the number of queries is to use a multi-values insert. For instance:

        insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?)

    This enables PostgreSQL to insert multiple rows at once. The drawback is that you don't get a detailed (per-row) error message. Currently Hibernate does not implement multi-values insert.

    However, pgjdbc can rewrite regular batch inserts into multi-values inserts on the fly since 9.4.1209 (2016-07-15).

    To activate the multi-values rewrite, you need to add the reWriteBatchedInserts=true connection property. The feature was initially developed in https://github.com/pgjdbc/pgjdbc/pull/491

    It is smart enough to use 2 statements to insert 10 rows: the first is an 8-valued statement and the second is a 2-valued statement. Using powers of two enables pgjdbc to keep the number of distinct statements sane, which improves performance because often-used statements are server-prepared (see "What's the life span of a PostgreSQL server-side prepared statement").

    BatchedQuery represents that kind of multi-valued statement, so you will see that class used only in the reWriteBatchedInserts=true case.

    The drawbacks of the feature might include less detail in the "batch result". For instance, a regular batch gives you a per-statement row count, whereas in the multi-values case you just get a "statement completed" status. On top of that, the on-the-fly rewriter might fail to parse certain SQL statements (e.g. https://github.com/pgjdbc/pgjdbc/issues/1045).
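To make the powers-of-two idea concrete, here is a small sketch of how a batch of N rows could be split into power-of-two chunks, and what the multi-values statement for one chunk would look like. This illustrates the idea only and is not the pgjdbc implementation: the class, the method names and the `maxChunk` cap are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; pgjdbc's real rewriter lives elsewhere and differs in detail.
class MultiValuesSketch {

    // Splits `rows` into chunks whose sizes are powers of two, largest first.
    // maxChunk is an assumed upper bound, rounded down to a power of two.
    static List<Integer> powerOfTwoChunks(int rows, int maxChunk) {
        if (rows < 0 || maxChunk < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and maxChunk >= 1");
        }
        int cap = Integer.highestOneBit(maxChunk);
        List<Integer> chunks = new ArrayList<>();
        int remaining = rows;
        while (remaining > 0) {
            // Largest power of two that fits in what is left, limited by the cap.
            int chunk = Math.min(cap, Integer.highestOneBit(remaining));
            chunks.add(chunk);
            remaining -= chunk;
        }
        return chunks;
    }

    // Builds the text of a multi-values insert with `rows` groups of placeholders.
    static String multiValuesInsert(String table, List<String> columns, int rows) {
        if (rows < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        String group = "(" + String.join(",", Collections.nCopies(columns.size(), "?")) + ")";
        return "insert into " + table + "(" + String.join(",", columns) + ") values "
                + String.join(", ", Collections.nCopies(rows, group));
    }
}
```

With these helpers, `powerOfTwoChunks(10, 128)` yields the chunk sizes 8 and 2, matching the 10-row example above, and `multiValuesInsert("tab", List.of("a", "b", "c"), 2)` produces `insert into tab(a,b,c) values (?,?,?), (?,?,?)`. Because only power-of-two sizes occur, the set of distinct statement texts stays small no matter what batch sizes the application uses.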

Batch processing does not collapse or minimize the number of SQL statements executed; it is all about optimizing how Hibernate caches and flushes things to the database in its in-memory session. The point of batch processing, and of finding the right batch size for your operation, is to strike the right balance between application memory used and database performance.

  • You will run out of app server memory if you run too many queries before committing/flushing a batch.
  • You will not get the best performance if your batch size is too small and you are committing/flushing too often.
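The trade-off in the two points above is usually handled by flushing and clearing the session every N entities. Below is a minimal sketch of that cadence. `PersistenceSession` is a hypothetical stand-in for the `persist`, `flush` and `clear` methods of a Hibernate `Session`, introduced only so the example compiles without Hibernate on the classpath.

```java
import java.util.List;

// Hypothetical stand-in for the three Hibernate Session methods this pattern uses.
interface PersistenceSession {
    void persist(Object entity);
    void flush();
    void clear();
}

// Hypothetical helper, not part of Hibernate.
class BatchedSave {

    // Persists every entity, flushing and clearing after each full batch and once
    // more for a partial final batch. Returns how many flushes were performed.
    static int saveAll(PersistenceSession session, List<?> entities, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int flushes = 0;
        int pending = 0;
        for (Object entity : entities) {
            session.persist(entity);
            pending++;
            if (pending == batchSize) {
                session.flush();  // push the queued inserts to the database
                session.clear();  // release the entities held in session memory
                pending = 0;
                flushes++;
            }
        }
        if (pending > 0) {
            session.flush();
            session.clear();
            flushes++;
        }
        return flushes;
    }
}
```

A larger `batchSize` means fewer flushes but more entities held in memory between them; a smaller one means the opposite. A value matching the configured JDBC batch size (25 in the question) is a reasonable starting point, though that is a rule of thumb rather than something this thread establishes.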

More reading here:

https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html
https://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
