
Why is Spring's jdbcTemplate.batchUpdate() so slow?

I'm trying to find the fastest way to do batch inserts.

I tried to insert several batches with jdbcTemplate.update(String sql), where sql was built with a StringBuilder and looked like:

INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3)
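For clarity, a minimal sketch of how such a statement might be built (the table and values are illustrative):

    // Sketch: building one multi-row INSERT with StringBuilder (table/values illustrative)
    StringBuilder sb = new StringBuilder("INSERT INTO TABLE(x, y, i) VALUES");
    for (int n = 0; n < 1000; n++) {
        sb.append(n == 0 ? "(1,2,3)" : ", (1,2,3)");
    }
    jdbcTemplate.update(sb.toString());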

The batch size was exactly 1000 and I inserted nearly 100 batches. I checked the time using StopWatch and found the insert time:

min[38ms], avg[50ms], max[190ms] per batch

I was glad, but I wanted to make my code better.

After that, I tried to use jdbcTemplate.batchUpdate like this:

    jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
                       // ...
        }
        @Override
        public int getBatchSize() {
            return 1000;
        }
    });

where sql looked like:

INSERT INTO TABLE(x, y, i) VALUES(1,2,3);

and I was disappointed: jdbcTemplate executed every single insert of the 1000-line batch separately. I looked at mysql_log and found a thousand individual inserts there. I checked the time using StopWatch and found the insert time:

min[900ms], avg[1100ms], max[2000ms] per batch

So, can anybody explain to me why jdbcTemplate does separate inserts in this method? Why is the method named batchUpdate? Or am I using this method the wrong way?

These parameters in the JDBC connection URL can make a big difference in the speed of batched statements. In my experience, they speed things up:

?useServerPrepStmts=false&rewriteBatchedStatements=true
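For reference, a minimal sketch of a MySQL connection URL carrying these flags (host, schema, and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    // Sketch: JDBC URL with batch-rewriting flags (host/schema/credentials are placeholders)
    String url = "jdbc:mysql://localhost:3306/mydb"
            + "?useServerPrepStmts=false&rewriteBatchedStatements=true";
    Connection conn = DriverManager.getConnection(url, "user", "password");

With rewriteBatchedStatements=true, MySQL Connector/J rewrites a batch of single-row inserts into multi-row INSERT statements, which is essentially what the StringBuilder approach in the question does by hand.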

See: JDBC batch insert performance

I have also faced the same issue with the Spring JDBC template. Probably, with Spring Batch, the statement was executed and committed on every insert or on each chunk, which slowed things down.

I replaced the jdbcTemplate.batchUpdate() code with plain JDBC batch insertion code and found a major performance improvement:

DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false); // commit once at the end, not per statement
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;

for (Employee employee : employees) {

    ps.setString(1, employee.getName());
    ps.setString(2, employee.getCity());
    ps.setString(3, employee.getPhone());
    ps.addBatch();

    ++count;

    // flush every full batch, and the final partial batch
    if (count % batchSize == 0 || count == employees.size()) {
        ps.executeBatch();
        ps.clearBatch();
    }
}

connection.commit();
ps.close();
connection.close();

See also: JDBC batch insert performance

I found a major improvement by setting the argTypes array in the call.

In my case, with Spring 4.1.4 and Oracle 12c, inserting 5000 rows with 35 fields:

jdbcTemplate.batchUpdate(insert, parameters);           // takes 7 seconds

jdbcTemplate.batchUpdate(insert, parameters, argTypes); // takes 0.08 seconds!!!

The argTypes parameter is an int array where you set the SQL type of each field, like this:

int[] argTypes = new int[35];
argTypes[0] = Types.VARCHAR;
argTypes[1] = Types.VARCHAR;
argTypes[2] = Types.VARCHAR;
argTypes[3] = Types.DECIMAL;
argTypes[4] = Types.TIMESTAMP;
.....
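For context, a minimal sketch of the full call with explicit types (the columns, rows, and table name are illustrative; parameters is a List<Object[]> with one array per row):

    import java.math.BigDecimal;
    import java.sql.Types;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: batchUpdate with an explicit type array (columns/rows illustrative)
    String insert = "INSERT INTO MY_TABLE (name, city, salary) VALUES (?, ?, ?)";
    int[] argTypes = { Types.VARCHAR, Types.VARCHAR, Types.DECIMAL };

    List<Object[]> parameters = new ArrayList<>();
    parameters.add(new Object[] { "Alice", "Madrid", new BigDecimal("100.0") });

    // With argTypes supplied, Spring no longer has to derive each field's type per row
    jdbcTemplate.batchUpdate(insert, parameters, argTypes);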

I debugged org.springframework.jdbc.core.JdbcTemplate and found that most of the time was spent trying to determine the type of each field, and this was done for every record.

Hope this helps!

Simply use a transaction: add @Transactional on the method.

Be sure to declare the correct TX manager if you are using several data sources, e.g. @Transactional("dsTxManager"). I have a case where I insert 60000 records; it takes about 15 s, with no other tweaks:

@Transactional("myDataSourceTxManager")
public void save(...) {
    ...
    jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {

        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ...
        }

        @Override
        public int getBatchSize() {
            if (data == null) {
                return 0;
            }
            return data.size();
        }
    });
}

Change your SQL insert to INSERT INTO TABLE(x, y, i) VALUES(1,2,3). The framework creates the loop for you. For example:

public void insertBatch(final List<Customer> customers){

  String sql = "INSERT INTO CUSTOMER " +
    "(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";

  getJdbcTemplate().batchUpdate(sql, new BatchPreparedStatementSetter() {

    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        Customer customer = customers.get(i);
        ps.setLong(1, customer.getCustId());
        ps.setString(2, customer.getName());
        ps.setInt(3, customer.getAge() );
    }

    @Override
    public int getBatchSize() {
        return customers.size();
    }
  });
}

If you have something like this, Spring will do something like:

for(int i = 0; i < getBatchSize(); i++){
   execute the prepared statement with the parameters for the current iteration
}

The framework first creates a PreparedStatement from the query (the sql variable), then the setValues method is called and the statement is executed. That is repeated as many times as you specify in the getBatchSize() method. So the right way to write the insert statement is with only one VALUES clause. You can take a look at http://docs.spring.io/spring/docs/3.0.x/reference/jdbc.html

I don't know if this will work for you, but here's a Spring-free way that I ended up using. It was significantly faster than the various Spring methods I tried. I even tried the JDBC template batch update method another answer describes, but even that was slower than I wanted. I'm not sure why, and the Internet didn't have many answers either. I suspected it had to do with how commits were being handled.

This approach is just straight JDBC using the java.sql packages and PreparedStatement's batch interface. It was the fastest way I could get 24M records into a MySQL DB.

I more or less just built up collections of "record" objects and then called the code below in a method that batch-inserted all the records. The loop that built the collections was responsible for managing the batch size.

I was trying to insert 24M records into a MySQL DB, and it was going at ~200 records per second using Spring batch. When I switched to this method, it went up to ~2500 records per second, so my 24M-record load went from a theoretical 1.5 days to about 2.5 hours.

First create a connection...

Connection conn = null;
try {
    Class.forName("com.mysql.jdbc.Driver");
    conn = DriverManager.getConnection(connectionUrl, username, password);
} catch (SQLException e) {
    // error handling elided (see note below)
} catch (ClassNotFoundException e) {
    // error handling elided
}

Then create a prepared statement, load it with batches of values to insert, and execute it as a single batch insert...

PreparedStatement ps = null;
try {
    conn.setAutoCommit(false); // one commit for the whole batch
    ps = conn.prepareStatement(sql); // INSERT INTO TABLE(x, y, i) VALUES(?, ?, ?)
    for (MyRecord record : records) {
        try {
            ps.setString(1, record.getX());
            ps.setString(2, record.getY());
            ps.setString(3, record.getI());

            ps.addBatch();
        } catch (Exception e) {
            ps.clearParameters();
            logger.warn("Skipping record...", e);
        }
    }

    ps.executeBatch();
    conn.commit();
} catch (SQLException e) {
    // error handling elided (see note below)
} finally {
    if (null != ps) {
        try { ps.close(); } catch (SQLException e) { /* ignore */ }
    }
}

Obviously I've removed the error handling, and the query and the Record object are notional.

Edit: since your original question was comparing the INSERT INTO foobar VALUES (?,?,?), (?,?,?)...(?,?,?) method to Spring batch, here's a more direct response to that:

It looks like your original method is likely the fastest way to do bulk data loads into MySQL short of something like the "LOAD DATA INFILE" approach. A quote from the MySQL docs (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

You could modify the Spring JDBC Template batchUpdate method to do an insert with multiple VALUES lists specified per setValues call, but you'd have to manually keep track of the index values as you iterate over the set of things being inserted. And you'd run into a nasty edge case at the end, when the total number of things being inserted isn't a multiple of the number of VALUES lists in your prepared statement.

If you use the approach I outline, you can do the same thing (use a prepared statement with multiple VALUES lists), and when you get to that edge case at the end it's a little easier to deal with, because you can build and execute one last statement with exactly the right number of VALUES lists. It's a bit hacky, but most optimized things are. A sketch of that tail handling follows.
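For illustration, a minimal sketch of that idea (the table and the MyRecord type with its getters are carried over from the snippet above; buildMultiRowInsert is a made-up helper):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Sketch: multi-row INSERT whose last statement has exactly the right number of rows
    static String buildMultiRowInsert(int rows) {
        StringBuilder sb = new StringBuilder("INSERT INTO TABLE(x, y, i) VALUES ");
        for (int r = 0; r < rows; r++) {
            sb.append(r == 0 ? "(?,?,?)" : ", (?,?,?)");
        }
        return sb.toString();
    }

    static void insertAll(Connection conn, List<MyRecord> records, int rowsPerStmt) throws SQLException {
        for (int from = 0; from < records.size(); from += rowsPerStmt) {
            int rows = Math.min(rowsPerStmt, records.size() - from); // the last chunk may be smaller
            try (PreparedStatement ps = conn.prepareStatement(buildMultiRowInsert(rows))) {
                int p = 1;
                for (MyRecord rec : records.subList(from, from + rows)) {
                    ps.setString(p++, rec.getX());
                    ps.setString(p++, rec.getY());
                    ps.setString(p++, rec.getI());
                }
                ps.executeUpdate();
            }
        }
    }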

I also had a bad time with the Spring JDBC batch template. In my case it would have been insane to use pure JDBC, so instead I used NamedParameterJdbcTemplate. That was a must-have in my project. But it was way too slow at inserting hundreds of thousands of lines into the database.

To see what was going on, I sampled it with VisualVM during the batch update and, voilà:

(VisualVM screenshot: the sampler shows where the batch update was slow)

What was slowing the process down was that, while setting the parameters, Spring JDBC was querying the database for the metadata of each parameter. And it seemed to me that it was querying the database for each parameter of each line, every time. So I just taught Spring to ignore the parameter types (as warned about in the Spring documentation on batch operations with a list of objects):

    @Bean(name = "named-jdbc-tenant")
    public synchronized NamedParameterJdbcTemplate getNamedJdbcTemplate(@Autowired TenantRoutingDataSource tenantDataSource) {
        System.setProperty("spring.jdbc.getParameterType.ignore", "true");
        return new NamedParameterJdbcTemplate(tenantDataSource);
    }

Note: the system property must be set before the JDBC template object is created. It would be possible to just set it in application.properties, but this solved it and I've never touched it again.

The solution given by @Rakesh worked for me, with a significant improvement in performance: the earlier time was 8 min, and with this solution (the same JDBC batch code shown in that answer) it took less than 2 min.


I encountered a serious performance issue with JdbcBatchItemWriter.write() (link) from Spring Batch and eventually found that the write logic delegates to JdbcTemplate.batchUpdate().

Adding the Java system property spring.jdbc.getParameterType.ignore=true fixed the performance issue entirely (from 200 records per second to ~5000). The patch was tested on both PostgreSQL and MSSQL (it is probably not dialect-specific); a sketch of setting it is below.
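For reference, a minimal sketch of setting the property before the Spring context is created (the main class is illustrative; the property can also be passed as -Dspring.jdbc.getParameterType.ignore=true on the JVM command line or placed in a spring.properties file at the root of the classpath):

    // Sketch: set the property before any JdbcTemplate is constructed (main class illustrative)
    public class Application {
        public static void main(String[] args) {
            System.setProperty("spring.jdbc.getParameterType.ignore", "true");
            // ... start the Spring application context afterwards
        }
    }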

... and ironically, Spring documents this behaviour under a "note" section (link):

In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).

Alternatively, you might consider specifying the corresponding JDBC types explicitly, either through a 'BatchPreparedStatementSetter' (as shown earlier), through an explicit type array given to a 'List<Object[]>' based call, through 'registerSqlType' calls on a custom 'MapSqlParameterSource' instance, or through a 'BeanPropertySqlParameterSource' that derives the SQL type from the Java-declared property type even for a null value.
