
Azure: Exceeded the memory limit of 20 MB per session for prepared statements

I'm executing lots of batches containing prepared insert statements:

// static fields con, ps, tupleCache, count, tupleNum, BATCH_SIZE, insertQuery,
// jdbcUrl, jdbcUser and jdbcPassword are declared on the class and omitted here
public static void main(String... args) throws Exception {
    Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
    BufferedReader csv = new BufferedReader(new InputStreamReader(Main.class.getClassLoader().getResourceAsStream("records.csv")));
    String line;
    createConnectionAndPreparedStatement();
    while ((line = csv.readLine()) != null) {
        tupleNum++;
        count++;
        List<String> row = new ArrayList<String>(Arrays.asList(line.split(";")));

        tupleCache.add(row);
        addBatch(row, ps);
        if (count > BATCH_SIZE) {
            count = 0;
            executeBatch(ps);
            tupleCache.clear();
        }
    }
}

protected static void createConnectionAndPreparedStatement() throws SQLException {
    System.out.println("Opening new connection!");
    con = DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword);
    con.setAutoCommit(false); // commit manually after each successful batch
    ps = con.prepareStatement(insertQuery);

    count = 0;
}


private static void executeBatch(PreparedStatement ps) throws SQLException, IOException, InterruptedException {
    try {
        ps.executeBatch();
    } catch (BatchUpdateException bue) {
        if (bue.getMessage() != null && bue.getMessage().contains("Exceeded the memory limit")) {
            // silently close the old connection to free resources
            try {
                con.close();
            } catch (Exception ex) {}
            createConnectionAndPreparedStatement();
            // switch to the freshly prepared statement (the parameter still
            // refers to the statement of the connection we just closed)
            ps = Main.ps;
            for (List<String> t : tupleCache) {
                addBatch(t, ps);
            }
            // let's retry once
            ps.executeBatch();
        }
    }
    System.out.println("Batch succeeded! -->" + tupleNum );
    con.commit();
    ps.clearWarnings();
    ps.clearBatch();
    ps.clearParameters();
}

private static void addBatch(List<String> tuple, PreparedStatement ps) throws SQLException {
    int sqlPos = 1;
    int size = tuple.size();
    for (int i = 0; i < size; i++) {
        String field = tuple.get(i);
        //log.error(String.format("Setting value at pos [%s] to value [%s]", i, field));
        if (field != null) {
            ps.setString(sqlPos, field);
            sqlPos++;
        } else {
            ps.setNull(sqlPos, java.sql.Types.VARCHAR);
            sqlPos++;
        }
    }
    ps.addBatch();
}

So in the standalone application everything is fine and no exceptions occur even after 700k batch insertions. But when I execute essentially the same code in a custom Pig StoreFunc, I get the following exception after about 6-7k batch insertions:

java.sql.BatchUpdateException: 112007;Exceeded the memory limit of 20 MB per session for prepared statements. Reduce the number or size of the prepared statements.
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1824)

Only restarting the connection helps. Can someone give me ideas on why this is happening and how to fix it?

Based on your description and the error message, in my experience the issue is caused by the memory configuration on the server side of SQL Azure, such as the memory limits for connections within the server resource pool.

I tried to follow that clue and search for a specific explanation of the connection memory limits, but failed, apart from the content quoted below from here.

Connection Memory

SQL Server sets aside three packet buffers for every connection made from a client. Each buffer is sized according to the default network packet size specified by the sp_configure stored procedure. If the default network packet size is less than 8KB, the memory for these packets comes from SQL Server's buffer pool. If it's 8KB or larger, the memory is allocated from SQL Server's MemToLeave region.

I continued to search for the packet size and MemToLeave settings and viewed them on my instance.
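
Since the quoted passage ties the per-connection buffers to the network packet size, one thing you could experiment with is the packetSize connection property of the Microsoft JDBC driver (its default is 8000 bytes). The sketch below only illustrates setting that property; the server name, database and credentials are placeholders, and I have not verified that a smaller packet size changes the 20 MB per-session behaviour on SQL Azure.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PacketSizeExample {
    public static void main(String[] args) throws SQLException {
        // packetSize is a connection property of the Microsoft JDBC driver;
        // 4096 below is an arbitrary value smaller than the 8000-byte default
        String url = "jdbc:sqlserver://myserver.database.windows.net:1433;"
                + "databaseName=mydb;user=myuser@myserver;password=mypassword;"
                + "packetSize=4096";
        try (Connection con = DriverManager.getConnection(url)) {
            System.out.println("Connected with a reduced network packet size");
        }
    }
}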

Based on the above, I guess that "Exceeded the memory limit of 20 MB per session for prepared statements" means that the total memory used by parallel connections exceeds the maximum memory buffer pool of the SQL Azure instance.

So there are two solutions I suggest you try.

  1. Reduce the value of the BATCH_SIZE variable so that the server-side memory cost stays below the maximum size of the memory buffer pool (a minimal sketch follows this list).
  2. Try to scale up your SQL Azure instance.
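
To make suggestion 1 concrete, here is a minimal sketch of flushing with a smaller batch size, modelled on the loop in the question. The class and method names and the value of 500 are my own assumptions; the idea is simply to keep fewer prepared-statement parameters buffered per session before each executeBatch/commit.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class SmallBatchInsert {
    // assumed smaller threshold; tune it downwards until the 20 MB limit is no longer hit
    private static final int BATCH_SIZE = 500;

    // assumes the caller already called con.setAutoCommit(false), as in the question
    static void insertRows(Connection con, PreparedStatement ps, List<List<String>> rows)
            throws SQLException {
        int count = 0;
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                ps.setString(i + 1, row.get(i));
            }
            ps.addBatch();
            if (++count >= BATCH_SIZE) {
                ps.executeBatch();  // ship the small batch to the server
                con.commit();       // release the memory the session holds for the batch
                ps.clearBatch();
                count = 0;
            }
        }
        // flush whatever is left over
        ps.executeBatch();
        con.commit();
        ps.clearBatch();
    }
}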

Hope it helps.


Here are two new suggestions.

  1. I'm really not sure whether the MS JDBC driver supports the current scenario of using Apache Pig for what is effectively a parallelized ETL job. Please try the jTDS JDBC driver instead of the MS one (see the sketch after this list).
  2. A better way, I think, is to use more specialized tools for this, such as sqoop or kettle.
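
For suggestion 1, switching drivers only means changing the driver class and the JDBC URL prefix. A minimal sketch, assuming the jTDS driver is on the classpath (Maven artifact net.sourceforge.jtds:jtds); the server, database and credentials are placeholders, and ssl=require plus the user@server login form are what Azure SQL typically expects:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JtdsConnectExample {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // jTDS uses a different driver class and URL prefix than the Microsoft driver
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        String url = "jdbc:jtds:sqlserver://myserver.database.windows.net:1433/mydb;ssl=require";
        try (Connection con = DriverManager.getConnection(url, "myuser@myserver", "mypassword")) {
            con.setAutoCommit(false); // keep the manual-commit batching from the question
            System.out.println("Connected through jTDS");
        }
    }
}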

I ran into the same issue when I tried to write a pandas dataframe to Azure SQL Data Warehouse. I specified the chunksize and assigned the load user the largest resource class. However, the issue still occurred.

According to the documentation, the INSERT VALUES statement by default only uses the smallrc resource class.

The only solution I can think of is to scale up the DWUs, but that is not optimal as the cost becomes very high.
