
How to select optimal batch size in JDBC?

I have a CSV file with 50000 entries which I want to import into SQL using batching in JDBC.

What should be the optimal batch size for it?
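Below is a minimal sketch of what such a batched import could look like. The connection URL, the hypothetical csv_import (col1, col2) table, the naive comma split, and the batch size of 100 are illustrative assumptions, not details from the question.

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvBatchImport {

    private static final int BATCH_SIZE = 100; // within the 50-100 range discussed below

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "user", "password"); // placeholder URL
             BufferedReader reader = Files.newBufferedReader(Paths.get("data.csv"));
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO csv_import (col1, col2) VALUES (?, ?)")) {     // hypothetical table

            conn.setAutoCommit(false);           // commit per batch, not per row
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");   // naive split: assumes no quoted fields
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();           // send the accumulated rows in one round trip
                    conn.commit();
                }
            }
            ps.executeBatch();                   // flush the final partial batch
            conn.commit();
        }
    }
}
```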

According to Oracle's official recommendation, the optimal batch size is between 50 and 100.

Proof: https://docs.oracle.com/cd/E11882_01/java.112/e16548/oraperf.htm#JJDBC28754

Oracle recommends that you use JDBC standard features when possible. This recommendation applies to update batching as well. Oracle update batching is retained primarily for backwards compatibility.

For both standard update batching and Oracle update batching, Oracle recommends you to keep the batch sizes in the general range of 50 to 100. This is because though the drivers support larger batches, they in turn result in a large memory footprint with no corresponding increase in performance. Very large batches usually result in a decline in performance compared to smaller batches.

Have a nice day

50k records is not a large dataset. A bigger batch size will help, but if you assume that your database server network round trip is 10 ms:

  1. Batch size 50 => 50,000 rows / 50 batch size * 10 ms latency = 10,000 ms latency overhead = 10 sec of latency overhead

  2. Batch size 100 => 50,000 rows / 100 batch size * 10 ms latency = 5,000 ms latency overhead = 5 sec of latency overhead

Start by setting a reasonable batch size for the batch insert statements, and then measure how long it actually takes to insert the rows. Remember to vacuum after the bulk insert.
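One way to do that measurement, sketched under the assumption that the import loop shown earlier has been factored into an importCsv(conn, path, batchSize) helper; the connection URL and the batch sizes tried are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class BatchSizeTiming {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "user", "password")) { // placeholder URL
            // Try a few batch sizes and compare wall-clock time.
            // (In a real test, truncate the target table between runs.)
            for (int batchSize : new int[] {50, 100, 500, 1000}) {
                long start = System.nanoTime();
                importCsv(conn, "data.csv", batchSize);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("batch size %d -> %d ms%n", batchSize, elapsedMs);
            }
            // If the totals barely change across batch sizes, the per-batch round trips
            // are not the bottleneck; look at indexes, constraints and commit frequency instead.
        }
    }

    // Placeholder for the addBatch/executeBatch loop from the earlier sketch,
    // parameterized by batch size.
    static void importCsv(Connection conn, String csvPath, int batchSize) throws Exception {
        // ... insert rows here, calling executeBatch() every batchSize rows ...
    }
}
```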

If 50k records take 1 minute to insert, you need to focus on optimizing the insertion process and not the JDBC batch size, since only a fraction of the total time is spent in latency overhead.

For larger data sets you should not be using JDBC. There are tools designed for the bulk insertion task, e.g. Oracle has SQL*Loader.
