Java 8 Stream batch processing to avoid OutOfMemory

I have something like:

    List<Data> dataList = steps.stream()
        .flatMap(step -> step.getPartialDataList().stream())
        .collect(Collectors.toList());

So I'm combining multiple lists from every step into dataList.

My problem is that dataList might run into an OutOfMemoryError. Any suggestions on how I can batch dataList and save the batches into the db?

My primitive idea is:

    for (Step step : steps) {
        List<Data> partialDataList = step.getPartialDataList();

        if (dataList.size() + partialDataList.size() <= MAXIMUM_SIZE) {
            dataList.addAll(partialDataList);
        } else {
            // flush the current batch before it grows past MAXIMUM_SIZE
            saveIntoDb(dataList);
            dataList = new ArrayList<>(partialDataList);
        }
    }
    if (!dataList.isEmpty()) {
        saveIntoDb(dataList);    // flush the final partial batch
    }
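
A stream-based variant of the same buffering idea would look roughly like this (just a sketch, reusing MAXIMUM_SIZE and saveIntoDb from above):

    // Rough sketch: buffer elements from the stream and flush whenever the buffer is full
    List<Data> buffer = new ArrayList<>();
    steps.stream()
        .flatMap(step -> step.getPartialDataList().stream())
        .forEach(data -> {
            buffer.add(data);
            if (buffer.size() >= MAXIMUM_SIZE) {
                saveIntoDb(buffer);
                buffer.clear();
            }
        });
    if (!buffer.isEmpty()) {
        saveIntoDb(buffer);    // flush the remaining partial batch
    }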

PS: I know there is this post, but the difference is that I might not be able to store the whole data set in memory.

LE: the getPartialDataList method is more like createPartialDataList()

If your concern is OutOfMemoryError, you probably shouldn't create additional intermediate data structures like lists or streams before saving to the database.

Since Step.getPartialDataList() already returns List<Data>, the data is already in memory, unless you have your own List implementation. You just need to use JDBC batch insert:

    PreparedStatement ps = c.prepareStatement("INSERT INTO data VALUES (?, ?, ...)");
    for (Step step : steps) {
        for (Data data : step.getPartialDataList()) {
            // bind the columns of the current row
            ps.setString(1, ...);
            ps.setString(2, ...);
            ...
            ps.addBatch();    // queue the row instead of executing it immediately
        }
    }
    ps.executeBatch();        // send the queued rows to the database

There is no need to chunk into smaller batches prematurely with dataList. First see what your database and JDBC driver support before doing premature optimizations.
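
For example, standard JDBC metadata already tells you whether the driver claims to support batching (a minimal check, using the same Connection c as above):

    // Standard JDBC metadata call: does the driver report batch update support?
    boolean batchSupported = c.getMetaData().supportsBatchUpdates();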

Do note that for most databases the right way to insert a large amount of data is an external utility and not JDBC, e.g. PostgreSQL has COPY.
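
For instance, the PostgreSQL JDBC driver exposes COPY through org.postgresql.copy.CopyManager. A minimal sketch, assuming the data table from above, hypothetical columns col1/col2, and a csvRows string already formatted as CSV:

    // Sketch: stream rows into PostgreSQL with COPY instead of row-by-row INSERTs
    org.postgresql.copy.CopyManager copyManager =
        c.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
    long rowsCopied = copyManager.copyIn(
        "COPY data (col1, col2) FROM STDIN WITH (FORMAT csv)",   // col1/col2 are placeholder column names
        new java.io.StringReader(csvRows));                      // csvRows: hypothetical CSV payload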
