
How does Java GC deal with processed objects loaded from a large Stream exceeding available heap memory?

Let's say I have a stream of objects loaded from a database (using Spring Data JPA, as follows):

import java.util.stream.Stream;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface MyJpaRepository extends JpaRepository<Foo, String> {
  @Query("select f from Foo f") // JpaRepository already declares List<Foo> findAll(),
  Stream<Foo> streamAll();      // so a Stream-returning findAll() override would not compile
}

And let's say there are millions of Foo objects stored in my database, occupying many more gigabytes than my maximum heap size.

I'm expecting that consuming the stream as follows would let the JVM manage its heap properly, garbage-collecting processed objects as more are loaded from the database:

try (Stream<Foo> fooStream = myJpaRepository.streamAll()) {
  fooStream.forEach(entity -> logger.info("Hello !"));
}

But in fact, this exact code throws an out-of-memory error.

  • How does the garbage collector act in this case?
  • Why does consuming this stream with a forEach require the JVM to load the entire dataset from the stream into memory (as per my understanding)?

Thank you

Java Stream won't fetch all the data from the underlying database. Streams do not store data; rather, they provide data from a source such as a collection, array, or IO channel. Generally, these are lazily evaluated. So, when logger.info gets called on each entity, the stream will fetch the data from the underlying data store and apply the command. Since the stream just provides an iterator, it only needs to fetch the next element in the iteration, not the whole set. And the GC will remove each fetched element once the lambda function has been applied to it.
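To make the laziness concrete, here is a minimal, self-contained sketch (unrelated to the JPA code above) showing that a stream produces elements one at a time as the terminal operation consumes them:

import java.util.stream.Stream;

public class LazyStreamDemo {
  public static void main(String[] args) {
    // "produced" and "consumed" lines interleave in the output, showing that
    // the pipeline never materializes the whole sequence at once.
    Stream.iterate(0, i -> i + 1)
        .peek(i -> System.out.println("produced " + i))
        .limit(3)
        .forEach(i -> System.out.println("consumed " + i));
  }
}

The output interleaves as produced 0, consumed 0, produced 1, consumed 1, and so on, which is exactly the element-by-element behaviour described above.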

In your scenario, the garbage collector does not get a chance to act and clean up your memory. Let me try to explain in more detail. When you start your Java process, you configure the heap memory as well as the garbage collection algorithm. If you don't fine-tune either of them, the JVM takes the default settings and proceeds. Once your process starts allocating heap, the JVM internally collects statistics and schedules the garbage collection process. But if your process doesn't give it the breathing space to decide when and how to collect garbage, the JVM will throw an Out of Memory (OOM) error and crash, as you observed.
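The underlying rule is that the GC can only reclaim objects that are no longer reachable. Here is a deliberately trivial sketch of that failure mode (the allocation size is arbitrary, chosen only for illustration):

import java.util.ArrayList;
import java.util.List;

public class RetentionDemo {
  public static void main(String[] args) {
    List<byte[]> retained = new ArrayList<>();
    while (true) {
      // Each 1 MB chunk stays reachable through the list, so the GC cannot
      // reclaim any of it; the heap fills up and the JVM eventually throws
      // java.lang.OutOfMemoryError: Java heap space.
      retained.add(new byte[1024 * 1024]);
    }
  }
}

A driver that buffers the entire ResultSet puts the process in the same situation: every row stays reachable from the open result set, so no amount of garbage collection can help.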

@ernest_k was 100% right in his comment; this issue has nothing to do with Streams. As @avishek-bhattacharya explained:

Streams do not store data; rather, they provide data from a source such as a collection, array, or IO channel. Generally, these are lazily evaluated.

In fact, Postgres (the underlying DB in my case) always returns the entire ResultSet unless configured otherwise (the same goes for MySQL). To configure it to use a database cursor instead, you need to do the following:

import static org.hibernate.jpa.QueryHints.HINT_CACHEABLE;
import static org.hibernate.jpa.QueryHints.HINT_FETCH_SIZE;
import static org.hibernate.jpa.QueryHints.HINT_READONLY;
import java.util.stream.Stream;
import javax.persistence.QueryHint;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;

public interface MyJpaRepository extends JpaRepository<Foo, String> {

  @QueryHints({
      @QueryHint(name = HINT_FETCH_SIZE, value = "1000"), // stream rows 1000 at a time
      @QueryHint(name = HINT_CACHEABLE, value = "false"), // bypass the second-level cache
      @QueryHint(name = HINT_READONLY, value = "true")    // entities are not dirty-checked
  })
  @Query("select f from Foo f")
  Stream<Foo> streamAll();
}
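For context, the fetch-size hint ultimately controls plain JDBC behaviour. Below is a rough sketch of the equivalent at the JDBC level (the connection URL, credentials, and table name are hypothetical); note that the PostgreSQL driver only uses a server-side cursor when autocommit is off and the result set is forward-only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
      conn.setAutoCommit(false); // required for cursor-based fetching in Postgres
      try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM foo")) {
        ps.setFetchSize(1000); // roughly what HINT_FETCH_SIZE translates to
        try (ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            // process one row at a time; only ~1000 rows are buffered in memory
          }
        }
      }
    }
  }
}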
