EclipseLink - Detached entities memory leak

Question

Setup

We are currently using WildFly with EclipseLink as the JPA implementation in a Jakarta EE application. The application itself is a RESTful web server with REST, service, and DAO layers. The DAO layer is the only one that uses the EntityManager. We always detach entities, for several reasons:

To prevent EclipseLink from automatically checking state and flushing changes to the database
To prevent EclipseLink from reusing the same object across multiple reads ...

However, with this approach we have noticed spikes in memory usage that in some cases lead to OutOfMemoryError.

Diagnostics

Using VisualVM, we have pinpointed the problem: a large number of entity instances held in memory.

Test code

This is a sample of the code we are experiencing problems with (a migration of some historic data):

LinkedList<SomeEntity> entities; // the loaded set of entities to process
while (!entities.isEmpty()) {
    SomeEntity entity = entities.removeFirst(); // iterate queue-style so GC can reclaim already processed items
    if (entity.getItems().isEmpty()) {
        // this call is transactional
        entityService.delete(entity.getId());
    } else if (entity.getItems().stream().anyMatch(item -> item.getQuantity() > 0.0)) {
        // DO SOME CHANGES ON ENTITY
        // this call is transactional
        entityService.update(entity);
    }
    entity = null;
}
entities = null;
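For reference, here is a self-contained version of the loop above. SomeEntity, Item, EntityService and RecordingService are hypothetical stand-ins for the real JPA entities and the transactional service, so this is only a sketch under those assumptions. It drains the queue with ArrayDeque.poll(), which makes the explicit null assignments unnecessary because the local variable is overwritten on every iteration. Removing an element from the queue only makes it collectable if nothing else, such as persistence-provider internals, still references it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class MigrationLoop {
    // Hypothetical stand-ins for the real entities (not the original classes).
    record Item(double quantity) {}
    record SomeEntity(long id, List<Item> items) {}

    // Hypothetical service; in the real application both calls are transactional.
    interface EntityService {
        void delete(long id);
        void update(SomeEntity entity);
    }

    // Recording implementation used only for this sketch.
    static final class RecordingService implements EntityService {
        int deletes;
        int updates;
        public void delete(long id) { deletes++; }
        public void update(SomeEntity entity) { updates++; }
    }

    static void process(Deque<SomeEntity> entities, EntityService service) {
        SomeEntity entity;
        // poll() removes the head and returns null once the queue is empty.
        while ((entity = entities.poll()) != null) {
            if (entity.items().isEmpty()) {
                service.delete(entity.id());
            } else if (entity.items().stream().anyMatch(item -> item.quantity() > 0.0)) {
                service.update(entity);
            }
        }
    }

    public static void main(String[] args) {
        Deque<SomeEntity> queue = new ArrayDeque<>(List.of(
                new SomeEntity(1, List.of()),
                new SomeEntity(2, List.of(new Item(0.0))),
                new SomeEntity(3, List.of(new Item(2.5)))));
        RecordingService service = new RecordingService();
        process(queue, service);
        System.out.println(service.deletes + " deleted, " + service.updates + " updated");
    }
}
```

Running main prints "1 deleted, 1 updated": entity 1 has no items, entity 3 has a positive quantity, and entity 2 matches neither branch.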

Observations

While profiling memory usage, we can see an ever-increasing count of entity instances in memory. It is not the same entity that the test code works with, but the entity that is referenced most often by other objects. Sometimes part of them is cleared, but the overall number increases over time.
The number of instances greatly outnumbers the records in the database.
This means that every time an object is referenced in a relationship, a new instance is created (this is OK).
When we created a heap dump and looked at where the objects are referenced from, only EclipseLink internal structures show up, such as relationshipSourceObject in org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder#90312, owner in org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener#26713, ...

What we have tried

None of this helped:

Setting eclipselink.cache.type.default to WEAK, SOFT, or even NONE
Manually calling EntityManager.clear at the end of the while loop

In my understanding, WEAK should be enough to prevent EclipseLink from holding references for too long and blocking GC. But the instances are stored somewhere anyway, and since those references are reachable from GC roots, they are never cleared. Can anyone explain this behavior or point me in the direction of where to look?
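A cache type setting can only influence how the cache itself holds objects; a weak reference is cleared only after its referent is no longer strongly reachable. The plain-Java sketch below illustrates that rule. All class names are hypothetical: Listener merely mimics an object with a strong owner field, like the one seen in the heap dump, and is not EclipseLink's actual class.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class WeakCacheReachability {
    static final class Entity {
        final long id;
        Entity(long id) { this.id = id; }
    }

    // Hypothetical stand-in for an internal object that keeps a strong "owner" reference.
    static final class Listener {
        final Entity owner;
        Listener(Entity owner) { this.owner = owner; }
    }

    // Simulates a long-lived structure that is reachable from a GC root (a static field).
    static final List<Listener> LONG_LIVED = new ArrayList<>();

    static WeakReference<Entity> cacheWeakly(long id) {
        Entity entity = new Entity(id);
        LONG_LIVED.add(new Listener(entity)); // strong path that outlives this method
        return new WeakReference<>(entity);   // the "WEAK cache" entry
    }

    public static void main(String[] args) {
        WeakReference<Entity> ref = cacheWeakly(42);
        System.gc();
        // Cannot be cleared: still strongly reachable via LONG_LIVED -> Listener.owner.
        System.out.println("still cached: " + (ref.get() != null));

        LONG_LIVED.clear();
        System.gc();
        // Now only weakly reachable, so eligible for collection; when it is
        // actually collected is up to the JVM.
        System.out.println("after clearing listeners: " + ref.get());
    }
}
```

The first line printed is always "still cached: true". The second line may or may not show null, because System.gc() is only a request.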

EDITS

Addressing the comment and Chris's answer, here is more information about how we use the EM and transactions.

We detach using the EntityManager.detach method, and references (@OneToMany, @ManyToMany, etc.) have CascadeType.DETACH applied. Any necessary lazily loaded references are loaded before detaching.

I agree with the part about re-fetching entities. I would not mind having multiple instances of the same entity in memory for some time. My problem is why they are not garbage collected.

The list of entities in the sample code is loaded in one transaction; each subsequent database UPDATE or DELETE (which also fetches some data into memory, creating more instances) runs in a separate transaction per entity. I would expect most of the heap to be used during the initial call, and usage to then slowly decrease or remain roughly the same.

About using EntityManager

We are using WildFly as the Jakarta EE container. By default it ships with Hibernate as the JPA provider, but we have added EclipseLink as a module and configured it as the provider in persistence.xml.

According to the documentation, a container-managed EntityManager creates instances as needed.

Answer 1

Are you caching entities? clear() is not enough to let you cache entities effectively, and if that is what you are trying to do, it is likely related to your current issue. Everything loaded from an EntityManager has a reference to that EntityManager, so I would guess that you are reading in a large list of partially fetched entities, caching them, and then using EntityManager.clear() to try to detach them.

Those entities are then no longer 'managed' but still reference the EntityManager. As soon as you fetch something, such as the entity.getItems() call shown in your code, this forces all 'items' to be fetched into memory (assuming this is a standard OneToMany with a back pointer, which is lazily loaded by default). Because each Item has a back reference and 'this' entity is no longer referenced by the EntityManager, the Item has to refetch the entity. So you now have two instances of the same entity in memory: Entity1' -> Item1 -> Entity1.

This can easily build up with a more complex object graph and repeated clear calls.
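The duplication described above can be reproduced with a toy model in plain Java. This is not how EclipseLink is implemented: Context, Entity and Item are hypothetical stand-ins for the EntityManager, the owning entity and its OneToMany items, and the lazy load resolves the back pointer through the context's identity map, as the explanation assumes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackPointerDuplication {
    // Counts how many Entity objects have been materialized, regardless of id.
    static int entityInstancesCreated = 0;

    // Hypothetical owning entity with a lazily loaded OneToMany collection.
    static final class Entity {
        final long id;
        final Context loadedBy;   // objects keep a reference to the context that read them
        private List<Item> items; // null until first access (lazy)

        Entity(long id, Context loadedBy) {
            this.id = id;
            this.loadedBy = loadedBy;
            entityInstancesCreated++;
        }

        List<Item> getItems() {
            if (items == null) {
                items = loadedBy.loadItems(id); // lazy load goes through the original context
            }
            return items;
        }
    }

    // Hypothetical child with a back pointer to its owner (the ManyToOne side).
    static final class Item {
        final Entity owner;
        Item(Entity owner) { this.owner = owner; }
    }

    // Toy persistence context: an identity map keyed by primary key.
    static final class Context {
        private final Map<Long, Entity> identityMap = new HashMap<>();

        Entity find(long id) {
            return identityMap.computeIfAbsent(id, key -> new Entity(key, this));
        }

        void clear() {
            identityMap.clear();
        }

        List<Item> loadItems(long ownerId) {
            // The back pointer is resolved through the identity map. After clear()
            // the original instance is no longer there, so a second copy is built.
            Entity owner = find(ownerId);
            List<Item> loaded = new ArrayList<>();
            loaded.add(new Item(owner));
            return loaded;
        }
    }

    public static void main(String[] args) {
        Context context = new Context();
        Entity entity = context.find(1); // Entity1'
        context.clear();                 // no longer managed, but still references context
        Item item = entity.getItems().get(0);
        System.out.println("same instance: " + (item.owner == entity));
        System.out.println("Entity instances created: " + entityInstancesCreated);
    }
}
```

Without the clear() call, the back pointer resolves to the same instance. With it, main prints "same instance: false" and reports two Entity instances for the same id.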

This cannot be fully solved, but the overhead can be reduced by narrowing the scope of what you do in a single EntityManager, so that it is reused for identity purposes only within that object graph, and can itself be garbage collected once the objects it was used to read are collected.
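A minimal sketch of that scoping, again using a hypothetical toy Context in place of the EntityManager (in JPA terms, roughly one EntityManager per unit of work, for example created with EntityManagerFactory.createEntityManager() and closed afterwards). Each batch gets a fresh identity map, so the number of instances one context pins is bounded by the batch size, and the whole map becomes unreachable when the batch ends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScopedContexts {
    // Hypothetical entity; only the id matters for this sketch.
    static final class Entity {
        final long id;
        Entity(long id) { this.id = id; }
    }

    // Toy per-unit-of-work context: an identity map that lives only as long as one batch.
    static final class Context {
        final Map<Long, Entity> identityMap = new HashMap<>();

        Entity find(long id) {
            return identityMap.computeIfAbsent(id, key -> new Entity(key));
        }
    }

    // Processes ids in batches, each with its own short-lived context,
    // and returns the largest number of instances any single context held.
    static int processInBatches(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int largest = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            Context context = new Context(); // fresh scope per batch
            for (long id : ids.subList(start, Math.min(start + batchSize, ids.size()))) {
                context.find(id); // work on the returned entity inside this scope only
            }
            largest = Math.max(largest, context.identityMap.size());
            // context goes out of scope here; it and the entities it read can be collected
            // together, provided no code outside the batch kept references to them.
        }
        return largest;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        System.out.println("largest context size: " + processInBatches(ids, 3));
    }
}
```

With ten ids and a batch size of 3, main prints "largest context size: 3". The trade-off is that identity is only guaranteed within a batch, which matches the question's statement that multiple instances of the same entity are acceptable.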

Answer 1: 0 votes, 2020-09-08 15:36:25
