Are unused JPA entities garbage collected, and why?

Question

While building a Spring application that fetches data from the web through an API, I repeatedly ran into OutOfMemoryError: GC overhead limit exceeded. After some profiling sessions I started to question my model, which looks something like this:

@Entity
class A {
  @Id
  private Integer id;
  private String name;

  @OneToMany
  private Set<B> b1;

  @OneToMany
  private Set<B> b2;
}

@Entity
class B {
  @Id
  private Integer id;

  @ManyToOne
  private A a1;

  @ManyToOne
  private A a2;
}

There is a CrudRepository assigned to manage these entities (JPA + EclipseLink). Entity loading is left at the default, which in this case means eager, AFAIK.

The program attempts to do the following:

// populates the set with 2500 A instances.
Set<A> aCollection = fetchAFromWebAPI();
for (A a : aCollection) {
  // populates b1 and b2 of each A with 100 B instances each
  fetchBFromWebAPI(a);
  aRepository.save(a);
}

By the end of this process there would be 500k B instances, except that it never reaches the end because of OutOfMemoryError: GC overhead limit exceeded. Now, I could add more memory, but I want to understand why all these instances aren't garbage collected. I save an A to the database and forget about it. Is this because A instances have B instances in their b1 or b2 that in turn reference A instances?

Another observation I made is that the process runs significantly more smoothly the first time, when there is no data in the database.

Is there something fundamentally wrong with this model or this process?

Answer 1

A JPA transaction has an associated session cache of all entities used in the transaction. By saving your entities you keep introducing more instances into that session cache. In your case I'd recommend calling EntityManager.clear() every n entities - that detaches the persisted entities from the session and makes them available for garbage collection.
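The "clear every n entities" advice can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: FlushAndClear is a hypothetical stand-in for the two EntityManager methods involved (flush() and clear()), so that the sketch compiles without a JPA provider, and saveInBatches and batchSize are made-up names. Calling flush() before clear() is an addition here, on the assumption that pending changes should be written out before the entities are detached.

```java
import java.util.function.Consumer;

class BatchClearSketch {

    /**
     * Hypothetical stand-in for the two EntityManager methods used here
     * (flush() and clear()); not part of any real library.
     */
    interface FlushAndClear {
        void flush();
        void clear();
    }

    /**
     * Saves each item, then flushes and clears the persistence context after
     * every batchSize items. Returns the number of flush/clear cycles performed.
     */
    static <T> int saveInBatches(Iterable<T> items, Consumer<T> save,
                                 FlushAndClear context, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int cycles = 0;
        for (T item : items) {
            save.accept(item);       // e.g. aRepository.save(a) in the question
            pending++;
            if (pending == batchSize) {
                context.flush();     // write pending changes out first
                context.clear();     // then detach everything saved so far
                pending = 0;
                cycles++;
            }
        }
        if (pending > 0) {           // last, partially filled batch
            context.flush();
            context.clear();
            cycles++;
        }
        return cycles;
    }
}
```

In the question's loop, save would presumably be aRepository::save and context would delegate to an injected EntityManager, all inside one transaction; how the application is actually wired is not shown in the question. Clearing only helps the garbage collector for instances that nothing else, such as aCollection, still references.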

If you want to learn more about the lifecycle of JPA entities, you can refer to, e.g.:

http://www.objectdb.com/java/jpa/persistence/managed

Edit: Additionally, BatScream's answer is also correct: you seem to accumulate more and more data in every iteration that is still referenced by the set. You might want to consider removing instances you have processed from the set.

Answer 2

The object graph held by aCollection keeps growing with each iteration: every instance of A is populated with 200 B instances when its turn in the loop comes. Hence your heap space gets eaten up.

All the A instances in aCollection remain reachable whenever the garbage collector runs during this period, since you are not removing the just-saved A from the collection.

To avoid this, you can use the Set's Iterator to safely remove the just-processed A instance from the collection.
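A minimal sketch of that idea, using only java.util: each element is processed and then removed through the iterator, so the set no longer keeps it reachable. The names DrainSetSketch and processAndRemove are made up for illustration, and the process callback stands in for the fetchBFromWebAPI(a) and aRepository.save(a) calls from the question.

```java
import java.util.Iterator;
import java.util.Set;
import java.util.function.Consumer;

class DrainSetSketch {

    /**
     * Processes every element of the set and removes it right afterwards,
     * so the set stops referencing elements that are already done.
     * Returns the number of elements processed.
     */
    static <T> int processAndRemove(Set<T> items, Consumer<T> process) {
        int processed = 0;
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            T item = it.next();
            process.accept(item);   // e.g. fetch the B instances and save the A
            it.remove();            // safe removal during iteration
            processed++;
        }
        return processed;
    }
}
```

The set has to support removal through its iterator (a HashSet does; immutable sets such as those from Set.of do not). Removing the element from the set only makes it collectable if nothing else, such as the JPA session cache, still holds a reference to it.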

Answer 1
4 2014-11-28 12:03:07

Answer 2
2 2014-11-28 12:01:21
