
Spring Data JPA garbage collection

I have a Spring Batch application with a JpaPagingItemReader (I modified it a bit) and 4 JPA repositories used to enrich the Model that comes from the JpaPagingItemReader.

My flow is:

  1. Select Model (page size = 8192), then collect the List<Model> into a Map<String, List<Model>> (grouped by id, because the models are not unique and I need to enrich them by id), then enrich it with the 4 custom JpaRepositories using native queries with an IN clause, and merge the results with Java 8 Streams (see the sketch after this list).
  2. Convert the data to XML objects and write them with StAX through a MultiFileItemWriter to files, splitting them into no more than 20000 records per file.
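To make step 1 concrete, here is a minimal sketch of the grouping and enrichment, assuming hypothetical names such as Detail, DetailRepository, findAllByModelIdIn and Model#addDetail (the real entities, repositories and query methods are not shown in the question):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical processor for step 1: group the (non-unique) models by id,
    // fetch their details with one bulk IN query, and merge the results back
    // onto the models. Detail, DetailRepository and addDetail are placeholders.
    public class EnrichmentProcessor {

        private final DetailRepository detailRepository; // one of the 4 custom repositories

        public EnrichmentProcessor(DetailRepository detailRepository) {
            this.detailRepository = detailRepository;
        }

        public List<Model> enrich(List<Model> page) {
            // Group the page by id so each id is looked up only once.
            Map<String, List<Model>> byId = page.stream()
                    .collect(Collectors.groupingBy(Model::getId));

            // One bulk query with an IN clause instead of one query per model.
            List<Detail> details = detailRepository.findAllByModelIdIn(byId.keySet());

            // Merge each fetched detail onto every model that shares its id.
            details.forEach(detail ->
                    byId.getOrDefault(detail.getModelId(), Collections.<Model>emptyList())
                        .forEach(model -> model.addDetail(detail)));

            return page;
        }
    }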

All works great, but today I tried to run the flow with a big amount of data from the database. I generated 20 files (2.2 GB). Sometimes I got an OutOfMemory Java heap error (I had 1 GB Xms/Xmx), so I raised it to 2 GB and everything works, but in Instana I see that the old-gen Java memory stays at about 900 MB in use after GC, and about 1.3-1.7 GB is in use overall. So I started to think about how I can optimize the GC of the Spring Data JPA objects. I think they stay in memory for a long time. When I select Model with the JpaPagingItemReader I detach every Model (with entityManager.detach), but when I enrich the Model with the custom Spring Data JPA requests I do not detach the results. Maybe this is the problem and I should detach them?
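One way to act on that suspicion is to release the enrichment results from the persistence context as soon as their data has been merged onto the models. A minimal sketch, assuming Detail is a placeholder for one of the enrichment entity types:

    import java.util.List;
    import javax.persistence.EntityManager;      // jakarta.persistence on newer Spring/JPA versions
    import javax.persistence.PersistenceContext;

    // Sketch only: release the enrichment results from the persistence context
    // once they have been merged onto the models, so the GC can reclaim them
    // before the chunk/transaction ends.
    public class DetachingEnricher {

        @PersistenceContext
        private EntityManager entityManager;

        void releaseEnrichmentResults(List<Detail> details) {
            // Detach each managed entity individually ...
            details.forEach(entityManager::detach);

            // ... or, if nothing else in the first-level cache is still needed,
            // drop everything at once:
            // entityManager.clear();
        }
    }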

I do not need to insert data into the database, I only need to read it. Or do I need to make the page size smaller and select about 4000 records per request?
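If a smaller page size turns out to help, it is just one parameter on the reader. A sketch using the standard JpaPagingItemReaderBuilder, where the JPQL string is a placeholder for the real query:

    import javax.persistence.EntityManagerFactory;   // jakarta.persistence on newer Spring/JPA versions
    import org.springframework.batch.item.database.JpaPagingItemReader;
    import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ReaderConfig {

        // Sketch only: the JPQL query is a placeholder for the real one.
        @Bean
        public JpaPagingItemReader<Model> modelReader(EntityManagerFactory emf) {
            return new JpaPagingItemReaderBuilder<Model>()
                    .name("modelReader")
                    .entityManagerFactory(emf)
                    .queryString("select m from Model m order by m.id")
                    .pageSize(4000) // smaller pages keep fewer managed entities in memory per read
                    .build();
        }
    }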

I need to process 370 000 records from the database and enrich them.

Solved. I added flags to my run configuration and doubled Xms and Xmx.
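The answer does not list the exact flags or heap sizes; purely as an illustration, a run configuration with explicit heap bounds might look like this:

    # Illustration only -- the actual flags and values are not given in the answer
    java -Xms2g -Xmx2g -jar spring-batch-app.jar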
