How does Spring Data clean up persisted entities in a transactional method?

I need to receive and save a huge amount of data using Spring Data over Hibernate. Our server does not have enough RAM to hold all the entities at the same time, so we will definitely get an OutOfMemoryError.

So we obviously need to save the data in batches. We also need @Transactional to make sure that either all of the data is persisted or none of it is, even if only a single error occurs.

So, the question: during a @Transactional method, does Spring Data keep the entities in RAM, or are entities that have already been flushed eligible for garbage collection?

And what is the best approach for processing a huge amount of data with Spring Data? Maybe Spring Data isn't the right tool for problems like this.

Does Spring Data keep the entities in RAM during a @Transactional method, or are entities that have been flushed eligible for garbage collection?

The entities stay in RAM (i.e. in the entityManager) until the transaction commits or rolls back, or until the entityManager is cleared. That means the entities only become eligible for GC once the transaction has committed or rolled back, or entityManager.clear() has been called.

So, what is the best approach for processing a huge amount of data with Spring Data?

The general strategy for preventing OOM is to load and process the data batch by batch. At the end of each batch, you should flush and clear the entityManager so that it releases its managed entities for GC. The general code flow should be something like this:

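import java.util.List;

import javax.persistence.EntityManager;          // jakarta.persistence on Spring Boot 3+
import javax.persistence.PersistenceContext;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class BatchProcessor {

    // Because of @Transactional, Spring ensures this EntityManager is the same
    // one that the running transaction uses.
    @PersistenceContext
    private EntityManager em;

    @Autowired
    private FooRepository fooRepository;

    @Transactional
    public void startProcess() {
        processBatch(1, 100);
        processBatch(101, 200);
        processBatch(201, 300);
        // ...and so on for the remaining ID ranges
    }

    private void processBatch(int fromFooId, int toFooId) {
        // findFooIdBetween is assumed to be declared on FooRepository (not shown here).
        List<Foo> foos = fooRepository.findFooIdBetween(fromFooId, toFooId);
        for (Foo foo : foos) {
            // process a foo
        }

        // Flush first so that the pending UPDATE statements are sent to the DB;
        // otherwise the updates are lost when the EntityManager is cleared.
        em.flush();
        // Detach all managed entities so that they become eligible for GC.
        em.clear();
    }
}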

Note that this practice only prevents OOM; it is not meant to achieve high performance. So if performance is not your concern, you can safely use this strategy.
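
Since the question is about saving incoming data rather than reading it, the same flush-and-clear pattern can be sketched for writes. This is only a minimal sketch under stated assumptions: PersistenceOps, BatchSaver and CountingOps are hypothetical names introduced here so that the example compiles without JPA on the classpath. In a real Spring bean the injected EntityManager would take the place of PersistenceOps (its persist, flush and clear methods have the same names), and saveAll would be called from a @Transactional method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the three EntityManager calls this sketch needs. */
interface PersistenceOps {
    void persist(Object entity);
    void flush();
    void clear();
}

/** Hypothetical helper: persists items and flushes/clears after every full batch. */
final class BatchSaver {

    private final PersistenceOps em;
    private final int batchSize;

    BatchSaver(PersistenceOps em, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.em = em;
        this.batchSize = batchSize;
    }

    /** Persists every item and returns how many flush/clear cycles were performed. */
    int saveAll(Iterable<?> items) {
        int pending = 0;
        int cycles = 0;
        for (Object item : items) {
            em.persist(item);
            pending++;
            if (pending == batchSize) {
                em.flush();   // send the pending SQL to the database
                em.clear();   // detach the entities so they can be collected
                pending = 0;
                cycles++;
            }
        }
        if (pending > 0) {    // last, partially filled batch
            em.flush();
            em.clear();
            cycles++;
        }
        return cycles;
    }
}

/** In-memory fake that records how many entities were "managed" at once. */
final class CountingOps implements PersistenceOps {
    final List<Object> managed = new ArrayList<>();
    int maxManaged = 0;
    int flushCalls = 0;

    @Override
    public void persist(Object entity) {
        managed.add(entity);
        maxManaged = Math.max(maxManaged, managed.size());
    }

    @Override
    public void flush() {
        flushCalls++;
    }

    @Override
    public void clear() {
        managed.clear();
    }
}
```

With a batch size of 100, at most 100 entities are held by the persistence context at any time. Flushing only sends the SQL to the database and does not commit, so a rollback of the surrounding transaction still undoes every batch.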
