Jpa reader Spring Batch

I would like to know whether this is a recommended way to implement a Spring Batch reader with JPA, or whether it is better to look for another solution. If this approach is not recommended, where can I find information on a better option?

    public class CreditCardItemReader implements ItemReader<CreditCard> {

        @Autowired
        private CreditCardRepository repository;

        private Iterator<CreditCard> usersIterator;

        // Runs the query once, before the step starts, and keeps the whole result in memory.
        @BeforeStep
        public void before(StepExecution stepExecution) {
            usersIterator = repository.someQuery().iterator();
        }

        // Returns the next item, or null to signal the end of the input.
        @Override
        public CreditCard read() {
            if (usersIterator != null && usersIterator.hasNext()) {
                return usersIterator.next();
            } else {
                return null;
            }
        }
    }

This implementation is acceptable only for small datasets, because the data is read by a single query and the whole result list is held in memory. It is also not thread-safe.
When loading large volumes:

  • in an environment with limited memory, it can lead to an out-of-memory error
  • it can lead to performance problems, because the step waits until thousands of records have been loaded from the database in a single call
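
If the dataset really is small and the iterator-based reader is kept, the thread-safety problem noted above can be addressed by serializing access to the iterator. The following is a minimal plain-Java sketch, not Spring Batch code: `SynchronizedIteratorReader` is a hypothetical class name, and in a real step it would implement `ItemReader<T>`. It still holds the whole result in memory, so it does nothing for the memory and latency concerns.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not a Spring Batch class: hands out items from an
// in-memory collection one at a time, safely from several threads.
class SynchronizedIteratorReader<T> {

    private final Iterator<T> iterator;

    SynchronizedIteratorReader(Iterable<T> items) {
        this.iterator = items.iterator();
    }

    // Same contract as ItemReader.read(): the next item, or null when exhausted.
    // synchronized makes the hasNext()/next() pair atomic across threads.
    public synchronized T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class SynchronizedIteratorReaderDemo {
    public static void main(String[] args) {
        List<String> cards = Arrays.asList("card-1", "card-2");
        SynchronizedIteratorReader<String> reader = new SynchronizedIteratorReader<>(cards);
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```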


Solution 1, org.springframework.batch.item.database.JpaCursorItemReader
Spring Batch provides a similar implementation out of the box: JpaCursorItemReader.
The main difference is that it works with a specific JPQL query instead of a repository, and uses JPA's Query.getResultStream() method to obtain the query results.
Implementation of JpaCursorItemReader:

    protected void doOpen() throws Exception {
        ...
        Query query = createQuery();
        if (this.parameterValues != null) {
            this.parameterValues.forEach(query::setParameter);
        }
        this.iterator = query.getResultStream().iterator();
    }

Hibernate, for example, introduced the Query.getResultStream() method in version 5.2. It uses Hibernate's ScrollableResults implementation to move through the result set and fetch the records in batches. That prevents all records of the result set from being loaded at once and lets you process them more efficiently.
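
The effect of iterating a stream instead of a fully materialized list can be seen in plain Java. The sketch below is only an analogy under stated assumptions: `LazyStreamDemo` and its simulated "rows" are hypothetical, no database or Hibernate is involved, and it says nothing about how many rows a JPA provider actually fetches per round trip.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Counts how many simulated rows have been produced so far.
    static final AtomicInteger produced = new AtomicInteger();

    // Simulated result stream of 1,000 rows; a row is created only when it is pulled.
    static Stream<Integer> resultStream() {
        return Stream.iterate(1, i -> i + 1)
                .limit(1_000)
                .peek(i -> produced.incrementAndGet());
    }

    public static void main(String[] args) {
        Iterator<Integer> iterator = resultStream().iterator();
        iterator.next();
        iterator.next();
        // Only the rows pulled so far have been produced, not all 1,000.
        System.out.println("rows produced so far: " + produced.get());
    }
}
```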
Example of creation:

    // The EntityManagerFactory is expected to be a fully configured bean
    // supplied by the application context.
    protected ItemReader<Foo> getItemReader(EntityManagerFactory entityManagerFactory) throws Exception {
        String jpqlQuery = "select f from Foo f";
        JpaCursorItemReader<Foo> itemReader = new JpaCursorItemReader<>();
        itemReader.setQueryString(jpqlQuery);
        itemReader.setEntityManagerFactory(entityManagerFactory);
        itemReader.setSaveState(true);
        // Validate the configuration once every property has been set.
        itemReader.afterPropertiesSet();
        return itemReader;
    }

Solution 2, org.springframework.batch.item.database.JpaPagingItemReader
It is a more flexible solution for JPQL queries than JpaCursorItemReader. The reader loads and holds data page by page, and it is thread-safe.
According to the documentation:

ItemReader for reading database records built on top of JPA.

It executes the JPQL setQueryString(String) to retrieve requested data. The query is executed using paged requests of a size specified in AbstractPagingItemReader.setPageSize(int). Additional pages are requested when needed as AbstractItemCountingItemStreamItemReader.read() method is called, returning an object corresponding to current position.

The performance of the paging depends on the JPA implementation and its use of database specific features to limit the number of returned rows.

Setting a fairly large page size and using a commit interval that matches the page size should provide better performance.

In order to reduce the memory usage for large results the persistence context is flushed and cleared after each page is read. This causes any entities read to be detached. If you make changes to the entities and want the changes persisted then you must explicitly merge the entities.

The implementation is thread-safe in between calls
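
The paging behaviour described in the documentation can be illustrated with a small plain-Java sketch. It is not the Spring Batch implementation: `PagingReaderSketch` and its `pageFetcher` function are hypothetical names, and an in-memory list stands in for the database. It only shows the idea that one page is held at a time and the next page is requested when `read()` runs past the current one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical illustration of a paging reader; not a Spring Batch class.
class PagingReaderSketch<T> {

    // (pageIndex, pageSize) -> rows of that page; stands in for a paged query.
    private final BiFunction<Integer, Integer, List<T>> pageFetcher;
    private final int pageSize;

    private List<T> currentPage = new ArrayList<>();
    private int indexInPage = 0;
    private int nextPageIndex = 0;
    private boolean exhausted = false;

    PagingReaderSketch(BiFunction<Integer, Integer, List<T>> pageFetcher, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be greater than 0");
        }
        this.pageFetcher = pageFetcher;
        this.pageSize = pageSize;
    }

    // Returns the next item, or null when there is no more data.
    public synchronized T read() {
        if (indexInPage >= currentPage.size()) {
            if (exhausted) {
                return null;
            }
            // The current page is used up: request the next one.
            currentPage = pageFetcher.apply(nextPageIndex++, pageSize);
            indexInPage = 0;
            if (currentPage.size() < pageSize) {
                exhausted = true; // a short page means it was the last one
            }
            if (currentPage.isEmpty()) {
                return null;
            }
        }
        return currentPage.get(indexInPage++);
    }
}

class PagingReaderSketchDemo {
    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(1, 2, 3, 4, 5); // simulated table
        PagingReaderSketch<Integer> reader = new PagingReaderSketch<Integer>((page, size) -> {
            int from = Math.min(page * size, table.size());
            int to = Math.min(from + size, table.size());
            return table.subList(from, to);
        }, 2);
        Integer item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

At most one page of items is referenced at any time, which is why memory use stays bounded by the page size rather than by the size of the result set.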

Solution 3, org.springframework.batch.item.data.RepositoryItemReader
It is a more efficient solution. It works with the repository, loads and holds data in chunks (pages), and is thread-safe.
According to the documentation:

A ItemReader that reads records utilizing a PagingAndSortingRepository.

Performance of the reader is dependent on the repository implementation, however setting a reasonably large page size and matching that to the commit interval should yield better performance.

The reader must be configured with a PagingAndSortingRepository, a Sort, and a pageSize greater than 0.

This implementation is thread-safe between calls to AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but remember to use saveState=false if used in a multi-threaded client (no restart available).

Example of creation:

    // fooRepository is assumed to be a Spring Data repository injected by the container.
    RepositoryItemReader<Foo> reader = new RepositoryItemReader<>();
    reader.setRepository(fooRepository);  // the PagingAndSortingRepository to read input from
    reader.setMethodName("findByName");   // the repository method to call
    reader.setArguments(arguments);       // arguments to pass to that method
    // A Sort is required; the "id" property is an assumption about Foo.
    reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC));
    reader.setPageSize(100);              // must be greater than 0

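Creation via builder:

    // fooRepository is assumed to be injected; the "id" sort property is an assumption about Foo.
    RepositoryItemReader<Foo> reader = new RepositoryItemReaderBuilder<Foo>()
            .name("fooReader")
            .repository(fooRepository)
            .methodName("findByName")
            .arguments(arguments)
            .sorts(Collections.singletonMap("id", Sort.Direction.ASC))
            .build();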

More usage examples: RepositoryItemReaderTests and RepositoryItemReaderIntegrationTests

Summary:
Your implementation is good only for simple use cases.
I recommend using the out-of-the-box solutions.
