Chunk processing in Spring Batch when the total size is unknown

Original post: 2015-06-30 23:08:53. Tags: java, spring, parallel-processing, spring-batch

I am reading data from an input source that contains timestamps. My query to the source is based on a time range over those timestamps.

Let's say the time range is 1 minute. My reader reads the records in a 1-minute range and passes them to the processor. The processor can process only 100 records at a time. The reader should keep calling the processor with chunks of 100 records until all the records for that minute are exhausted. After that, the writer should be triggered.

How should I configure Spring Batch to achieve this?

1 answer

Your terminology is not entirely correct with respect to the inner workings of Spring Batch. To process items from a reader through an optional processor to a writer, Spring Batch uses chunks. Spring Batch reads items from your ItemReader until the reader returns null (which indicates that the input stream is exhausted). It then optionally processes these items by calling an ItemProcessor and finally writes the items using your ItemWriter.

Chunk handling is configured using a chunk size, which means the read and processed items are written in chunks of the given chunk size.
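The read, process, write cycle described above can be imitated in plain Java. This is only a sketch of the idea, not Spring Batch's actual implementation: the class `ChunkLoopSketch` and its `run` method are hypothetical names, and the reader is modelled as a `Supplier` that returns null when the input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Plain-Java imitation of a chunk-oriented step (hypothetical, not Spring Batch code). */
class ChunkLoopSketch {

    /**
     * Reads until the reader returns null, processes every item that was read,
     * and hands the processed items to the writer in lists of at most chunkSize.
     * Returns the number of times the writer was called.
     */
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int writes = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // Read phase: collect up to chunkSize items, stopping early on null.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            // Process phase: transform each item of the current chunk.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Write phase: one writer call per non-empty chunk.
            if (!processed.isEmpty()) {
                writer.accept(processed);
                writes++;
            }
        }
        return writes;
    }
}
```

With 250 input items and a chunk size of 100, this loop calls the writer three times, with 100, 100 and 50 items. The total number of items does not need to be known in advance; the null return is what ends the loop.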

With this in mind, regarding your question:

You should configure your reader to read items until your configured time slot is up. You then return null to indicate that everything has been read.
You should set the chunk size to 100 to indicate that the writer should be called every 100 items.
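A reader along these lines might look like the sketch below. It is plain Java so that it stands alone; in a real job the class would implement `org.springframework.batch.item.ItemReader`, whose `read()` method has the same "null means exhausted" contract. `TimedRecord` and `TimeWindowReader` are hypothetical names, and the sketch assumes the source delivers records in ascending timestamp order.

```java
import java.time.Instant;
import java.util.Iterator;

/** Hypothetical record type: a payload together with its source timestamp. */
class TimedRecord {
    final Instant timestamp;
    final String payload;

    TimedRecord(Instant timestamp, String payload) {
        this.timestamp = timestamp;
        this.payload = payload;
    }
}

/** Hypothetical reader that stops at the end of a time window. */
class TimeWindowReader {
    private final Iterator<TimedRecord> source; // assumed sorted by timestamp
    private final Instant windowEnd;            // exclusive upper bound
    private boolean done = false;

    TimeWindowReader(Iterator<TimedRecord> source, Instant windowEnd) {
        this.source = source;
        this.windowEnd = windowEnd;
    }

    /** Returns the next record inside the window, or null once the window is exhausted. */
    TimedRecord read() {
        if (done || !source.hasNext()) {
            done = true;
            return null;
        }
        TimedRecord next = source.next();
        if (!next.timestamp.isBefore(windowEnd)) {
            // First record at or past the window end: signal end of input from now on.
            done = true;
            return null;
        }
        return next;
    }
}
```

The chunk size of 100 is not the reader's concern; it belongs to the step configuration, so the reader only has to decide when to return null.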

Keep in mind that ending the input stream by returning null from the reader terminates the job with exit status COMPLETED. Spring Batch will NOT allow you to start this specific Job instance again unless you take specific measures to allow this. So, if there are in fact more items to process, you should configure your Job using some kind of JobParametersIncrementer or a dummy timestamp parameter.

Another possibility (and my personal preference) is not to return null from the reader but to throw an exception, e.g. MoreItemsInInputStreamException or TimeRangeExceededException. This way the Job instance fails, so the Job operator is aware that more is to be done, which they can do by simply restarting the Job with the same parameters.

Steef