Implementation of TaskExecutor in Spring Batch for parallel processing

Consider a Step bean:

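  @Bean
  public Step stepForChunkProcessing() {
    return stepBuilderFactory
        .get("stepForChunkProcessing")
        .<Entity1, Entity2>chunk(1000)   // commit interval: 1000 items per chunk
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .taskExecutor(taskExecutor())    // execute chunks on separate threads
        .throttleLimit(10)               // at most 10 concurrent chunk executions
        .build();
  }

  @Bean
  public TaskExecutor taskExecutor() {
    // SimpleAsyncTaskExecutor starts a new thread per task;
    // the constructor argument is the thread name prefix.
    return new SimpleAsyncTaskExecutor("MyApplication");
  }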

Requirement: the reader reads records (of Entity1) from a file, the processor processes them, and the writer writes them to the database.

Before adding the TaskExecutor, only one thread was created. It would loop through the reader and processor 1000 times, as defined by the chunk setting above, then move to the writer and write all 1000 records. It would then start again from record number 1001 and process another 1000 records in the reader and processor. This is synchronous execution.
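The single-threaded flow described above can be sketched in plain Java. This is a simplified illustration and not Spring Batch's actual implementation: the class ChunkLoopSketch, the method run, and the use of an Iterator and toUpperCase as stand-ins for the reader and processor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {

    /** Runs the chunk loop on the calling thread and returns the size of each written chunk. */
    static List<Integer> run(List<String> records, int chunkSize) {
        Iterator<String> reader = records.iterator(); // hypothetical stand-in for the ItemReader
        List<Integer> writtenChunkSizes = new ArrayList<>();

        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<String> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }

            // 2. Process each item that was read (stand-in for the ItemProcessor).
            List<String> processed = new ArrayList<>();
            for (String item : items) {
                processed.add(item.toUpperCase());
            }

            // 3. Write the whole chunk in one call (stand-in for the ItemWriter).
            writtenChunkSizes.add(processed.size());
        }
        return writtenChunkSizes;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            records.add("record-" + i);
        }
        System.out.println(run(records, 1000)); // prints [1000, 1000, 500]
    }
}
```

Because a single thread owns the only cursor into the input, the position after each chunk is unambiguous: the next chunk always starts where the previous one ended.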

After adding the TaskExecutor with a throttle limit of 10, 10 threads were created, independent of each other. How do they keep track of which records from the file have already been processed by other threads? Also, even if I add the synchronized keyword to the read method of the reader, how will the different threads keep track of the records that have already been processed?

That's impossible in a multi-threaded environment, as mentioned in the Multi-threaded Step section of the reference documentation:

 Many participants in a Step (such as readers and writers) are stateful.
 If the state is not segregated by thread, then those components are not
 usable in a multi-threaded Step

That's why the Javadoc of AbstractItemCountingItemStreamItemReader#setSaveState says to turn off state management. Here is an excerpt:

Always set it to false if the reader is being used in a concurrent environment.
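To illustrate why shared state is the issue, here is a minimal plain-Java sketch with no Spring Batch classes involved. SharedReader and readAll are hypothetical names, and the sketch assumes that all worker threads call read() on one shared reader instance that owns a single cursor into the input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedReaderSketch {

    /** Hypothetical stand-in for a file reader: one cursor shared by every thread. */
    static class SharedReader {
        private final Iterator<Integer> cursor;

        SharedReader(List<Integer> records) {
            this.cursor = records.iterator();
        }

        // synchronized: only one thread advances the cursor at a time,
        // so no record is handed out twice. Returns null at end of input.
        synchronized Integer read() {
            return cursor.hasNext() ? cursor.next() : null;
        }
    }

    /** Reads records 1..recordCount with the given number of threads; returns them in arrival order. */
    static List<Integer> readAll(int recordCount, int threads) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= recordCount; i++) {
            records.add(i);
        }
        SharedReader reader = new SharedReader(records);
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    seen.add(item); // stand-in for process + write
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> seen = readAll(10_000, 10);
        System.out.println("records handed out: " + seen.size());
        System.out.println("distinct records:   " + new HashSet<>(seen).size());
    }
}
```

With synchronized on read(), each record is handed out exactly once, so the threads do not need to track each other's progress. The order in which threads finish their work varies from run to run, though, so a single saved position would not reliably say which records have been fully processed. That is consistent with the advice above to set saveState to false. Spring Batch also ships a SynchronizedItemStreamReader wrapper that can play the role of the synchronized keyword in this sketch.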
