
Executing chunks in parallel for a partitioned slave step

This question is an extension of another SO question of mine. Since that approach doesn't look possible, I am trying to execute chunks in parallel within the parallel / partitioned slave steps.

An article says that just specifying SimpleAsyncTaskExecutor as the task executor for a step would start executing its chunks in parallel.

@Bean
public Step masterLuceneIndexerStep() throws Exception {
    return stepBuilderFactory.get("masterLuceneIndexerStep")
            .partitioner("slaveLuceneIndexerStep", partitioner())
            .step(slaveLuceneIndexerStep())
            .gridSize(Constants.PARTITIONER_GRID_SIZE)
            .taskExecutor(simpleAsyntaskExecutor)
            .build();
}

@Bean
public Step slaveLuceneIndexerStep() throws Exception {
    return stepBuilderFactory.get("slaveLuceneIndexerStep")
            .<IndexerInputVO, IndexerOutputVO>chunk(Constants.INDEXER_STEP_CHUNK_SIZE)
            .reader(luceneIndexReader(null))
            .processor(luceneIndexProcessor())
            .writer(luceneIndexWriter(null))
            .listener(luceneIndexerStepListener)
            .listener(lichunkListener)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}

If I specify .taskExecutor(simpleAsyntaskExecutor) on the slave step, the job fails. With .taskExecutor(simpleAsyntaskExecutor) on the master step the job runs fine, but the chunks execute serially while the partitioned steps run in parallel.

Is it possible to parallelize the chunks of slaveLuceneIndexerStep()?

Basically, each chunk writes Lucene indices to a single directory sequentially, and I want to further parallelize the index-writing process within each directory, since Lucene's IndexWriter is thread-safe.
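As a point of reference, a chunk writer that is safe to use from parallel chunks can share a single IndexWriter per directory, because Lucene documents IndexWriter.addDocument() as safe for concurrent callers. The sketch below is an assumption about what such a writer could look like; IndexerOutputVO, its getId() accessor, and the field mapping are hypothetical names, not from the original question.

```java
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.springframework.batch.item.ItemWriter;

// Sketch only: one LuceneIndexItemWriter instance (and one IndexWriter)
// is shared by all threads writing into the same index directory.
public class LuceneIndexItemWriter implements ItemWriter<IndexerOutputVO> {

    private final IndexWriter indexWriter; // thread-safe per Lucene docs

    public LuceneIndexItemWriter(IndexWriter indexWriter) {
        this.indexWriter = indexWriter;
    }

    @Override
    public void write(List<? extends IndexerOutputVO> items) throws Exception {
        for (IndexerOutputVO item : items) {
            Document doc = new Document();
            // Hypothetical field mapping; adapt to the real output VO.
            doc.add(new StringField("id", item.getId(), Field.Store.YES));
            indexWriter.addDocument(doc); // safe to call concurrently
        }
    }
}
```

The key design point is that no external synchronization is added around addDocument(); serializing those calls would defeat the purpose of running chunks in parallel.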

I was able to launch parallel chunks from within a partitioned slave step as follows:

1. I first made sure my reader, processor, and writer are thread-safe, so that those components can participate in parallel chunks without concurrency issues.

2. I kept SimpleAsyncTaskExecutor as the task executor for the master step, since slave steps are long-running and I want exactly N threads active at any point in time. I control N by setting the concurrencyLimit of the task executor.

3. Then I set a ThreadPoolTaskExecutor as the task executor for the slave step. This pool is shared by all slave steps, so I set its core pool size to at least N, so that each slave step gets at least one thread and starvation doesn't happen. You can increase the pool size as per system capacity; I used a thread pool since chunks are short-running tasks.
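The two-executor setup in steps 2 and 3 could be wired up roughly as below. This is a minimal sketch under assumptions: the bean names, the thread-name prefixes, and the use of Constants.PARTITIONER_GRID_SIZE as N are illustrative, not taken from the original configuration.

```java
// Sketch: one executor per role, following the steps above.

@Bean
public TaskExecutor masterTaskExecutor() {
    // SimpleAsyncTaskExecutor starts a new thread per slave step;
    // concurrencyLimit caps how many partitions run at once (the "N" above).
    SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("master-");
    executor.setConcurrencyLimit(Constants.PARTITIONER_GRID_SIZE);
    return executor;
}

@Bean
public TaskExecutor slaveChunkExecutor() {
    // Shared pool for all slave-step chunks; core size >= N so every
    // partition gets at least one thread and none starves.
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Constants.PARTITIONER_GRID_SIZE);
    executor.setMaxPoolSize(2 * Constants.PARTITIONER_GRID_SIZE);
    executor.setThreadNamePrefix("slave-chunk-");
    executor.initialize();
    return executor;
}
```

With this split, the master step would reference masterTaskExecutor and the slave step slaveChunkExecutor; because the pool is shared, threads freed by finished partitions are reused by the ones still running.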

Using a thread pool also handles a specific case in my application: partitioning is by client_id, so when smaller clients finish, the same threads are automatically reused by bigger clients. This smooths out the asymmetry created by client_id partitioning, since the amount of data to process varies a lot from client to client.

The master step's task executor simply starts all slave-step threads and goes into WAITING state, while the slave-step chunks are processed by the thread pool specified on the slave step.
