
Executing chunks in parallel for a partitioned slave step

This question is an extension of another SO question of mine. Since that does not look possible, I am trying to execute chunks in parallel for the parallel / partitioned slave steps.

The article says that simply specifying SimpleAsyncTaskExecutor as the task executor for a step should make that step execute its chunks in parallel.

@Bean
public Step masterLuceneIndexerStep() throws Exception {
    return stepBuilderFactory.get("masterLuceneIndexerStep")
            .partitioner(slaveLuceneIndexerStep())
            .partitioner("slaveLuceneIndexerStep", partitioner())
            .gridSize(Constants.PARTITIONER_GRID_SIZE)
            .taskExecutor(simpleAsyntaskExecutor)
            .build();
}

@Bean
public Step slaveLuceneIndexerStep() throws Exception {
    return stepBuilderFactory.get("slaveLuceneIndexerStep")
            .<IndexerInputVO, IndexerOutputVO>chunk(Constants.INDEXER_STEP_CHUNK_SIZE)
            .reader(luceneIndexReader(null))
            .processor(luceneIndexProcessor())
            .writer(luceneIndexWriter(null))
            .listener(luceneIndexerStepListener)
            .listener(lichunkListener)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}

If I specify .taskExecutor(simpleAsyntaskExecutor) on the slave step, the job fails. Specifying .taskExecutor(simpleAsyntaskExecutor) on the master step works, but then the chunks run serially while the partitioned steps run in parallel.

Is it possible to parallelize the chunks of slaveLuceneIndexerStep()?

Basically, each chunk writes Lucene indices to a single directory in sequential fashion, and I want to further parallelize the index-writing process within each directory, since Lucene's IndexWriter is thread-safe.
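To make the goal concrete, here is a minimal sketch of the kind of writer I have in mind, assuming an IndexerOutputVO that can be converted to a Lucene Document (the toDocument() helper is hypothetical, not part of my actual code); since IndexWriter is thread-safe, several chunk threads could share one instance:

import java.util.List;

import org.apache.lucene.index.IndexWriter;
import org.springframework.batch.item.ItemWriter;

// Sketch only: all chunk threads of one partition delegate to the same IndexWriter,
// which writes to that partition's directory.
public class LuceneIndexItemWriter implements ItemWriter<IndexerOutputVO> {

    private final IndexWriter indexWriter; // thread-safe, one per index directory

    public LuceneIndexItemWriter(IndexWriter indexWriter) {
        this.indexWriter = indexWriter;
    }

    @Override
    public void write(List<? extends IndexerOutputVO> items) throws Exception {
        for (IndexerOutputVO item : items) {
            // addDocument is safe to call concurrently from multiple chunk threads
            indexWriter.addDocument(item.toDocument());
        }
    }
}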

I am able to launch parallel chunks from within a partitioned slave step by doing the following:

1. I first made sure my reader, processor, and writer are thread-safe, so that those components can participate in parallel chunks without concurrency issues.

2. I kept the master step's task executor as SimpleAsyncTaskExecutor, since the slave steps are long-running and I want exactly N threads running at any point in time. I control N by setting the concurrencyLimit of that task executor.

3. I then set a ThreadPoolTaskExecutor as the task executor for the slave step (see the sketch after this list). This pool is shared by all slave steps as a common pool, so I set its core pool size to at least N so that each slave step gets at least one thread and starvation doesn't occur. You can increase the pool size according to system capacity; I used a thread pool because chunks are short-running tasks.
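A rough sketch of how the two executors could be declared (bean names and pool sizes are illustrative, not my exact values; here N is assumed to equal the grid size):

import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Master step executor: starts exactly N slave-step threads.
@Bean
public SimpleAsyncTaskExecutor simpleAsyntaskExecutor() {
    SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("master-");
    executor.setConcurrencyLimit(Constants.PARTITIONER_GRID_SIZE); // N, assumed equal to grid size
    return executor;
}

// Slave step executor: a common pool shared by the chunks of all slave steps.
@Bean
public ThreadPoolTaskExecutor slaveChunkTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Constants.PARTITIONER_GRID_SIZE);    // at least N: one thread per slave step
    executor.setMaxPoolSize(Constants.PARTITIONER_GRID_SIZE * 2); // tune to system capacity
    executor.setThreadNamePrefix("slave-chunk-");
    return executor;
}

The slave step definition then uses .taskExecutor(slaveChunkTaskExecutor()) together with its throttleLimit, while the master step keeps .taskExecutor(simpleAsyntaskExecutor()). For item 1, Spring Batch's SynchronizedItemStreamReader is one ready-made way to make a stateful reader safe for concurrent chunks.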

Using a thread pool also handles a specific case in my application: my partitioning is by client_id, so when the smaller clients are done, the same threads get automatically reused by the bigger clients. This smooths out the asymmetry created by client_id partitioning, since the amount of data to be processed varies a lot from client to client.

The master step's task executor simply starts all the slave-step threads and goes into WAITING state, while the slave-step chunks get processed by the thread pool specified on the slave step.
