
Step initialization time too long using Partitioner in Spring-Batch?

I'm using a Partitioner to parallelize the import of *.csv files. There are about 30k files in the folder.

Problem: job initialization takes about 1-2 hours before all files are set up. The bottleneck is in SimpleStepExecutionSplitter.split().

Question: is it normal for step initialization to take that long, or can I improve it somehow?

@Bean
public Step partitionStep(Partitioner partitioner) {
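    return stepBuilderFactory.get("partitionStep")
            .partitioner(step())                      // worker step, executed once per partition
            .partitioner("partitioner", partitioner)  // creates one ExecutionContext per input file
            .taskExecutor(taskExecutor())             // partitions run on this pool (4 at a time)
            .build();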
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); //run import always with 4 parallel files
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}


@Bean
public Partitioner partitioner() throws IOException {
    MultiResourcePartitioner p = new MultiResourcePartitioner();
    p.setResources(new PathMatchingResourcePatternResolver().getResources("mypath/*.csv"));
    return p;
}

MultiResourcePartitioner creates one partition per resource. Creating the partitions is itself very fast (i.e., the partitioner returns the ExecutionContext map almost immediately), but Spring Batch then takes a very long time to populate the corresponding metadata tables, and it becomes extremely slow once the number of partitions goes beyond about 100 (this is from my own experience).
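To make the scaling concrete, here is a plain-Java sketch of the shape of what MultiResourcePartitioner returns: one entry per file, keyed partition0, partition1 and so on, each holding the file URL under the default key fileName. A plain Map<String, String> stands in for Spring Batch's ExecutionContext so the sketch runs without Spring, and the class name OnePartitionPerFile is hypothetical. The gridSize argument does not bound the result, so 30k files mean 30k worker step executions to create and persist.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OnePartitionPerFile {

    // Hypothetical stand-in for MultiResourcePartitioner.partition(gridSize);
    // a Map<String, String> plays the role of ExecutionContext.
    static Map<String, Map<String, String>> partition(List<String> fileUrls, int gridSize) {
        Map<String, Map<String, String>> partitions = new HashMap<>();
        int i = 0;
        for (String url : fileUrls) {
            Map<String, String> context = new HashMap<>();
            context.put("fileName", url);             // default key used by MultiResourcePartitioner
            partitions.put("partition" + i, context);
            i++;
        }
        // gridSize limits nothing here: the result has one partition per file
        return partitions;
    }
}
```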

According to the only answer here, they made some improvements, but I am using the latest version and it is still very slow with more than 100 partitions.

See this too.

I don't think you have much choice other than reducing the number of partitions, unless you are prepared to rewrite a fair amount of the API code yourself.
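One way to reduce the number of partitions, sketched in plain Java without Spring types (the class and method names are hypothetical): distribute the files round-robin into at most gridSize buckets, so the number of partitions, and therefore the number of StepExecution rows written up front, is bounded by the grid size rather than by the file count. In an actual Partitioner, each bucket's file list would have to be stored in its ExecutionContext and the worker step's reader would need to process several files, for example with MultiResourceItemReader; that wiring is an assumption, not something the answer spells out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FileBucketing {

    // Hypothetical helper: groups files into at most gridSize buckets
    // ("partition0", "partition1", ...) instead of one partition per file.
    static Map<String, List<String>> bucket(List<String> files, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        int buckets = Math.min(gridSize, files.size());
        Map<String, List<String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < buckets; i++) {
            partitions.put("partition" + i, new ArrayList<>());
        }
        // Round-robin assignment keeps bucket sizes within one file of each other
        for (int i = 0; i < files.size(); i++) {
            partitions.get("partition" + (i % buckets)).add(files.get(i));
        }
        return partitions;
    }
}
```

The trade-off is coarser granularity: a failure or a restart then concerns a whole bucket of files rather than a single file.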

I use a custom splitter because the default splitter ( https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/partition/support/SimpleStepExecutionSplitter.java ) calls jobRepository.getLastStepExecution for each StepExecution. I don't use restartability with Spring Batch, so I can write my own splitter. Step initialization now takes a few seconds for thousands of files (before, it took a few minutes).
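A minimal sketch of such a splitter, assuming the Spring Batch 4.x/5.x interfaces StepExecutionSplitter, Partitioner and JobRepository; the class name NonRestartableStepExecutionSplitter is hypothetical and this is not the answerer's actual code. It creates one StepExecution per partition, named stepName:partitionKey, and persists them with JobRepository.addAll, but skips the per-partition getLastStepExecution lookup. Because it never consults previous runs, a restarted job would re-run every partition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.partition.StepExecutionSplitter;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical splitter for jobs that do not need restartability:
// no lookup of previous step executions per partition.
class NonRestartableStepExecutionSplitter implements StepExecutionSplitter {

    private final JobRepository jobRepository;
    private final String stepName;
    private final Partitioner partitioner;

    NonRestartableStepExecutionSplitter(JobRepository jobRepository, String stepName, Partitioner partitioner) {
        this.jobRepository = jobRepository;
        this.stepName = stepName;
        this.partitioner = partitioner;
    }

    @Override
    public String getStepName() {
        return stepName;
    }

    @Override
    public Set<StepExecution> split(StepExecution stepExecution, int gridSize) {
        JobExecution jobExecution = stepExecution.getJobExecution();
        Map<String, ExecutionContext> contexts = partitioner.partition(gridSize);
        List<StepExecution> workers = new ArrayList<>(contexts.size());
        for (Map.Entry<String, ExecutionContext> entry : contexts.entrySet()) {
            // "stepName:partitionKey" keeps worker execution names unique
            StepExecution worker = jobExecution.createStepExecution(stepName + ":" + entry.getKey());
            worker.setExecutionContext(entry.getValue());
            workers.add(worker);
        }
        // Persist all worker executions in one call, without restart checks
        jobRepository.addAll(workers);
        // Build the set only after addAll: StepExecution's hash code depends on its id,
        // which the repository assigns on insert
        return new HashSet<>(workers);
    }
}
```

It could then be registered with .splitter(...) on the partition step builder, together with .step(step()) and .taskExecutor(taskExecutor()), in place of the default splitter.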

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM