I'm using Partitioner to parallelize the import of *.csv files. There are about 30k files in the folder.
Problem: the job initialization takes about 1-2 hours until all files are set up. The bottleneck is in SimpleStepExecutionSplitter.split().
Question: is it normal for step initialization to take that long? Or could I improve it somehow?
@Bean
public Step partitionStep(Partitioner partitioner) {
    return stepBuilderFactory.get("partitionStep")
            .partitioner(step())
            .partitioner("partitioner", partitioner)
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); // always run the import with 4 files in parallel
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean
public Partitioner partitioner() throws IOException {
    MultiResourcePartitioner p = new MultiResourcePartitioner();
    p.setResources(new PathMatchingResourcePatternResolver().getResources("mypath/*.csv"));
    return p;
}
MultiResourcePartitioner creates a partition for each resource. Partition creation itself is very fast (i.e. the partitioner returns the ExecutionContext map quickly), but Spring Batch takes a very long time to populate the corresponding metadata DB tables, and it becomes terribly slow once the number of partitions goes beyond 100 (this is all from my personal experience).
As per the only answer here, they made some improvements, but I am using the latest version and it is still very slow for more than 100 partitions.
See this too.
I think you don't have much choice other than reducing the number of partitions, unless you are ready to rewrite a chunk of the API code yourself.
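One way to reduce the number of partitions is to group the files into a fixed number of buckets, so that each partition handles many files instead of one. The following is only a minimal plain-Java sketch of that grouping step. FileBuckets, group and the "partition" + i key naming are hypothetical names made up for this illustration, not part of Spring Batch, and wiring the result into a custom Partitioner (plus a reader that can handle several files per step, for example MultiResourceItemReader) is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Spring Batch class.
final class FileBuckets {

    private FileBuckets() {
    }

    /**
     * Distributes the given file names round-robin over at most
     * bucketCount buckets. The keys ("partition0", "partition1", ...)
     * are an assumed naming scheme for the resulting partitions.
     */
    static Map<String, List<String>> group(List<String> files, int bucketCount) {
        if (bucketCount < 1) {
            throw new IllegalArgumentException("bucketCount must be >= 1");
        }
        // Never create more buckets than there are files.
        int buckets = Math.min(bucketCount, files.size());

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < buckets; i++) {
            result.put("partition" + i, new ArrayList<>());
        }
        // File i goes into bucket i % buckets.
        for (int i = 0; i < files.size(); i++) {
            result.get("partition" + (i % buckets)).add(files.get(i));
        }
        return result;
    }
}
```

With 30k files and, say, 100 buckets, Spring Batch would only have to create 100 step executions instead of 30,000. The bucket count is a tuning choice, and 100 is just the threshold mentioned above.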
I use a custom splitter because the default splitter ( https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/partition/support/SimpleStepExecutionSplitter.java ) calls jobRepository.getLastStepExecution for each StepExecution. I don't use restartability with Spring Batch, so I can write my own splitter. Now step initialization takes a few seconds for thousands of files (before, it took a few minutes).