Spring Batch dynamic chunk size based on the number of rows from a CSV without counting the header row
My application is a scheduled job runner with batch configurations. I can have CSV files with a different number of rows, but I know that the first row will always be the header:
id,firstName,lastName
1,Viktor,Someone
2,Joe,Smith
3,Rebecca,Harper
How should I set up the chunk size to be dynamic? The file can contain 5, 10, or even 100,000 rows. Instead of hard-coding a large number for the chunk size, I am looking for a dynamic solution based on the number of rows, without counting the header row.
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1").<Employee, Employee>chunk(100000)
            .reader(reader())
            .writer(writer())
            .build();
}
The reader is a FlatFileItemReader.
What about the following:
@Bean
public Step step1() throws IOException {
    long lineCountWithoutHeader;
    // Files.lines returns a Stream backed by an open file handle,
    // so close it with try-with-resources once the count is done.
    try (Stream<String> lines = Files.lines(Paths.get("path to your file"))) {
        lineCountWithoutHeader = lines.count() - 1;
    }
    int chunkSize = .. // calculate chunk size based on lineCountWithoutHeader
    return stepBuilderFactory.get("step1").<Employee, Employee>chunk(chunkSize)
            .reader(reader())
            .writer(writer())
            .build();
}
You can refactor the code as needed (inject the file resource or late-bind it from job parameters, extract the calculation logic into a separate method, etc.), but you get the idea.
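As a concrete illustration of extracting the calculation logic into a separate method, here is a self-contained sketch; the class and method names (`ChunkSizeCalculator`, `countDataLines`, `chunkSize`) and the "aim for roughly N chunks" policy are made up for the example, not part of the original answer:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ChunkSizeCalculator {

    // Count data rows: total lines minus the single header line.
    public static long countDataLines(Path csv) {
        try (Stream<String> lines = Files.lines(csv)) {
            return Math.max(0, lines.count() - 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Derive a chunk size so the file is processed in roughly
    // `targetChunks` chunks (rounding up), with a floor of 1.
    public static int chunkSize(long dataLines, int targetChunks) {
        return (int) Math.max(1, (dataLines + targetChunks - 1) / targetChunks);
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("employees", ".csv");
        Files.write(csv, java.util.List.of(
                "id,firstName,lastName",
                "1,Viktor,Someone",
                "2,Joe,Smith",
                "3,Rebecca,Harper"));
        System.out.println(countDataLines(csv));          // 3 data rows
        System.out.println(chunkSize(100000, 10));        // 10000
        Files.delete(csv);
    }
}
```

This keeps the I/O (counting lines) separate from the pure calculation (choosing a chunk size), so the latter is trivial to unit-test.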
Another option is to use a separate step that does the calculation and puts the result in the job execution context, then configure your chunk-oriented step with the value from the execution context.
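In Spring Batch this is usually done with a Tasklet step that writes the count into the step execution context, promoted to the job scope with an ExecutionContextPromotionListener. The snippet below is a framework-free sketch of that data flow only, with a plain Map standing in for the ExecutionContext; all names (`countingStep`, `chunkSizeFromContext`, the `"dataLineCount"` key, the "roughly 10 chunks" policy) are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class ExecutionContextFlow {

    // Step 1 analogue: a "tasklet" computes the row count and publishes it
    // under a well-known key in the shared (job-scoped) context.
    public static void countingStep(Map<String, Object> jobExecutionContext,
                                    long dataLineCount) {
        jobExecutionContext.put("dataLineCount", dataLineCount);
    }

    // Step 2 analogue: the chunk-oriented step reads the published value
    // and derives its chunk size from it (here: aim for roughly 10 chunks).
    public static int chunkSizeFromContext(Map<String, Object> jobExecutionContext) {
        long dataLineCount = (long) jobExecutionContext.get("dataLineCount");
        return (int) Math.max(1, dataLineCount / 10);
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        countingStep(ctx, 100_000L);
        System.out.println(chunkSizeFromContext(ctx)); // 10000
    }
}
```

The advantage over counting lines at bean-definition time is that the count happens when the job actually runs, so it always reflects the file handed to that execution.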