Spring Batch - Understanding the behaviour between chunk size and ItemReadListener
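I have set up a simple read job in Spring Batch using Java config, and I am trying to write a simple listener. The listener should show how long it takes, in seconds, to read a given number of records.
The bean looks like this: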
@Bean
public SimpleItemReaderListener listener(){
    SimpleItemReaderListener listener = new SimpleItemReaderListener<>();
    listener.setLogInterval(50000);
    return listener;
}
Depending on the configured log interval, a message is displayed that looks like this:
14:42:11,445 INFO main SimpleItemReaderListener:45 - Read records [0] to [50.000] in average 1,30 seconds
14:42:14,453 INFO main SimpleItemReaderListener:45 - Read records [50.000] to [100.000] in average 2,47 seconds
14:42:15,489 INFO main SimpleItemReaderListener:45 - Read records [100.000] to [150.000] in average 1,03 seconds
14:42:16,448 INFO main SimpleItemReaderListener:45 - Read records [150.000] to [200.000] in average 0,44 seconds
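Exactly what I want, perfect. However, when I changed the chunk size in the BatchConfiguration from 100.000 to 1.000, the logging changed, and I don't know what caused the change...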
14:51:24,893 INFO main SimpleItemReaderListener:45 - Read records [0] to [50.000] in average 0,90 seconds
14:51:50,657 INFO main SimpleItemReaderListener:45 - Read records [50.000] to [100.000] in average 0,57 seconds
14:52:16,392 INFO main SimpleItemReaderListener:45 - Read records [100.000] to [150.000] in average 0,59 seconds
14:52:42,125 INFO main SimpleItemReaderListener:45 - Read records [150.000] to [200.000] in average 0,61 seconds
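Being under the impression that the beforeRead and afterRead methods of the ItemReadListener are executed for each item, I expected the time reported per 50.000 records to be more consistent with the timestamps shown in the slf4j log (e.g. roughly 26 seconds per 50.000).
Which part of the listener causes this undesired behaviour when the chunk size is changed?
My implementation of ItemReadListener is as follows: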
public class SimpleItemReaderListener<Item> implements ItemReadListener<Item>{
    private static final Logger LOG = LoggerFactory.getLogger(SimpleItemReaderListener.class);
    private static final double NANO_TO_SECOND_DIVIDER_NUMBER = 1_000_000_000.0;
    private static final String PATTERN = ",###";
    private int startCount = 0;
    private int logInterval = 50000;
    private int currentCount;
    private int totalCount;
    private long timeElapsed;
    private long startTime;
    private DecimalFormat decimalFormat = new DecimalFormat(PATTERN);

    @Override
    public void beforeRead() {
        startTime = System.nanoTime();
    }

    @Override
    public void afterRead(Item item) {
        updateTimeElapsed();
        if (currentCount == logInterval) {
            displayMessage();
            updateStartCount();
            resetCount();
        } else {
            increaseCount();
        }
    }

    private void updateTimeElapsed() {
        timeElapsed += System.nanoTime() - startTime;
    }

    private void displayMessage() {
        LOG.info(String.format("Read records [%s] to [%s] in average %.2f seconds",
                decimalFormat.format(startCount),
                decimalFormat.format(totalCount),
                timeElapsed / NANO_TO_SECOND_DIVIDER_NUMBER));
    }

    private void updateStartCount() {
        startCount += currentCount;
    }

    private void resetCount() {
        currentCount = 0;
        timeElapsed = 0;
    }

    private void increaseCount() {
        currentCount++;
        totalCount++;
    }

    @Override
    public void onReadError(Exception arg0) {
        // NO-OP
    }

    public void setLogInterval(int logInterval){
        this.logInterval = logInterval;
    }
}
The complete BatchConfiguration class:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job importUserJob() {
        return jobBuilderFactory.get("importUserJob")
                .flow(validateInput())
                .end()
                .build();
    }

    @Bean
    public Step validateInput() {
        return stepBuilderFactory.get("validateInput")
                .chunk(1000)
                .reader(reader())
                .listener(listener())
                .writer(writer())
                .build();
    }

    @Bean
    public HeaderTokenizer tokenizeHeader(){
        HeaderTokenizer tokenizer = new HeaderTokenizer();
        //optional setting, custom delimiter is set to ','
        //tokenizer.setDelimiter(",");
        return tokenizer;
    }

    @Bean
    public SimpleItemReaderListener listener(){
        SimpleItemReaderListener listener = new SimpleItemReaderListener<>();
        //optional setting, custom logging is set to 1000, increase for less verbose logging
        listener.setLogInterval(50000);
        return listener;
    }

    @Bean
    public FlatFileItemReader reader() {
        FlatFileItemReader reader = new FlatFileItemReader();
        reader.setLinesToSkip(1);
        reader.setSkippedLinesCallback(tokenizeHeader());
        reader.setResource(new ClassPathResource("majestic_million.csv"));
        reader.setLineMapper(new DefaultLineMapper() {{
            setLineTokenizer(tokenizeHeader());
            setFieldSetMapper(new PassThroughFieldSetMapper());
        }});
        return reader;
    }

    @Bean
    public DummyItemWriter writer(){
        DummyItemWriter writer = new DummyItemWriter();
        return writer;
    }
}
Alternatively, use the Spring Boot example from http://projects.spring.io/spring-batch/ and add the SimpleItemReaderListener bean.
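When the chunk size is smaller, your application spends more time outside the reader. Your timing code only measures the time spent inside the reader (between beforeRead and afterRead), whereas the logging framework prints timestamps, which reflect the total elapsed time. With a chunk size of 1.000, every 50.000 records span 50 chunks, and the per-chunk work (such as writing and committing the transaction) happens between reads, so it shows up in the timestamps (about 26 seconds apart) but not in the listener's figure (about 0.6 seconds).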
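To make the difference between reader time and total elapsed time concrete, here is a minimal sketch in plain Java. It has no Spring dependency, and `DualReadTimer`, its methods, and the `simulate` helper are hypothetical names invented for this illustration, not part of Spring Batch. The per-read and per-chunk costs are made-up numbers on a fake clock, chosen only to show the effect, not measured from the job above.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Spring Batch): tracks both the time spent
 * between beforeRead() and afterRead() and the wall-clock time of the interval.
 */
public class DualReadTimer {
    private final LongSupplier clock;  // e.g. System::nanoTime in a real listener
    private long readStart;
    private long readerNanos;          // time spent inside read calls only
    private long intervalStart = -1;   // wall-clock start of the current interval

    public DualReadTimer(LongSupplier clock) {
        this.clock = clock;
    }

    /** Intended to be called from ItemReadListener.beforeRead(). */
    public void beforeRead() {
        readStart = clock.getAsLong();
        if (intervalStart < 0) {
            intervalStart = readStart; // first read of this interval
        }
    }

    /** Intended to be called from ItemReadListener.afterRead(item). */
    public void afterRead() {
        readerNanos += clock.getAsLong() - readStart;
    }

    public long readerNanos() {
        return readerNanos;
    }

    public long wallNanos() {
        return clock.getAsLong() - intervalStart;
    }

    /**
     * Simulates itemCount reads on a fake clock. Each read advances the clock
     * by readCost; after every full chunk the clock advances by chunkCost
     * (assumed stand-in for write and commit work outside the reader).
     * Returns {readerNanos, wallNanos}.
     */
    public static long[] simulate(int itemCount, int chunkSize, long readCost, long chunkCost) {
        long[] now = {0};
        DualReadTimer timer = new DualReadTimer(() -> now[0]);
        for (int i = 1; i <= itemCount; i++) {
            timer.beforeRead();
            now[0] += readCost;        // time spent inside the reader
            timer.afterRead();
            if (i % chunkSize == 0) {
                now[0] += chunkCost;   // time spent outside the reader
            }
        }
        return new long[] {timer.readerNanos(), timer.wallNanos()};
    }

    public static void main(String[] args) {
        long[] small = simulate(50_000, 1_000, 10, 500_000);
        long[] large = simulate(50_000, 100_000, 10, 500_000);
        // prints "chunk 1000:   reader=500000 wall=25500000"
        System.out.println("chunk 1000:   reader=" + small[0] + " wall=" + small[1]);
        // prints "chunk 100000: reader=500000 wall=500000"
        System.out.println("chunk 100000: reader=" + large[0] + " wall=" + large[1]);
    }
}
```

In both runs the reader time is identical; only the wall-clock time grows when the chunk size shrinks. In a real listener, one could construct the timer with `System::nanoTime`, call `beforeRead()` and `afterRead()` from the corresponding `ItemReadListener` callbacks, and log `wallNanos()` next to `readerNanos()` to make the gap visible.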