
Spring Batch step does not read full file

Hi, I have a problem with Spring Batch. I created a job with two steps: the first step reads a CSV file in chunks, filters out bad values, and saves the rest to the database; the second calls a stored procedure.

My problem is that, for some reason, the first step reads only part of the data file, a 2.5 GB CSV.

The file has about 13M records, but only about 400K are saved.

Does anybody know why this happens and how to solve it?

Java version: 8

Spring Boot version: 2.7.1

This is my step:

    @Autowired
    @Bean(name = "load_data_in_db_step")
    public Step importData(
            MyProcessor processor,
            MyReader reader,
            TaskExecutor executor,
            @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
    ) {
        return stepFactory.get("experian_portals_imports")
                .<ExperianPortal, ExperianPortal>chunk(chunkSize)
                .reader(reader)
                .processor(processor)
                .writer(new JpaItemWriterBuilder<ExperianPortal>()
                        .entityManagerFactory(factory)
                        .usePersist(true)
                        .build()
                )
                .transactionManager(transactionManager)
                .allowStartIfComplete(true)
                .taskExecutor(executor)
                .build();
    }

This is the definition of MyReader:

@Slf4j
@Component
public class MyReader extends FlatFileItemReader<ExperianPortal>{
    private final MyLineMapper mapper;
    private final Resource fileToRead;

    @Autowired
    public MyReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        this.mapper = mapper;
        val formatter = DateTimeFormatter.ofPattern("yyyyMM");
        fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setLineMapper(mapper);
        setEncoding(StandardCharsets.ISO_8859_1.name());
        setLinesToSkip(1);
        setResource(fileToRead);
        super.afterPropertiesSet();
    }


}

Edit: I already tried a single-threaded strategy. I think the problem may be with the RepeatTemplate, but I don't know how to use it correctly.

Edit 2: I gave up on the custom solution and ended up using the default components. They work fine, and the problem was solved.

Remember to use only Spring Batch components.

This is because you are using a non-thread-safe item reader in a multi-threaded step. Your item reader extends FlatFileItemReader, and FlatFileItemReader is not thread-safe; see "Using FlatFileItemReader with a TaskExecutor (Thread Safety)". You can try a single-threaded step (remove .taskExecutor(executor)) and you will see that the entire file is read.

What happens is that threads read records concurrently and the read count is not honored (threads increment the read count and the step "thinks" that the file has been read entirely). You have a few options here:

  • synchronize the call to read in your item reader
  • wrap your reader in a SynchronizedItemStreamReader (the result would be the same as with the previous point)
  • make your item reader bean step-scoped
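
The first two options come down to the same idea: only one thread at a time may call read on the underlying reader. Below is a minimal plain-Java sketch of that pattern, not Spring Batch code. LineReader, PlainReader, SynchronizedReader, readAll and sampleLines are hypothetical names invented for this illustration; PlainReader stands in for a non-thread-safe reader such as FlatFileItemReader, and SynchronizedReader plays the role of the synchronized wrapper.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedReaderSketch {

    /** Hypothetical stand-in for an item reader: returns null once the input is exhausted. */
    interface LineReader {
        String read();
    }

    /** Not thread-safe: all callers share one unsynchronized iterator. */
    static final class PlainReader implements LineReader {
        private final Iterator<String> lines;

        PlainReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        @Override
        public String read() {
            return lines.hasNext() ? lines.next() : null;
        }
    }

    /** Thread-safe wrapper: the monitor lets only one thread at a time reach the delegate. */
    static final class SynchronizedReader implements LineReader {
        private final LineReader delegate;

        SynchronizedReader(LineReader delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized String read() {
            return delegate.read();
        }
    }

    /** Drains the reader from several worker threads and returns the number of distinct lines seen. */
    static int readAll(LineReader reader, int threads) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                String line;
                while ((line = reader.read()) != null) {
                    seen.add(line);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen.size();
    }

    /** Builds unique sample records so that lost or duplicated reads show up in the count. */
    static List<String> sampleLines(int count) {
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            lines.add("record-" + i);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(100_000);
        int read = readAll(new SynchronizedReader(new PlainReader(lines)), 8);
        System.out.println(read + " of " + lines.size() + " lines read");
    }
}
```

Because every call goes through the synchronized wrapper, each record is handed to exactly one worker and none is skipped. In the Spring Batch step from the question, the corresponding change would be to pass the step a SynchronizedItemStreamReader whose delegate is the FlatFileItemReader, instead of the raw reader.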
