
Spring batch does not continue processing records after xml reading error

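I have configured Spring Batch to skip bad records when there is an error reading the XML file. The skipPolicy implementation always returns true, so bad records should be skipped. The job needs to continue processing the remaining records, but in my case it stops after the bad record.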

@Configuration
@Import(DataSourceConfig.class)
@EnableWebMvc
@ComponentScan(basePackages = "org.nova.batch")
@EnableBatchProcessing
public class BatchIssueConfiguration {
    private static final Logger LOG = LoggerFactory.getLogger(BatchIssueConfiguration.class);
    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean(name = "jobRepository")
    public JobRepository jobRepository(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDatabaseType("derby");
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        return factory.getObject();
    }
    @Bean
    public Step stepSGR() throws IOException{
        return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue,SigmodRecord.Issue>chunk(1)
                //.processor(itemProcessor())
                .writer(itemWriter())
                .reader(multiReader())
                .faultTolerant()
                .skipLimit(Integer.MAX_VALUE)
                .skipPolicy(new FileVerificationSkipper())
                .skip(Throwable.class)
                .build();
    }

    @Bean
    public SkipPolicy fileVerificationSkipper() {
        return new FileVerificationSkipper();
    }


    @Bean
    @JobScope
    public MultiResourceItemReader<SigmodRecord.Issue> multiReader() throws IOException{
        MultiResourceItemReader<SigmodRecord.Issue> mrir = new MultiResourceItemReader<SigmodRecord.Issue>();
        //FileSystemResource [] files = new FileSystemResource [{}];
        ResourcePatternResolver rpr = new PathMatchingResourcePatternResolver();
        Resource[] resources = rpr.getResources("file:c:/temp/Sigm*.xml");
        mrir.setResources( resources);
        mrir.setDelegate(xmlItemReader());
        return mrir;
    }
}

public class FileVerificationSkipper implements SkipPolicy {

    private static final Logger LOG = LoggerFactory.getLogger(FileVerificationSkipper.class);

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
        LOG.error("There is an error {}",t);
        return true;
    }

}

The file contains input with an '&' that causes the read error, namely:

<title>Notes of DDTS & n Apparatus for Experimental Research</title>

This throws the following error:

org.springframework.dao.DataAccessResourceFailureException: Error reading XML stream; nested exception is javax.xml.stream.XMLStreamException: ParseError at [row,col]:[127,25]
Message: The entity name must immediately follow the '&' in the entity reference.
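The same error can be reproduced outside Spring Batch. The minimal sketch below, which assumes the JDK's built-in StAX implementation (`javax.xml.stream`), feeds a small two-record document containing the same bare `&` to an `XMLEventReader`. The class name `BareAmpersandRepro` and the sample document are illustrative, not part of the original job.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

public class BareAmpersandRepro {

    // Two records; the first contains the same unescaped '&' as the failing input file.
    static final String XML = "<issues>"
            + "<issue><title>Notes of DDTS & n Apparatus for Experimental Research</title></issue>"
            + "<issue><title>A well-formed record</title></issue>"
            + "</issues>";

    /** Reads every event; returns the parser's error message, or null if the document parsed cleanly. */
    static String firstParseError(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.nextEvent(); // throws XMLStreamException when it reaches the bare '&'
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The exact message text depends on the StAX implementation and the locale.
        System.out.println(firstParseError(XML));
    }
}
```

Under the XML 1.0 specification, an `&` that does not start an entity or character reference makes the document not well-formed, which is a fatal error, and a conforming parser must not continue normal processing after one. That likely explains why a skip policy, which works on individual items, does not get the reader past this point.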

Am I doing something wrong in my configuration that prevents the remaining records from being processed?

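To skip certain types of exceptions, we can specify a skip policy and write custom logic in it that decides which exceptions to skip, as in the code below.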

@Bean
public Step stepSGR() throws IOException {
    return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue, SigmodRecord.Issue>chunk(1)
            //.processor(itemProcessor())
            .writer(itemWriter())
            .reader(multiReader())
            .faultTolerant()
            .skipPolicy(new FileVerificationSkipper())
            .build();
}

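import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.step.skip.SkipLimitExceededException;
import org.springframework.batch.core.step.skip.SkipPolicy;
import org.springframework.dao.DataAccessResourceFailureException;

public class FileVerificationSkipper implements SkipPolicy {

    private static final Logger LOG = LoggerFactory.getLogger(FileVerificationSkipper.class);

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
        LOG.error("There is an error {}", t);
        // Skip only the wrapped XML read failure; every other exception is not skipped.
        return t instanceof DataAccessResourceFailureException;
    }
}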

Alternatively, you can simply configure it like this:

@Bean
public Step stepSGR() throws IOException {
    return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue, SigmodRecord.Issue>chunk(1)
            //.processor(itemProcessor())
            .writer(itemWriter())
            .reader(multiReader())
            .faultTolerant()
            .skipLimit(Integer.MAX_VALUE)
            .skip(DataAccessResourceFailureException.class)
            .build();
}

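This problem is caused by malformed XML, and there seems to be no way to recover other than fixing the XML itself. Spring's StaxEventItemReader uses an XMLEventReader for the low-level parsing of the XML, so I tried reading the XML file with an XMLEventReader directly to see whether the bad chunk could be skipped, but XMLEventReader.nextEvent() always throws an exception at the position of the bad chunk. I tried handling the exception in a try/catch block to move on to the next event, but the reader does not seem to advance past it. So, for now, the only way to solve this issue is to fix the XML before processing it.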
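One way to fix the XML before processing it is to escape every `&` that does not already begin an entity or character reference, and then parse as usual. This is a minimal sketch under stated assumptions: the class and method names (`BareAmpersandFixer`, `escapeBareAmpersands`, `readTitles`) are hypothetical; the regex only recognises simple entity names (letters, digits, `.`, `_`, `-`) plus decimal and hexadecimal character references; and it treats the input as a plain string, so a legitimate `&` inside a CDATA section or comment would also be rewritten.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class BareAmpersandFixer {

    // An '&' NOT followed by a simple entity name (amp, lt, ...), a decimal
    // character reference (#38) or a hexadecimal one (#x26), each ending in ';'.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    /** Replaces bare ampersands with &amp;; existing references are left unchanged. */
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    /** Collects the text of every <title> element; throws if the input is not well-formed. */
    static List<String> readTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() returns the text with references such as &amp; already replaced
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String broken = "<issues>"
                + "<issue><title>Notes of DDTS & n Apparatus for Experimental Research</title></issue>"
                + "<issue><title>R&amp;D &#38; more</title></issue>"
                + "</issues>";
        System.out.println(readTitles(escapeBareAmpersands(broken)));
    }
}
```

In the job above, a pass like this could be run over each file matched by `file:c:/temp/Sigm*.xml` before `MultiResourceItemReader` reads them (for example, in an earlier step); that wiring is an assumption and is not shown here.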


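Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.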

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM