
Spring Batch FlatFileItemReader to read malformed lines

In my project, I am using Spring Batch and reading a file with FlatFileItemReader and a FieldSetMapper. There is a problem with some input files: the lines of a few records are cut or malformed.
Assume the input file has 4 columns. In a few records the columns are not formed properly, because a quoted value is split across two lines. Can anyone please help me fix this issue? (I can explain more if needed.)
File.csv

"id","name","age","salary"
"1","user1","28","1000"
"2","user2","27","2000"
"3","user3","26
    ","3000"
"4","user4","25","
    4000"
"5","
        user5","24","5000"
"6","user6","23","6000"
"7","user7","22","7000"
"8","user8","21","8000"

I had a similar issue while reading malformed lines with FlatFileItemReader. In this case, you can set a DefaultRecordSeparatorPolicy as the RecordSeparatorPolicy of the FlatFileItemReader. After each line is read, the policy checks whether the record is complete (isEndOfRecord). If the text read so far contains an unterminated quote, the reader appends the next line and checks again, so a quoted field that spans several lines reaches the tokenizer as a single record. You can also override this behavior by subclassing the policy.

flatFileItemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());
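
To illustrate the end-of-record check described above, here is a minimal plain-Java sketch of the idea: keep appending lines until the number of quote characters is even. The class and method names (QuoteBalancedJoiner, joinRecords) are made up for this illustration; this is not the Spring Batch source, and it ignores details such as continuation markers and configurable quote characters.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; not the Spring Batch implementation.
class QuoteBalancedJoiner {

    // A record is treated as complete when it holds an even number of quotes.
    static boolean isEndOfRecord(String record) {
        long quotes = record.chars().filter(c -> c == '"').count();
        return quotes % 2 == 0;
    }

    // Joins physical lines into logical records, keeping the line break
    // inside any quoted field that spans more than one line.
    static List<String> joinRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(line);
            if (isEndOfRecord(current.toString())) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // Input ended inside a quoted field: keep the fragment visible.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "\"2\",\"user2\",\"27\",\"2000\"",
            "\"3\",\"user3\",\"26",
            "    \",\"3000\"",
            "\"4\",\"user4\",\"25\",\"",
            "    4000\"");
        // Print each logical record with its embedded line break made visible.
        for (String record : joinRecords(lines)) {
            System.out.println(record.replace("\n", "\\n"));
        }
    }
}
```

With the five physical lines in main, the sketch yields three logical records. Note that the line break and the indentation stay inside the quoted field, so the value may still need trimming later.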

Refer to the DefaultRecordSeparatorPolicy API documentation for more information.

@Bean
public FlatFileItemReader<YourClassName> itemReader(@Value("${input}") Resource resource) {
    FlatFileItemReader<YourClassName> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setResource(resource);
    flatFileItemReader.setName("CSV-Reader");
    flatFileItemReader.setLinesToSkip(1);
    // Clear the default comment prefix '#' so lines starting with '#' are not skipped
    flatFileItemReader.setComments(new String[] {});
    // Keep reading lines until quotes are balanced, so a record split across lines is read as one
    flatFileItemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());
    flatFileItemReader.setLineMapper(lineMapper());
    return flatFileItemReader;
}

@Bean
public LineMapper<YourClassName> lineMapper() {
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setDelimiter(DelimitedLineTokenizer.DELIMITER_COMMA);
    lineTokenizer.setQuoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER);
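    // Non-strict: tolerate records with fewer or more tokens than column names
    lineTokenizer.setStrict(false);
    // COLUMN_NAMES is a String[] constant defined elsewhere, e.g. "id", "name", "age", "salary"
    lineTokenizer.setNames(COLUMN_NAMES);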

    BeanWrapperFieldSetMapper<YourClassName> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(YourClassName.class);

    DefaultLineMapper<YourClassName> defaultLineMapper = new DefaultLineMapper<>();
    defaultLineMapper.setLineTokenizer(lineTokenizer);
    defaultLineMapper.setFieldSetMapper(fieldSetMapper);
    return defaultLineMapper;
}
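
As far as I can tell, the record joined by the policy still contains the line break and the indentation inside the quoted field (for example "26" followed by a newline and four spaces). If your mapping does not trim values, one option is to clean the record before it is tokenized. The sketch below is a plain-Java helper under that assumption; RecordNormalizer and normalize are hypothetical names, not part of Spring Batch.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spring Batch.
class RecordNormalizer {

    // Matches a line break plus any spaces or tabs that follow it.
    private static final Pattern BREAK_AND_INDENT = Pattern.compile("\\R[ \\t]*");

    // Removes line breaks (and the indentation after them) from a joined record.
    static String normalize(String record) {
        return BREAK_AND_INDENT.matcher(record).replaceAll("");
    }

    public static void main(String[] args) {
        // Joined records as they would look for rows 3 and 5 of File.csv.
        System.out.println(normalize("\"3\",\"user3\",\"26\n    \",\"3000\""));
        System.out.println(normalize("\"5\",\"\n        user5\",\"24\",\"5000\""));
    }
}
```

One way to apply such a helper would be to call it from an overridden postProcess(String) in a subclass of DefaultRecordSeparatorPolicy. Be aware that it also removes line breaks that are legitimately part of a quoted value, so it only fits files where fields are never meant to span lines.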
