
How to resolve CsvToBean header issue with zero-width no-break space character?

I have this code snippet that uses OpenCSV:

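import com.opencsv.bean.CsvBindByName;

class Pojo {

    @CsvBindByName(column = "point")
    Integer point;

    @CsvBindByName(column = "name")
    String name;

    @Override
    public String toString() {
        return "Pojo[point = " + point + ", name = " + name + "]";
    }
}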

And:

class Main {

    void readFile() {
        CSVReader reader = new CSVReader(.....);
        CsvToBean<Pojo> bean = new CsvToBeanBuilder<Pojo>(reader)...;
        List<Pojo> list = bean.parse();
    }

}

Why does the parser not match a header that starts with a ZWNBSP (zero-width no-break space, U+FEFF) character, so that I get null as the value of that column?

Example input data:

ZWNBSPpoint (an invisible U+FEFF character immediately followed by point)
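To see where the extra character comes from, here is a minimal, self-contained sketch (the class BomDemo and the method firstHeaderColumn are hypothetical, not part of OpenCSV). It builds a file body that starts with the UTF-8 BOM bytes EF BB BF and decodes it as UTF-8. Java's UTF-8 decoder keeps U+FEFF instead of stripping it, so the first header name is six characters long and is not equal to "point".

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Decodes raw file bytes as UTF-8 and returns the first column name of the header line.
    static String firstHeaderColumn(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        String headerLine = text.split("\n", 2)[0];
        return headerLine.split(",")[0];
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF (BOM / zero-width no-break space).
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "point,name\n1,A\n".getBytes(StandardCharsets.UTF_8);
        byte[] file = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, file, 0, bom.length);
        System.arraycopy(body, 0, file, bom.length, body.length);

        String first = firstHeaderColumn(file);
        System.out.println(first.length());                       // 6, not 5
        System.out.println(Integer.toHexString(first.charAt(0))); // feff
        System.out.println(first.equals("point"));                // false
    }
}
```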

Not a real answer, but a potential workaround, provided that all the CSV files you process carry the extra ZWNBSP character at the very beginning of the input (file).

In Pojo, change the mapping to:

@CsvBindByName(column="\uFEFFpoint")
Integer point;

Given the input data (which starts with the invisible U+FEFF character):

var input = "\uFEFFpoint,name\n1,A";

and the following code:

CSVReader csvReader = new CSVReader(new StringReader(input));
List<Pojo> beans = new CsvToBeanBuilder<Pojo>(csvReader)
        .withType(Pojo.class)
        .withIgnoreLeadingWhiteSpace(true)
        .build()
        .parse();
System.out.println(beans);

This will produce a valid Pojo[point = 1, name = A]. I tested the above scenario with OpenCSV 5.7.1 and Java 17 on macOS.

Without the above adjustment to the mapping annotation ("point" -> "\uFEFFpoint"), CSVReader keeps the extra character (U+FEFF, encoded in UTF-8 as the bytes EF BB BF) as part of the first header name. That header then no longer matches the column name "point" declared for the field point, so the field is never populated and stays null.
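The mismatch can be reproduced with plain string comparisons. The sketch below (the class HeaderMatch and the helper stripBom are hypothetical, not OpenCSV API) also shows that Java's usual trimming methods do not help: String.trim() removes only characters up to U+0020, and String.strip() removes only characters for which Character.isWhitespace is true, which U+FEFF is not. Removing the character explicitly does restore the match.

```java
public class HeaderMatch {
    static final char BOM = '\uFEFF';

    // Hypothetical helper: removes a single leading U+FEFF, if present.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String header = BOM + "point"; // what the parser sees as the first header name

        System.out.println(header.equals("point"));           // false: the mismatch
        System.out.println(header.trim().equals("point"));    // false: trim() stops at U+0020
        System.out.println(header.strip().equals("point"));   // false: U+FEFF is not whitespace
        System.out.println(stripBom(header).equals("point")); // true
    }
}
```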

However, this appears to be incorrect/invalid CSV input, as others have already pointed out in the comments below your question. OpenCSV does not seem to offer a flag or switch on HeaderColumnNameMappingStrategy to work around cases like the one you report.

Apologies for misleading you and, for some strange reason, missing the BOM problem. The following is not extensively tested, but it works:

package com.technojeeves.opencsvbeans;

import com.opencsv.bean.CsvToBeanBuilder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.List;

import java.io.IOException;
import java.io.Reader;
import java.io.FilterReader;

public class App {
    public static void main(String[] args) {
        try {
            System.out.println(new App().read(Path.of(args[0])));
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    public List<Pojo> read(Path path) {
        // Wrap the UTF-8 reader so that a leading BOM (U+FEFF) is dropped before OpenCSV reads the header
        try (Reader reader = new BomFilterReader(Files.newBufferedReader(path))) {
            return new CsvToBeanBuilder<Pojo>(reader).withType(Pojo.class).build().parse();
        } catch (IOException e) {
            throw new RuntimeException("Cannot read file: " + path.getFileName(), e);
        }
    }

}

class BomFilterReader extends FilterReader {
    public static final char BOM = '\uFEFF';
    private boolean haveReadBOM = false;
    
    public BomFilterReader(Reader in) {
        super(in);
    }

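    @Override
    public int read() throws IOException {
        int c = super.read();
        // Only the very first character of the stream can be a BOM
        if (!haveReadBOM) {
            haveReadBOM = true;
            if (c == BOM) {
                c = super.read(); // skip the BOM and return the character after it
            }
        }
        return c;
    }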


    @Override
    public int read(char[] a) throws IOException {
        return read(a, 0, a.length);
    }

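    @Override
    public int read(char[] a, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, a.length);
        if (len == 0) {
            return 0;
        }

        // Go through read() so that the BOM check above is applied
        int c = read();
        if (c == -1) {
            return -1;
        }
        a[off] = (char) c;

        int i = 1;
        try {
            for (; i < len; i++) {
                c = read();
                if (c == -1) {
                    break;
                }
                a[off + i] = (char) c;
            }
        } catch (IOException ee) {
            // Like the JDK's default InputStream.read(byte[], int, int), an error after
            // the first character is swallowed and the characters read so far are returned
        }
        return i;
    }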

}
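A shorter variant of the same idea, sketched here under the assumption that only a single leading U+FEFF needs to be removed, uses the JDK's java.io.PushbackReader: read the first character, and if it is not a BOM, push it back. The class BomSkipper and the method skipBom are hypothetical names; this is an alternative technique, not the code from the answer above.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

public class BomSkipper {
    // Returns a Reader that yields the same characters as `in`, minus one leading U+FEFF.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1); // one character of push-back is enough
        int first = pr.read();
        if (first != -1 && first != '\uFEFF') {
            pr.unread(first); // not a BOM: put the character back
        }
        return pr;
    }

    public static void main(String[] args) throws IOException {
        Reader r = skipBom(new StringReader("\uFEFFpoint,name\n1,A"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        System.out.println(sb.toString().startsWith("point")); // true
    }
}
```

With OpenCSV, the wrapped reader would be passed in the same way as BomFilterReader, for example new CsvToBeanBuilder<Pojo>(BomSkipper.skipBom(Files.newBufferedReader(path))).withType(Pojo.class).build().parse(). The trade-off is that skipBom reads the first character eagerly, but no Reader methods have to be overridden.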

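Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.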

 
© 2020-2024 STACKOOM.COM