OpenCSV CsvToBean: First column not read for UTF-8 without BOM

Using OpenCSV to parse UTF-8 documents without a BOM results in the first column not being read. Giving the same document content as input, but encoded in UTF-8 with a BOM, works correctly.

I explicitly set the charset to UTF-8:

    // Read the file as UTF-8 and bind each row to a Bean by header name
    FileInputStream fileInputStream = new FileInputStream(file);
    InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, StandardCharsets.UTF_8);
    BufferedReader reader = new BufferedReader(inputStreamReader);
    HeaderColumnNameMappingStrategy<Bean> ms = new HeaderColumnNameMappingStrategy<>();
    ms.setType(Bean.class);
    CsvToBean<Bean> csvToBean = new CsvToBeanBuilder<Bean>(reader).withType(Bean.class).withMappingStrategy(ms)
            .withSeparator(';').build();
    List<Bean> beans = csvToBean.parse();

I've created a sample project where the issue can be reproduced: https://github.com/dajoropo/csv2beanSample

Running the unit test, you can see that the UTF-8 file without a BOM fails while the one with a BOM works correctly.

The failure occurs in the second assertion, because the first column is not read. The result is:

    [Bean [a=null, b=second, c=third]]

Any hints?

If I open the Bean class in your project and search for "B", I find one entry. If I search for "A", I find none :) That means you copy/pasted the "A" together with a BOM into the Bean class. The BOM is not visible, but it is still part of the column name.
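To make the invisible character visible, here is a small stdlib-only sketch. It is not taken from the sample project; the class `BomHeaderDemo` and its helper methods are made up for illustration. It decodes the same header line with and without a UTF-8 BOM and shows that the first header name differs, which is why a column name that carries the BOM only matches one of the two files:

```java
import java.nio.charset.StandardCharsets;

class BomHeaderDemo {
    /** The byte order mark as a single Java char (U+FEFF). */
    static final char BOM = '\uFEFF';

    /** Decodes a raw ';'-separated header line as UTF-8 and returns the first name. */
    static String firstHeader(byte[] headerLine) {
        String decoded = new String(headerLine, StandardCharsets.UTF_8);
        return decoded.split(";")[0];
    }

    /** Returns true if the name starts with the invisible BOM character. */
    static boolean hasBomPrefix(String name) {
        return !name.isEmpty() && name.charAt(0) == BOM;
    }

    public static void main(String[] args) {
        // "A;B;C" preceded by the UTF-8 BOM bytes EF BB BF
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A', ';', 'B', ';', 'C'};
        byte[] withoutBom = {'A', ';', 'B', ';', 'C'};

        String h1 = firstHeader(withBom);
        String h2 = firstHeader(withoutBom);

        System.out.println(h1.length() + " " + hasBomPrefix(h1)); // prints "2 true"
        System.out.println(h2.length() + " " + hasBomPrefix(h2)); // prints "1 false"
        System.out.println(h1.equals(h2));                        // prints "false"
    }
}
```

Running a check like `hasBomPrefix` on the column name used in the annotation is one way to confirm whether the pasted "A" really carries the prefix.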

If I fix "A", another test starts failing, but I think you can fix that using BOMInputStream.

See this question and its answers: "Byte order mark screws up file reading in Java".

It is a known problem. You can use Apache Commons IO's BOMInputStream to solve it.
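To illustrate what skipping the BOM amounts to, here is a minimal stdlib-only sketch. It is not the Commons IO implementation, and the class `BomSkippingReader` and its methods are hypothetical names chosen for this example. It assumes the input is UTF-8 and only handles the UTF-8 BOM, which decodes to the single character U+FEFF:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkippingReader {
    /** Wraps a UTF-8 stream in a reader that drops a leading U+FEFF, if present. */
    static BufferedReader open(InputStream in) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            reader.mark(1);               // remember the start of the stream
            int first = reader.read();    // peek at the first decoded character
            if (first != -1 && first != 0xFEFF) {
                reader.reset();           // not a BOM: put the character back
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader;
    }

    /** Convenience helper for the demo: first line of the content, without any BOM. */
    static String firstLine(byte[] content) {
        try (BufferedReader reader = open(new ByteArrayInputStream(content))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A', ';', 'B', ';', 'C'};
        byte[] withoutBom = {'A', ';', 'B', ';', 'C'};
        // Both variants now yield the same header line.
        System.out.println(firstLine(withBom).equals(firstLine(withoutBom))); // prints "true"
    }
}
```

Since the result is an ordinary `Reader`, it could presumably be handed to `CsvToBeanBuilder` in place of the `reader` from the question, so that files with and without a BOM produce the same header names.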

I just tried it with

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.6</version>
    </dependency>

and

    // BOMInputStream (org.apache.commons.io.input) skips a leading UTF-8 BOM if one is present
    inputStreamReader = new InputStreamReader(new BOMInputStream(fileInputStream), StandardCharsets.UTF_8);

After also fixing the annotation to remove the invisible BOM prefix from "A":

    @CsvBindByName(column = "A")
    private String a;

both tests pass.
