
Parse CSV with OpenCSV with double quotes inside a quoted field

I am trying to parse a CSV file using OpenCSV. One of the columns stores its data in YAML-serialized format and is quoted because the data can contain commas. The data can also contain double quotes, which are escaped by doubling them. I can parse this file easily in Ruby, but with OpenCSV I am not able to parse it fully. The file is UTF-8 encoded.
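To make the quoting convention described above concrete, here is a minimal sketch of splitting one logical record in the RFC 4180 style, where a doubled quote inside a quoted field stands for one literal quote. It uses only the Java standard library; the class name `QuotedFieldSketch` and the method `parseRecord` are hypothetical names chosen for illustration and are not part of OpenCSV or any other library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of OpenCSV or FastCSV.
final class QuotedFieldSketch {

    /** Splits one logical CSV record into fields, treating "" inside quotes as a literal quote. */
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // single quote: end of quoted section
                    }
                } else {
                    current.append(c);       // commas and newlines are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;             // start of quoted section
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String record = "28613099,Product::Source,\"---\n"
                + "  :details: \"\"[Fair Trade Certified]\"\"\n"
                + "\",,2015-11-01";
        List<String> fields = parseRecord(record);
        System.out.println(fields.size()); // prints 5
        System.out.println(fields.get(2)); // the YAML text with single quotes around [Fair Trade Certified]
    }
}
```

This sketch does no error handling (for example, it accepts an unterminated quote silently), so it is only meant to show how the doubled quotes should be read, not to replace a real parser.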

Here is the Java snippet I am using to read the file:

CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');

Here are two lines from the file. The first line is not parsed properly: it gets split at ""[Fair Trade Certified]"", I guess because of the escaped double quotes.

1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
  :brand_id: 49120
  :size: each
  :alcoholic: false
  :details: ""[Fair Trade Certified]""
  :gluten_free: false
  :kosher: false
  :low_fat: false
  :organic: false
  :sugar_free: false
  :fat_free: false
  :vegan: false
  :vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,
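Each record in the sample above spans many physical lines, because the line breaks sit inside a quoted field. As a rough sketch of how a reader can find the record boundaries, the code below tracks whether the current position is inside quotes and only ends a record at a line break outside quotes. A doubled quote toggles the state twice, so it leaves the state unchanged. The names `LogicalRecordSketch` and `splitRecords` are hypothetical, and the sketch assumes `\n` line endings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of OpenCSV or FastCSV.
final class LogicalRecordSketch {

    /** Splits CSV text into logical records; line breaks inside quotes stay in the record. */
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean insideQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // "" toggles twice, so the state is unchanged
            }
            if (c == '\n' && !insideQuotes) {
                records.add(pending.toString()); // line break outside quotes ends the record
                pending.setLength(0);
            } else {
                pending.append(c);
            }
        }
        if (pending.length() > 0) {
            records.add(pending.toString()); // final record without a trailing line break
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1061658767,update,\"---\n"
                + "  :details: \"\"[Fair Trade Certified]\"\"\n"
                + "\",,\n"
                + "1061658768,create,\"---\n"
                + "product_id:\n"
                + "\",,\n";
        System.out.println(splitRecords(text).size()); // prints 2
    }
}
```

If a parser reports more records or fields than expected for this file, comparing its output with a boundary check like this one may help show where the split goes wrong.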

The solution was to use an RFC4180-compatible CSV parser, as suggested by Paul. I had used CSVReader from OpenCSV, which didn't work, or maybe I couldn't get it to work properly.

I used FastCSV, an RFC4180 CSV parser, and it worked seamlessly.

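import java.io.File;
import java.nio.charset.StandardCharsets;
import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount()); // number of fields in each parsed row
}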

First off, I am glad FastCSV worked for you, but I ran the suspect substring through openCSV 3.9 and it parsed correctly with both the CSVParser and the RFC4180Parser. Could you please give a little detail on how it failed to parse, and/or try it with openCSV 3.9 to see whether you get the same issue, and then try the configuration below?

Here are the tests that I used:

CSVParser:

@Test
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {

    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944";

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" ;

    StringReader stringReader = new StringReader(bigline);

    CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
    CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();

    String[] item = csvReader.readNext();

    assertEquals(5, item.length);
    assertEquals("28613099", item[0]);
    assertEquals("Product::Source", item[1]);
    assertEquals(suspectString, item[2]);
}

RFC4180Parser:

def 'parse big line from stackoverflow with complex string'() {
    given:
    RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
    RFC4180Parser parser = builder.build()
    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944"

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n"

    when:
    String[] values = parser.parseLine(bigline)

    then:
    values.length == 5
    values[0] == "28613099"
    values[1] == "Product::Source"
    values[2] == suspectString
}
