How to parse or sanitize incoming CSV entries with backslash symbols?

I have a requirement to parse external CSV files and read their name attributes. I am using the opencsv library for this; the test code is below. It works well with valid CSV files, but if one of the rows is invalid, I have no way to handle the error. Below is an example CSV with such an error case: a backslash directly before a closing double quote is read in Java as an escaped double quote, which breaks parsing. Could we somehow handle this inline or at the file level, for example by replacing \" with " ?

    @Test
    public void csvTest() throws Exception { // opencsv 5.x readNext() also throws CsvValidationException
        String fileName = "ERROR.csv";
        File file = new File("D:\\csvFiles\\" + fileName);
        if (file.exists()) {
            // try-with-resources closes the reader even if parsing fails
            try (CSVReader csvReader = new CSVReader(new FileReader(file))) {
                String[] nextLine;
                int row = 0;
                while ((nextLine = csvReader.readNext()) != null) {
                    row++;
                    if (nextLine.length > 0) {
                        System.out.println("ROW: " + row + " " + String.join(",", nextLine));
                    }
                }
            }
        }
    }

ERROR.csv

    id,name,address,phone
    "1","Bob","New Jersey","9999999999"
    "2","Smith","Sydney ///\","9999999999"

Note: When we open this CSV file in Excel, it renders perfectly. So is it only Java that treats the file as erroneous, because the double quote is escaped by the preceding backslash ( \" )?


A customized CSVReader instance works for me; see the code below:

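    import com.opencsv.CSVParser;
    import com.opencsv.CSVParserBuilder;
    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import java.io.FileReader;

    // Use '%' as the escape character so that '\' is read as ordinary data.
    CSVParser p = new CSVParserBuilder()
            .withIgnoreLeadingWhiteSpace(true)
            .withEscapeChar('%')
            .withSeparator(',')
            .build();

    // "file" is the java.io.File from the question's test code.
    try (CSVReader csvReader = new CSVReaderBuilder(new FileReader(file))
            .withCSVParser(p)
            .build()) {
        String[] nextLine;
        int row = 0;
        while ((nextLine = csvReader.readNext()) != null) {
            row++;
            if (nextLine.length > 0) {
                System.out.println("ROW: " + row + " " + String.join(",", nextLine));
            }
        }
    }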

Note: I set a different escape character with .withEscapeChar('%') . You can choose any character other than \ that you know has no actual meaning in your data.
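
To illustrate why changing the escape character helps, here is a minimal sketch of a quote-aware line splitter. It is a simplified model written for this explanation, not opencsv's implementation: the class `EscapeCharDemo` and its `split` method are hypothetical names, and doubled quotes (`""`) are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a quote-aware CSV line splitter; NOT opencsv's actual code.
class EscapeCharDemo {

    /**
     * Splits one CSV line on commas, honoring double-quoted fields.
     * Inside quotes, the escape character makes the next character literal.
     * Throws if the line ends while a quoted field is still open.
     */
    static List<String> split(String line, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == escape && i + 1 < line.length()) {
                field.append(line.charAt(++i)); // escaped character is taken literally
            } else if (c == '"') {
                inQuotes = !inQuotes;           // opening or closing quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());   // separator outside quotes ends the field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field: " + line);
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The problematic row from ERROR.csv
        String row = "\"2\",\"Smith\",\"Sydney ///\\\",\"9999999999\"";

        // Backslash as escape character: the closing quote of the address is swallowed.
        try {
            split(row, '\\');
        } catch (IllegalArgumentException e) {
            System.out.println("backslash escape failed: " + e.getMessage());
        }

        // '%' as escape character: the backslash is plain data and the row has four fields.
        System.out.println("percent escape: " + split(row, '%'));
    }
}
```

In this model, with `\` as the escape character, the backslash in front of the closing quote turns that quote into data, so the address field never terminates. With `%`, the backslash is just another character and the row splits into four fields.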

Given such a customized CSVParser , the configured CSVReader instance works fine with the CSV data provided in the question.

It produces

    ROW: 1 id,name,address,phone
    ROW: 2 1,Bob,New Jersey,9999999999
    ROW: 3 2,Smith,Sydney ///\,9999999999

as the expected output, without any errors.

I used OpenCSV version 5.7.x.
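
If changing the parser configuration is not an option, the file-level replacement asked about in the question could be sketched as a plain string pre-processing step. This is only a sketch under assumptions: `BackslashSanitizer` and both method names are hypothetical, and the second variant assumes the parser reads a doubled escape character as one literal backslash.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical pre-processing helper; class and method names are illustrative only.
class BackslashSanitizer {

    /** The replacement from the question: turns \" into ", which drops the backslash from the data. */
    static String dropBackslashBeforeQuote(String line) {
        return line.replace("\\\"", "\"");
    }

    /** Alternative: doubles every backslash so an escape-aware parser can keep it as data (assumption). */
    static String doubleBackslashes(String line) {
        return line.replace("\\", "\\\\");
    }

    /** Applies the question's replacement to every line of a file that was read into memory. */
    static List<String> sanitize(List<String> lines) {
        return lines.stream()
                .map(BackslashSanitizer::dropBackslashBeforeQuote)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The sample rows from ERROR.csv
        List<String> lines = Arrays.asList(
                "id,name,address,phone",
                "\"1\",\"Bob\",\"New Jersey\",\"9999999999\"",
                "\"2\",\"Smith\",\"Sydney ///\\\",\"9999999999\"");

        // Print each sanitized line; only the third row changes.
        sanitize(lines).forEach(System.out::println);
    }
}
```

The sanitized text can then be passed to CSVReader through a java.io.StringReader. Note the trade-off: the first variant alters the data (the trailing backslash is lost) and would also rewrite any \" that was meant as an escaped quote, whereas setting a different escape character leaves the file untouched.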
