
CSV parser in Java, double quotes in string (SuperCSV, OpenCSV)

I've spent all day searching for a way to solve this problem, with no luck. I want to write a function that converts a CSV file into a collection of lists (of strings). Here is the function:

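import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.supercsv.io.CsvListReader;
import org.supercsv.prefs.CsvPreference;

// pathToFile is a java.nio.file.Path field of the enclosing class
public Collection<? extends List<String>> parse() throws IOException {
    Collection<List<String>> collectionOfLists = new ArrayList<>();
    // try-with-resources closes the reader (and the underlying file) when parsing ends
    try (CsvListReader parser = new CsvListReader(
            Files.newBufferedReader(pathToFile, StandardCharsets.UTF_8), CsvPreference.EXCEL_PREFERENCE)) {
        List<String> row;
        while ((row = parser.read()) != null)
            collectionOfLists.add(row);
    }
    return collectionOfLists;
}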

public static String toString(Collection<? extends List<String>> csv) {
    StringBuilder builder = new StringBuilder();
    for(List<String> l : csv) {
        for(String s : l)
            builder.append(s).append(',');
        if(builder.length() > 0)
            builder.setCharAt(builder.length()-1,'\n');
    }
    return builder.toString();
}

But, for example, for this input:

id, name, city, age
1,"Bob",London,12

the output of toString(parse()) is:

id, name, city, age
1,Bob,London,12 

instead of being identical to the input. What can I do so that the strings keep their double quotes (")? Please help me.

It's not clear from your question which of these you're asking:

1. My data contains quotes: why are they being stripped out?

In this case, I'd point you to the CSV specification (RFC 4180): your CSV file is not properly escaped, so those quotes aren't actually part of your data.

It should be

1,"""Bob""",London,12

not

1,"Bob",London,12
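To see why the quotes disappear, here is a minimal plain-Java sketch of the decoding rule for a single field that has already been split out of its line. It is not Super CSV's tokenizer: `UnquoteDemo` and `unquote` are hypothetical names, and the sketch assumes the field contains no delimiters or newlines. The surrounding quotes are removed, and each doubled quote inside them becomes one literal quote.

```java
public class UnquoteDemo {

    /**
     * Decodes one already-split CSV field using the RFC 4180 rule:
     * a quoted field is wrapped in double quotes, and a literal double
     * quote inside it is written as two double quotes.
     */
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            String inner = field.substring(1, field.length() - 1); // drop the wrapping quotes
            return inner.replace("\"\"", "\"");                    // "" -> "
        }
        return field; // unquoted field: taken as-is
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"Bob\""));         // Bob
        System.out.println(unquote("\"\"\"Bob\"\"\"")); // "Bob"
        System.out.println(unquote("London"));          // London
    }
}
```

So `"Bob"` in the file decodes to `Bob`, and only `"""Bob"""` decodes to a value that itself contains the quotes.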

2. How do I force quotes when writing (even if the data doesn't contain commas, quotes, etc.)?

By default, Super CSV only escapes (quotes) a field when necessary, i.e. when the field contains a comma, double quote or newline.

If you really want quotes, you can configure Super CSV with a quote mode.

For example, you could always quote the name column (the second column) in your example with the following preferences:

private static final CsvPreference ALWAYS_QUOTE_NAME_COL = 
    new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
    .useQuoteMode(new ColumnQuoteMode(2)).build();

Alternatively, if you want to quote everything, you can use AlwaysQuoteMode; or, if you want a completely custom solution, you can write your own QuoteMode.
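As a rough illustration of what a quote mode decides, the sketch below writes one row in plain Java with a pluggable policy. It is not Super CSV's implementation: `QuotePolicy`, `MINIMAL`, `ALWAYS`, `columns` and `writeRow` are hypothetical stand-ins for the default behaviour, `AlwaysQuoteMode` and `ColumnQuoteMode`. It assumes the "necessary" characters are comma, double quote, CR and LF, does not handle null fields, and numbers columns from 1 (so `columns(2)` is the name column, matching `ColumnQuoteMode(2)` above).

```java
import java.util.List;

public class QuotePolicyDemo {

    /** Hypothetical stand-in for a quote mode: should this column be quoted even if not necessary? */
    interface QuotePolicy {
        boolean forceQuote(int column); // column numbers start at 1
    }

    static final QuotePolicy MINIMAL = column -> false; // quote only when necessary
    static final QuotePolicy ALWAYS = column -> true;   // quote every field

    /** Quotes only the given columns (plus any field that needs quoting anyway). */
    static QuotePolicy columns(int... cols) {
        return column -> {
            for (int c : cols) {
                if (c == column) return true;
            }
            return false;
        };
    }

    /** Writes one row; a field is quoted if it must be, or if the policy forces it. */
    static String writeRow(List<String> row, QuotePolicy policy) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            String field = row.get(i);
            boolean necessary = field.contains(",") || field.contains("\"")
                    || field.contains("\n") || field.contains("\r");
            if (i > 0) sb.append(',');
            if (necessary || policy.forceQuote(i + 1)) {
                // wrap in quotes and double any embedded quote
                sb.append('"').append(field.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(field);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> row = List.of("1", "Bob", "London", "12");
        System.out.println(writeRow(row, MINIMAL));    // 1,Bob,London,12
        System.out.println(writeRow(row, ALWAYS));     // "1","Bob","London","12"
        System.out.println(writeRow(row, columns(2))); // 1,"Bob",London,12
    }
}
```

Note that a value which itself contains quotes, such as `"Bob"`, is quoted even under `MINIMAL` and comes out as `"""Bob"""`.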

In the CsvPreference.EXCEL_PREFERENCE you've used, the quote character is ", as described in the javadoc. The quote character is used to wrap fields containing special characters (such as the delimiter) that you want to appear literally.

As such, with these preferences, the appropriate way to write your CSV content would be

id, name, city, age
1,"""Bob""",London,12

Otherwise, the CSV parser simply thinks that

"Bob"

means, literally,

Bob

since there is no special character between the quotes. But a quote is itself a special character, so to appear inside a quoted field it must be doubled (""); a doubled quote between the wrapping quotes is read literally as a single quote.

Alternatively, provide a different CsvPreference object that uses a different quote character.

Make this decision only after you are certain what your CSV producer is sending you.

You can create your own preference:

// quote character ' (single quote), delimiter ',', end-of-line symbols "\n"
CsvPreference singleQuotePreference = new CsvPreference.Builder('\'', ',', "\n").build();
CsvListReader parser = new CsvListReader(Files.newBufferedReader(pathToFile, StandardCharsets.UTF_8), singleQuotePreference);

After that, the output is as expected. With this preference, a single quote in your CSV file is treated as the quote character (and stripped), while double quotes are left untouched.
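To illustrate what changing the quote character does, here is a simplified single-line tokenizer in plain Java that takes the quote character as a parameter. It is a sketch under stated assumptions, not Super CSV's tokenizer: `QuoteCharDemo` and `parseLine` are hypothetical names, and fields spanning multiple lines are not supported. With `'` as the quote character, the double quotes around Bob are ordinary characters and survive parsing.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteCharDemo {

    /**
     * Splits one CSV line (no embedded newlines) into fields using the given
     * quote and delimiter characters. Inside a quoted section, a doubled quote
     * character stands for one literal quote character.
     */
    static List<String> parseLine(String line, char quote, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote (not kept)
            } else if (c == delimiter) {
                fields.add(current.toString()); // end of field
                current.setLength(0);
            } else {
                current.append(c);             // any other character, including a non-quote " or '
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Bob\",London,12";
        System.out.println(parseLine(line, '"', ','));  // [1, Bob, London, 12]
        System.out.println(parseLine(line, '\'', ',')); // [1, "Bob", London, 12]
    }
}
```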

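Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.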

 
© 2020-2024 STACKOOM.COM