How to have openCSV CSVReader without a DEFAULT_QUOTE_CHARACTER?

I am using CSVReader to read from a tab-delimited text file which has a field called "user_comments". This column contains all kinds of free-form text that users have entered.

Here is the code where I declare my parser:

import au.com.bytecode.opencsv.CSVReader;

CSVReader csv = new CSVReader(new FileReader(opt.f),'\t' as char, '~' as char, '\0' as char);

The third argument to the constructor is the quote character, which otherwise defaults to "DEFAULT_QUOTE_CHARACTER". The default value is:

 public static final char DEFAULT_QUOTE_CHARACTER = '\"';

I set it to '~' because the "user_comments" column has values with double quotes inside them (these should not be treated as actual quotes but should just be read as data from the column).

The problem is that the column also contains "~" and "|".

So can I create an instance of CSVReader without a default quote character? If not, can you suggest a character I can use that is very rare and unlikely to be found in this "user_comments" column?

Inspect Unicode's Basic Multilingual Plane ( http://unicode.org/roadmaps/bmp/ ) back to front. You're bound to find a character that is "unlikely to be used in your data". Then use a \uXXXX escape to code it in your program source.

Or better still, use a code point that doesn't even represent a Unicode character, e.g. \ud7c7.
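As a rough sketch of that advice in plain Java (no openCSV involved): the hypothetical helper `highestUnusedChar` below scans the Basic Multilingual Plane from U+FFFF downward and returns the first code unit that never occurs in the data, skipping surrogates. The class name, helper name, and sample string are all made up for illustration. The `main` method also checks whether U+D7C7 is assigned in the JVM's Unicode tables and whether it occurs in the sample.

```java
import java.util.BitSet;

public class QuoteCharPicker {

    /**
     * Returns the highest BMP code unit that never occurs in data,
     * skipping the surrogate range, or -1 if every candidate is used.
     */
    static int highestUnusedChar(CharSequence data) {
        BitSet seen = new BitSet(0x10000);
        for (int i = 0; i < data.length(); i++) {
            seen.set(data.charAt(i));
        }
        // Walk the BMP back to front, as the answer suggests.
        for (int c = 0xFFFF; c >= 0; c--) {
            if (!seen.get(c) && !Character.isSurrogate((char) c)) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the tab-delimited file contents.
        String sample = "id\tuser_comments\n1\tShe said \"hi\" ~ or | maybe\n";

        int candidate = highestUnusedChar(sample);
        System.out.printf("unused candidate: U+%04X%n", candidate);

        // The code point from the answer; isDefined reports whether this
        // JVM's Unicode data assigns a character to it.
        char suggested = '\ud7c7';
        System.out.println("U+D7C7 assigned? " + Character.isDefined(suggested));
        System.out.println("U+D7C7 in sample? " + (sample.indexOf(suggested) >= 0));
    }
}
```

In real use you would feed the helper the actual file contents rather than a sample string, then pass the resulting char as the quote character argument to the CSVReader constructor shown in the question.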
