
RegEx - Java Split Command Parsing Csv File

I have a CSV file in the format below:

11000,Christopher,Nolan,MR.,Inception,25993,France,"Lefoullon,Paris",920,Director,*461-7755,33-461-7755,12175,"O'Horner, James",12300,"Glebova, Nathalie",,Christophe.Nolan@movies.com,Capital,NEW

Regarding the question "Java Split Command Parsing Csv File":

In the linked question, @Mark Byers and @R. Bemrose suggested

String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);

But if you look carefully at the CSV above, you will find that the name "O'Horner, James" is causing problems: it throws an ORA-00917: missing comma error. Is there a way to avoid this, or does the regex have to be corrected?

Kinda confused :-o

Caveat: all of the following is idle speculation and guesswork, as you haven't supplied any code for verification, and my palantir is in the workshop for preventative maintenance.

Train of thought: You don't get a problem with the earlier "Lefoullon,Paris" but you do get a problem with "O'Horner, James" ... this suggests that the apostrophe is probably the (innocent) cause of the problem.

Hypothesis: The field is successfully extracted from the CSV as O'Horner, James ... note that the apostrophe is NOT special to CSV (and doesn't occur in that magnificent [see note] regex).
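The hypothesis can be checked in isolation. Below is a minimal sketch, assuming the sample line from the question and the regex as quoted there; the class name SplitCheck and its members are made up for illustration.

```java
class SplitCheck {
    // Sample record from the question, written as a Java string literal.
    static final String SAMPLE =
        "11000,Christopher,Nolan,MR.,Inception,25993,France,\"Lefoullon,Paris\","
        + "920,Director,*461-7755,33-461-7755,12175,\"O'Horner, James\",12300,"
        + "\"Glebova, Nathalie\",,Christophe.Nolan@movies.com,Capital,NEW";

    // Split on commas that are followed by an even number of double quotes,
    // i.e. commas that are not inside a quoted field. The limit of -1 keeps
    // empty fields, including trailing ones.
    static String[] split(String line) {
        return line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
    }

    public static void main(String[] args) {
        String[] tokens = split(SAMPLE);
        System.out.println(tokens.length); // 20 fields
        System.out.println(tokens[13]);    // "O'Horner, James" (double quotes still attached)
    }
}
```

The split yields 20 fields and keeps the comma inside the quoted name, so the regex is not what trips over the apostrophe. Note that the enclosing double quotes stay on the token and would have to be stripped separately.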

However, the apostrophe is significant to SQL: apostrophes quote string literals in SQL, and apostrophes in the data must be doubled.

Like this: INSERT INTO ..... VALUES(...,'O''Horner, James', ...);

If you are using parameter substitution in your SQL interface (as you should be), converting your data fields into valid SQL constants will be done for you. Otherwise:

  • write code to fix each string field (replace every occurrence of ' with '', then wrap the result in ' front and back)

  • google("SQL injection"), read, repent, and rewrite your code using parameter substitution
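Both routes can be sketched in Java. This is an illustration under assumptions, not the asker's code: the class SqlFieldDemo, the table people, and its columns id and full_name are hypothetical, and insertName needs a live JDBC Connection, which is not shown here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class SqlFieldDemo {
    // Manual route: double every apostrophe, then wrap the value in apostrophes.
    static String toSqlLiteral(String field) {
        return "'" + field.replace("'", "''") + "'";
    }

    // Parameter substitution: the driver binds the value, so no manual quoting
    // is needed. Table and column names are hypothetical.
    static void insertName(Connection conn, int id, String name) throws SQLException {
        String sql = "INSERT INTO people (id, full_name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(toSqlLiteral("O'Horner, James")); // 'O''Horner, James'
    }
}
```

With the PreparedStatement version the name is passed as data rather than spliced into the SQL text, which is why the apostrophe stops mattering.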


Note: "magnificent" as in "C'est magnifique, mais ce n'est pas la guerre". Use a CSV parser, for sanity's sake.
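To show what a parser does that the regex does not, here is a rough sketch of a quote-aware field reader for a single record. It is an assumption-laden toy (the class TinyCsv is made up, and it does not handle line breaks inside quoted fields); in real code an established library such as Apache Commons CSV or OpenCSV would be the saner choice.

```java
import java.util.ArrayList;
import java.util.List;

class TinyCsv {
    // Parse one CSV record. Commas inside double quotes are data, and a
    // doubled quote ("") inside a quoted field stands for a literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field separator
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String f : parseLine("12175,\"O'Horner, James\",12300,,NEW")) {
            System.out.println("[" + f + "]");
        }
    }
}
```

Unlike the regex split, this strips the enclosing double quotes, so the name comes out as O'Horner, James and is ready to be bound as a statement parameter.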
