
Regex to match CSV file nested quotes

I know this has been discussed a million times. I searched through the forums, found some regular expressions that came close, and tried to modify them, but to no avail.

Say there is a line in a CSV file like this:

"123", 456, "701 "B" Street", 910
                 ^^^

Is there an easy regex to detect "B" (since it's a non-escaped set of quotes within the normal CSV quotes) and replace it with something like \"B\"? The final string would end up looking like this:

"123", 456, "701 \"B\" Street", 910

Help would be greatly appreciated!

Trust me, you don't want to do this with regex. You want something like Java CSV Library.

(?<!^)(?<!",)(?<!\d,)"(?!,")(?!,\d)(?!$)(?!,-\d)

I got this to work and figured I would post it in case anyone else is looking for an answer.
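As a rough illustration, the pattern above can be applied with Python's `re` module. This is only a sketch: the function name `escape_inner_quotes` is made up for the example, and the input has no spaces after the commas, because the lookarounds only recognise `",` and `,"` (or a digit next to the comma) as field boundaries. With the spaces from the question's sample line, the field-closing quotes are no longer protected and get escaped as well.

```python
import re

# The pattern from the answer above, copied unchanged.
INNER_QUOTE = re.compile(r'(?<!^)(?<!",)(?<!\d,)"(?!,")(?!,\d)(?!$)(?!,-\d)')


def escape_inner_quotes(line: str) -> str:
    """Prefix every quote the pattern flags as 'inner' with a backslash."""
    # A function replacement sidesteps backslash handling in replacement templates.
    return INNER_QUOTE.sub(lambda m: '\\"', line)


# Assumption: no spaces after the commas (see the note above).
line = '"123",456,"701 "B" Street",910'
print(escape_inner_quotes(line))
```

Because the pattern decides what counts as a boundary purely from neighbouring characters, a field value that itself contains `",` or `,"` would likely be misread.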

There are a few zillion libraries to help you parse CSV, but if you want to use a regexp for academic reasons, this may help:

  • quoted string with escape support: "(\\.|[^\\"])*"
  • unquoted field: [^",]*
  • delimiter: , *

I don't use CSV files, so I'm not sure about the validity of the "other CSV field" (the one that matches 456 in the example above), or whether /, */ is the delimiter you want.

At any rate, combining the above will match one field and one delimiter (or end of string):

(quotedstring|unquoted)(delimiter|$)
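A minimal Python sketch of that combination, assuming backslash-escaped quotes inside quoted fields as in the pieces listed above. The names `FIELD` and `split_fields` are invented for the example, and the fields are returned raw, with their surrounding quotes and escapes still in place.

```python
import re

QUOTED = r'"(?:\\.|[^\\"])*"'   # quoted string with backslash escapes
UNQUOTED = r'[^",]*'            # unquoted field
DELIMITER = r', *'              # comma followed by optional spaces

# One field followed by one delimiter (or the end of the string).
FIELD = re.compile(f'({QUOTED}|{UNQUOTED})({DELIMITER}|$)')


def split_fields(line: str) -> list[str]:
    """Split one line into raw fields by matching FIELD repeatedly."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:
            raise ValueError(f'malformed field at offset {pos}')
        fields.append(m.group(1))
        pos = m.end()
        if m.group(2) == '':    # empty delimiter group means '$' matched
            return fields


print(split_fields(r'"123", 456, "701 \"B\" Street", 910'))
```

On the question's original line, where the inner quotes are not escaped, no field pattern fits at the third field, so this sketch raises `ValueError` instead of guessing.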

I would use a tailored sed expression such as:

's/\(.*\),\(.*\),\(.*\)"\(.*\)\" \(.*\),\(.*\)/\1,\2,\3 \4 \5 \6/g'

Your example is not proper CSV:

"123", 456, "701 "B" Street", 910

This should actually be:

"123", 456, "701 ""B"" Street", 910

(There are plenty of variations of CSV, of course, but since most of the time people want it for use with Excel or Access, I stick to the Microsoft definition.)
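For comparison, here is a small sketch of the doubled-quote convention using Python's standard `csv` module, which doubles embedded quotes by default. The helper names `to_csv_line` and `read_csv_line` are made up for the example. The writer emits no space after the commas, so its output differs from the line above in spacing only.

```python
import csv
import io


def to_csv_line(row) -> str:
    """Write one row, quoting non-numeric fields and doubling embedded quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(row)
    return buffer.getvalue().rstrip('\r\n')


def read_csv_line(line: str) -> list[str]:
    """Parse one line, ignoring spaces that directly follow a comma."""
    return next(csv.reader([line], skipinitialspace=True))


print(to_csv_line(['123', 456, '701 "B" Street', 910]))
print(read_csv_line('"123", 456, "701 ""B"" Street", 910'))
```

`skipinitialspace=True` is what lets the reader accept the spaces after the commas in the corrected example. Without it, a quote that follows a space is not treated as the start of a quoted field.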

Therefore the regex for this can look like:

".+("").+("").+"

The groups (in parentheses) will be your double quotes, and the rest ensures that they are found within another set of quotes.
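One way to see what the groups capture is to run the pattern in Python and inspect the group spans. This is only a sketch, and the variable names are illustrative. Because `.+` is greedy and can cross commas, the overall match starts at the first quote on the line rather than at the field that holds the doubled quotes, so only the group positions are meaningful here.

```python
import re

# The pattern from the answer above, copied unchanged.
DOUBLED = re.compile(r'".+("").+("").+"')

line = '"123", 456, "701 ""B"" Street", 910'
match = DOUBLED.search(line)

# (start, end) offsets of each doubled quote, or an empty list if nothing matched.
spans = [match.span(1), match.span(2)] if match else []
print(spans)
```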

That covers the find part of your needs. The replace part depends on what you are programming in.
