Fix regex expression used to replace all \n and \r inside quotes
This might be hard to explain, but I will do my best. I am currently working on a CSV transform stream parser in Node.js, but I am struggling to replace all \n and \r characters inside the quotes (") that wrap a value.
At the moment I have the following regex:
(^|[;])"(?:""|[^"])*[\n\r]+(?:""|[^"])*"
Where ; is the column delimiter.
Here are two examples: in the first one the regex does what is expected, and in the second one it captures text that it shouldn't, because the ; is inside quotes.
First Test (success)
test;"123";"this description with new line feed below should be
matched by regex";test;"1.0"
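To make the first test reproducible, here is a minimal sketch that runs the regex from the question against that input in Node.js. The `g` flag and the variable names are additions for illustration and are not part of the original post.

```javascript
// Regex from the question; the "g" flag is an addition so all matches are collected.
const original = /(^|[;])"(?:""|[^"])*[\n\r]+(?:""|[^"])*"/g;

// The first test input, with the line feed written as \n.
const firstTest =
  'test;"123";"this description with new line feed below should be\n' +
  'matched by regex";test;"1.0"';

// Only the quoted value that contains the line feed is expected to match.
const matches = firstTest.match(original);
console.log(matches);
```

The match includes the leading ; because the delimiter is part of the pattern's first group.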
Second Test (error)
NewLine1;"test - this one should not be captured by the regex but its being captured ";test;1
NewLine2;"test that went wrong"
Is there a way to match the text between quotes that has a semicolon before the first quote and a semicolon after the last quote, while ignoring semicolons inside the quotes? I think that's what I need, so that the second example is not matched by the regex.
Thank you in advance.
You may use:
(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"
I changed [;] to ; because they're equivalent in your case. I also added the ; character to [^";] because a column value in your CSV stream can't contain this character.
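As a rough sketch of how this regex might be used to do the replacement, the snippet below swaps each run of line breaks inside a matched quoted value for a single space. The helper name `flattenQuotedLineBreaks`, the `g` flag, the choice of a space as the replacement, and the second sample string are assumptions made for illustration; none of them come from the original post.

```javascript
// Regex from the answer; the "g" flag is an addition so every field is visited.
const quotedWithBreak = /(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"/g;

// Hypothetical helper: replaces each run of line breaks inside a matched
// quoted value with a single space and leaves everything else untouched.
function flattenQuotedLineBreaks(text) {
  return text.replace(quotedWithBreak, (match) => match.replace(/[\n\r]+/g, ' '));
}

// The first test from the question.
const firstTest =
  'test;"123";"this description with new line feed below should be\n' +
  'matched by regex";test;"1.0"';

// Made-up input: a ; inside quotes followed by a real record separator.
// The line feed here sits between two records and should be preserved.
const semicolonInQuotes = 'a;"x;";b\nc;"y"';

console.log(flattenQuotedLineBreaks(firstTest));
console.log(flattenQuotedLineBreaks(semicolonInQuotes) === semicolonInQuotes);
```

Because [^";] excludes semicolons, a quoted value that legitimately contains both a ; and a line break would not be matched, which is consistent with the stated assumption that column values never contain the delimiter.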
I don't know why you have "" in the regex, but if you want to allow other double quotes inside the column value, I assume they must be escaped with \, so you can use a regex like (^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*" which has (?<=\\)" instead of "". That part matches a " character preceded by a backslash (\").
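The backslash-escaped variant can be tried out the same way. This is only a sketch: the sample string, the variable names, the `g` flag, and the space replacement are assumptions for illustration. It also relies on lookbehind assertions, an ES2018 feature, so it needs a reasonably recent Node.js version.

```javascript
// Variant from the answer that accepts \" inside a quoted value.
// (?<=\\)" matches a double quote only when a backslash precedes it.
const escapedQuoteAware = /(^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*"/g;

// Made-up input: a quoted value with escaped quotes and a line feed.
// As raw text it reads: id;"say \"hi\"<LF>then leave";end
const sample = 'id;"say \\"hi\\"\nthen leave";end';

// Replace the line feed inside the matched quoted value with a space.
const result = sample.replace(escapedQuoteAware, (match) =>
  match.replace(/[\n\r]+/g, ' ')
);

console.log(result);
```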