
Fix regex expression used to replace all \n and \r inside quotes

This might be hard to explain, but I will do my best. I am currently working on a CSV transform stream parser in Node.js, but I am struggling to replace all \n and \r characters inside the quotes (") that wrap a value.

At the moment I have the following regex:

(^|[;])"(?:""|[^"])*[\n\r]+(?:""|[^"])*"

Where ; is the column delimiter.

Here are two examples: the first, where it does what is expected, and the second, where it captures text it shouldn't because the ; is inside quotes.

First Test (success)

test;"123";"this description with new line feed  below should be
matched by regex";test;"1.0"
 

Second Test (error)

NewLine1;"test - this one should not be captured by the regex but its being captured ";test;1
NewLine2;"test that went wrong"

Is there a way to pick the text that is between quotes, preceded by a semicolon before the first quote and followed by a semicolon after the last quote, while ignoring semicolons inside the quotes? I think that's what I need, so that the second example is not taken into account for the regex match.

Thank you in advance.

You may use:

(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"

Regex Demo

I changed [;] to ; because they're equivalent in your case. I also added the ; character to the negated class, giving [^";], because a column value in your CSV stream can't contain this character.
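As a minimal sketch of how this pattern could be applied in Node.js: the regex is the one given above, while the `g` and `m` flags, the helper name `stripNewlinesInQuotes`, and the choice of a single space as the replacement are assumptions made for illustration. The `""` alternative in the pattern corresponds to the doubled-quote escaping used in CSV.

```javascript
// Regex from the answer. The g flag (replace every match) and the m flag
// (let ^ match at the start of each line) are assumptions for this sketch.
const QUOTED_WITH_NEWLINE = /(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"/gm;

// Hypothetical helper: for every quoted field that contains a line break,
// replace each run of \n / \r inside that field with `replacement`.
function stripNewlinesInQuotes(chunk, replacement = ' ') {
  return chunk.replace(QUOTED_WITH_NEWLINE, (field) =>
    field.replace(/[\n\r]+/g, replacement)
  );
}

// Example: the quoted third column spans two lines in the input.
const sample = 'test;"123";"first line\nsecond line";test;"1.0"';
console.log(stripNewlinesInQuotes(sample));
```

Because the pattern excludes ; from quoted values, a quoted field that contains both a semicolon and a line break would be left untouched by this helper. In a transform stream, a quoted field could also be split across two chunks, which this sketch does not handle.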

I don't know why you have "" in the regex, but if you want to allow other double quotes in the column value, I assume they must be escaped with \ (as \"). In that case you can use a regex like (^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*" which has (?<=\\)" instead of "". The lookbehind (?<=\\)" matches a " character that is preceded by a backslash.
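The backslash-escaped variant can be tried the same way. This is only a sketch: it assumes a JavaScript engine with lookbehind support (recent Node.js versions have it), and the helper name, the `g`/`m` flags, and the space replacement are illustrative choices, not part of the answer.

```javascript
// Variant from the answer: a " inside the value is allowed only when it is
// preceded by a backslash. The g and m flags are assumptions for this sketch.
const ESCAPED_QUOTED_WITH_NEWLINE =
  /(^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*"/gm;

// Hypothetical helper: replace line-break runs inside matched quoted fields.
function stripNewlinesBackslashEscaped(chunk, replacement = ' ') {
  return chunk.replace(ESCAPED_QUOTED_WITH_NEWLINE, (field) =>
    field.replace(/[\n\r]+/g, replacement)
  );
}

// The field below contains \"hi\" followed by a line break.
const escapedSample = 'id;"say \\"hi\\"\nthere";end';
console.log(stripNewlinesBackslashEscaped(escapedSample));
```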
