Regular expression to catch a multi-line quote

...,"My quote goes on
to multiple lines
like this",...

How would I catch this in a regular expression? I want to do this in a substitution to end up with

...,"My quote goes on to multiple lines like this",...

I tried

"(?<!\")\r\n(?!\")"

This was an attempt to find a newline that is not preceded by a quote and where the next line does not start with a quote either.

The following substitution was done in R using that regular expression, with no luck...

newDF = gsub( "(?<!\")\r\n(?!\")", " ", newDF, perl = TRUE)
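The question does not show the actual input, so the cause of the failure can only be guessed at. One possibility (an assumption, not something the post confirms) is that the data uses bare LF line endings, in which case a pattern that requires a literal CR LF pair never matches. A minimal sketch of that failure mode, using a made-up LF-only sample `s_lf` and a variant pattern with an optional CR:

```r
# Assumed sample: same text as in the question, but with LF-only line endings
s_lf <- "...,\"My quote goes on\nto multiple lines\nlike this\",..."

# Pattern from the question: requires CR followed by LF
pattern_crlf <- "(?<!\")\r\n(?!\")"
# Variant with an optional CR, so both CRLF and LF line endings match
pattern_any <- "(?<!\")\r?\n(?!\")"

unchanged <- gsub(pattern_crlf, " ", s_lf, perl = TRUE)  # no CR in input, nothing replaced
fixed     <- gsub(pattern_any,  " ", s_lf, perl = TRUE)  # line breaks become spaces
```

Note that both patterns only look at the characters next to the line break, so they would also join two records whenever neither one ends or starts with a quote.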

You can match each quoted substring and then use gsubfn to replace line breaks inside the quoted substrings only:

library(gsubfn)
s = "...,\"My quote goes on\r\nto multiple lines\r\nlike this\",..."
gsubfn("\"[^\"]+\"", function(x) gsub("(?:\r?\n)+", " ", x), s)
[1] "...,\"My quote goes on to multiple lines like this\",..."

The "[^"]+" pattern matches every quoted substring. Within each match, (?:\r?\n)+ matches one or more sequences of an optional CR (\r?) followed by one LF, and each such run is replaced with a single space.
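If adding the gsubfn dependency is not desirable, the same two-step idea (find the quoted substrings, then rewrite only those) can be sketched in base R with `gregexpr` and `regmatches<-`. The helper name `flatten_quoted` is hypothetical and introduced here only for illustration; this is a sketch of the technique, not the answer's own code.

```r
# Hypothetical helper: replace line breaks only inside double-quoted substrings
flatten_quoted <- function(x) {
  # Locate every "..." substring in each element of x
  m <- gregexpr("\"[^\"]+\"", x, perl = TRUE)
  # Within each matched substring, collapse runs of CRLF/LF into one space
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(q) gsub("(?:\r?\n)+", " ", q, perl = TRUE)
  )
  x
}

s <- "...,\"My quote goes on\r\nto multiple lines\r\nlike this\",..."
result <- flatten_quoted(s)
```

Because the replacement is applied only to the matched substrings, line breaks outside the quotes are left untouched.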

Alternatively, you can achieve a similar result with a PCRE regex like

gsub("(?:\r?\n)+(?!(?:[^\"]|\"[^\"]*\")*$)", " ", s, perl=TRUE)
[1] "...,\"My quote goes on to multiple lines like this\",..."

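The (?!(?:[^\"]|\"[^\"]*\")*$) negative lookahead fails whenever the rest of the string consists only of non-quote characters and complete quoted substrings, that is, whenever an even number of quotes follows. A line break is therefore matched only when an odd number of quotes remains up to the end of the string, which means it sits inside a quoted substring.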
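To see the quote-counting lookahead at work, here is a small sketch on a made-up two-record string (the sample `rows` is an assumption for illustration and does not come from the question). The line break inside the quotes has three quotes after it and is replaced; the line break between the records has two quotes after it and is kept.

```r
# Assumed sample: two comma-separated records, the first containing a quoted line break
rows <- "a,\"x\ny\",b\nc,\"z\",d"

# Same PCRE pattern as above: match line breaks followed by an odd number of quotes
pattern <- "(?:\r?\n)+(?!(?:[^\"]|\"[^\"]*\")*$)"

cleaned <- gsub(pattern, " ", rows, perl = TRUE)
```

This relies on the quotes being balanced across the whole string; a stray unpaired quote would shift the count for every line break before it.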

> x <- "My quote goes on
+ to multiple lines
+ like this"

> gsub("\\n", " ", x)
[1] "My quote goes on to multiple lines like this"

Don't forget to double the backslashes in the pattern, since R string literals treat a single backslash as an escape character.
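As a short sketch of what doubling means in practice: "\\n" is a two-character string (backslash, n) that the regex engine interprets as a line feed, while "\n" is a single literal line feed character. For this particular escape both forms match. Doubling becomes mandatory for regex escapes that have no meaning in an R string, such as \d, where a single backslash is a parse error. The variable names below are illustrative only.

```r
x <- "My quote goes on\nto multiple lines\nlike this"

# Regex escape: the pattern string holds a backslash followed by "n"
via_regex_escape <- gsub("\\n", " ", x)
# Literal character: the pattern string holds an actual line feed
via_literal <- gsub("\n", " ", x)

# A regex-only escape such as \d must be written with a doubled backslash
digits_removed <- gsub("\\d", "", "a1b22c")
```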
