
Match & replace within multiple quoted strings with REGEX

I want to replace all spaces within quotes with underscores in R. I'm not sure how to define the quoted strings correctly when there is more than one. My first attempt fails, and I haven't even got on to single/double quotes.

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('.*) (.*')", '$1_$2')
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"

Grateful for help.

Let's assume you need to match all non-overlapping substrings that start with ', then have one or more chars other than ', and then end with '. The pattern is '[^']+'.

Then, you may use the following base R code:

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gr <- gregexpr("'[^']+'", x)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
x
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
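
The question also mentions single/double quotes. As a minimal sketch, assuming quotes are always balanced and never nested or escaped, the same base R approach can be wrapped in a helper with an alternation that covers both quote styles. The name underscore_quoted and its default pattern are made up for illustration and are not part of any package:

```r
# Hypothetical helper (not from any package): replace whitespace inside
# single- or double-quoted substrings with underscores.
underscore_quoted <- function(x, pattern = "'[^']+'|\"[^\"]+\"") {
  gr <- gregexpr(pattern, x)                      # locate every quoted substring
  regmatches(x, gr) <- lapply(regmatches(x, gr),  # rewrite only the matched parts
                              gsub, pattern = "\\s", replacement = "_")
  x
}

s <- "The 'quick brown' fox \"jumps over\" the 'lazy' dog"
res <- underscore_quoted(s)
cat(res, sep = "\n")
```

Note that an apostrophe in ordinary text (as in don't) would be treated as an opening quote by this pattern, so it only suits input where ' is used purely as a delimiter.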

Or, use gsubfn:

> library(gsubfn)
> rx <- "'[^']+'"
> s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
> gsubfn(rx, ~ gsub("\\s", "_", x), s)
[1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"

To support escape sequences, you may use a much more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

Details:

  • (?<!\\) - no \ immediately before the current location
  • (?:\\{2})* - zero or more sequences of two \ chars
  • \K - match reset operator
  • ' - a single quote
  • [^'\\]* - zero or more chars other than ' and \
  • (?:\\.[^'\\]*)* - zero or more sequences of:
    • \\. - a \ followed by any char but a newline
    • [^'\\]* - zero or more chars other than ' and \
  • ' - a single quote.

And the R demo would look like this:

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"
cat(x, sep="\n")
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
cat(x, sep="\n")

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog
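
To see which substrings the escape-aware pattern actually selects, the matches can be listed before any replacement is done. The sketch below assumes R 4.0.0 or later, where a raw string literal r"(...)" lets the regex be written without doubling every backslash; the variable names rx and m are only illustrative:

```r
# Raw string literal (requires R >= 4.0.0): backslashes are not doubled,
# so the pattern reads exactly like the plain PCRE regex.
rx <- r"((?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*')"

x <- "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"

# List the substrings the pattern selects; escaped quotes stay inside a match.
m <- regmatches(x, gregexpr(rx, x, perl = TRUE))[[1]]
cat(m, sep = "\n")
```

On older R versions the doubled-backslash form of the pattern is needed instead.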

Try this:

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')
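
For comparison, here is a small base R sketch of the same ('[a-z]+) ([a-z]+') idea using gsub with \\1 and \\2 backreferences (the variable names are only illustrative). It suggests the pattern only fits quoted strings made of exactly two lowercase words; longer quoted phrases are left untouched:

```r
two_words  <- "The 'quick brown' fox 'jumps over' the lazy dog"
many_words <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"

# Same pattern as above, with base R backreferences instead of $1 and $2.
rx <- "('[a-z]+) ([a-z]+')"

res_two  <- gsub(rx, "\\1_\\2", two_words)   # each quoted string has exactly two words
res_many <- gsub(rx, "\\1_\\2", many_words)  # no quoted string fits the two-word shape

cat(res_two, res_many, sep = "\n")
```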
