
Match & replace within multiple quoted strings with REGEX

I want to replace all spaces within quotes with underscores in R. I'm not sure how to define the quoted strings correctly when there is more than one. My first attempt fails (only the last space is replaced), and I haven't even got on to handling both single and double quotes.

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('.*) (.*')", '$1_$2')
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"

Grateful for help.

Let's assume you need to match all non-overlapping substrings that start with ', then have 1 or more chars other than ', and then end with '. The pattern is '[^']+'.
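To see why the negated character class matters, here is a minimal sketch that extracts the matches with base R. The variable names `good` and `greedy` are just illustrative. A greedy `.*` between the quotes runs from the first quote to the last one in the string, while `[^']+` cannot cross a quote and so stops at the nearest closing quote.

```r
s <- "The 'quick brown' fox 'jumps over' the lazy dog"

# Negated class: each quoted string is matched separately.
good <- regmatches(s, gregexpr("'[^']+'", s))[[1]]
print(good)
## => [1] "'quick brown'" "'jumps over'"

# Greedy dot: one match spanning from the first quote to the last.
greedy <- regmatches(s, gregexpr("'.*'", s))[[1]]
print(greedy)
## => [1] "'quick brown' fox 'jumps over'"
```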

Then, you may use the following base R code:

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gr <- gregexpr("'[^']+'", x)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
x
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"

Alternatively, use gsubfn:

library(gsubfn)
rx <- "'[^']+'"
s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gsubfn(rx, ~ gsub("\\s", "_", x), s)
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
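The question also mentions single and double quotes. One possible extension of the base R approach is to add an alternation branch for double-quoted strings. This is only a sketch: it assumes quoted strings are never nested, contain no escaped quotes, and that the text has no stray apostrophes (such as in "dog's") that could be mistaken for an opening quote.

```r
# Assumption: no nested quotes, no escaped quotes, no stray apostrophes.
x <- "The 'quick brown' fox \"jumps over\" the lazy dog"

# Either a single-quoted or a double-quoted substring.
rx <- "'[^']+'|\"[^\"]+\""

gr <- gregexpr(rx, x)
# Replace whitespace with underscores inside each match only.
regmatches(x, gr) <- lapply(regmatches(x, gr), gsub,
                            pattern = "\\s", replacement = "_")
cat(x, sep = "\n")
## => The 'quick_brown' fox "jumps_over" the lazy dog
```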

To support escape sequences (for example, an escaped quote \' inside a quoted string), you may use a much more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

Details :

  • (?<!\\) - no \ immediately before the current location
  • (?:\\{2})* - zero or more sequences of 2 \ chars
  • \K - match reset operator
  • ' - a single quote
  • [^'\\]* - zero or more chars other than ' and \
  • (?:\\.[^'\\]*)* - zero or more sequences of:
    • \\. - a \ followed with any char but a newline
    • [^'\\]* - zero or more chars other than ' and \
  • ' - a single quote.

The R demo would look like this (note that every backslash of the pattern is doubled inside the R string literal):

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"
cat(x, sep="\n")
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
cat(x, sep="\n")

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog
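If the same replacement is needed in several places, the steps can be wrapped in a small function. The name `underscore_quoted` and its arguments are hypothetical and not part of any package; this is only a sketch that reuses the gregexpr/regmatches technique with a configurable pattern.

```r
# Hypothetical helper (illustrative only): replaces whitespace with
# underscores inside every substring of `x` matched by `pattern`.
underscore_quoted <- function(x, pattern = "'[^']+'", perl = FALSE) {
  gr <- gregexpr(pattern, x, perl = perl)
  regmatches(x, gr) <- lapply(regmatches(x, gr), gsub,
                              pattern = "\\s", replacement = "_")
  x
}

# Simple pattern, default regex engine.
plain <- underscore_quoted("The 'quick brown' fox")

# Escape-aware PCRE pattern, so perl = TRUE is required.
escaped_rx <- "(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"
escaped <- underscore_quoted("a 'b \\'c\\' d' e", escaped_rx, perl = TRUE)

cat(plain, escaped, sep = "\n")
## => The 'quick_brown' fox
## => a 'b_\'c\'_d' e
```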

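Try this (note that it only handles quoted strings made of exactly two lowercase words separated by a single space):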

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')
