
Regex: Replacing all spaces between two characters

Consider the following string: "This is an example: this is another one, and this is yet another, and other, and so on." I want to replace all space characters between : and , so that the result looks like this: "This is an example:_this_is_another_one, and this is yet another, and other, and so on."

What I've tried so far:

  • (?<=:)\\s+(?=[^,]*,) (matches only the first space)
  • :\\s+(?=[^:,]*,) (same as above)
  • \\s+(?=[^:,]*,) (matches too much: This is an example:_this_is_another_one,_and_this_is_yet_another,_and_other, and so on.)
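For reference, the three attempts can be reproduced in base R with gsub() and perl = TRUE (needed for the lookarounds). This is only a sketch for checking the behaviour described above; the names attempts and results are illustrative.

```r
x <- "This is an example: this is another one, and this is yet another, and other, and so on."

# The three patterns from the list above, written as R strings
attempts <- c(
  lookbehind   = "(?<=:)\\s+(?=[^,]*,)",
  colon_prefix = ":\\s+(?=[^:,]*,)",
  no_anchor    = "\\s+(?=[^:,]*,)"
)

# perl = TRUE is required for lookbehind/lookahead support
results <- vapply(attempts, function(p) gsub(p, "_", x, perl = TRUE), character(1))
print(unname(results))
```

The first two patterns require the whitespace to sit directly after the colon, so at most one match is possible (and the second one also consumes the colon). The third drops that anchor and therefore also matches spaces that follow later commas.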

Update: there is a simple way to replace anything between two arbitrary strings in R: call stringr::str_replace_all with a function as the replacement argument.

Generic stringr approach

library(stringr)

# left - left boundary
# right - right boundary
# x - input
# what - regex pattern to search for inside matches
# repl - replacement text for the in-pattern matches
ReplacePatternBetweenTwoStrings <- function(left, right, x, what, repl) {
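  # Escape regex metacharacters so both boundaries are matched literally
  left  <- gsub("([][{}()+*^${|\\\\?.])", "\\\\\\1", left)
  right <- gsub("([][{}()+*^${|\\\\?.])", "\\\\\\1", right)
  # Lazily match the text between the boundaries, then apply the inner replacement to each match
  str_replace_all(x,
     paste0("(?s)(?<=", left, ").*?(?=", right, ")"),
     function(z) gsub(what, repl, z)
  )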
}

x <- "This is an example: this is another one, and this is yet another, and other, and so on."
ReplacePatternBetweenTwoStrings(":", ",", x, "\\s+", "_")
## => [1] "This is an example:_this_is_another_one, and this is yet another, and other, and so on."

See this R demo.
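If stringr is not available, roughly the same idea can be sketched in base R with gregexpr() and the regmatches<- replacement function. The function name replace_between is hypothetical, and quoting the boundaries with \Q...\E (instead of escaping them by hand) is an assumption that only holds while the boundaries do not themselves contain \E.

```r
# Hypothetical base-R counterpart of ReplacePatternBetweenTwoStrings
replace_between <- function(left, right, x, what, repl) {
  # \Q...\E makes PCRE treat the boundaries as literal text
  pattern <- paste0("(?s)(?<=\\Q", left, "\\E).*?(?=\\Q", right, "\\E)")
  m <- gregexpr(pattern, x, perl = TRUE)
  # Rewrite each matched piece, then write the pieces back into x
  regmatches(x, m) <- lapply(regmatches(x, m), function(z) gsub(what, repl, z))
  x
}

x <- "This is an example: this is another one, and this is yet another, and other, and so on."
print(replace_between(":", ",", x, "\\s+", "_"))

# Boundaries that are regex metacharacters are handled by the quoting
print(replace_between("[", "]", "a [b c d] e f [g h] i", "\\s+", "_"))
```

The lookbehind and lookahead keep the boundaries themselves out of the match, so only the text between them is passed to the inner gsub() call.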

Replacing all whitespaces between the closest : and ,

This is a simpler special case of the above. The pattern :[^:,]+, matches a : , then one or more characters other than : and , (the delimiter characters), and then a , . Whitespace is then replaced with underscores inside the matches only:

stringr::str_replace_all(x, ":[^:,]+,", function(z) gsub("\\s+", "_", z))

See the regex demo
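To see what "closest" means here, the sketch below compares the two approaches on a made-up input that has two colons before the comma. It uses base R's gregexpr() and regmatches<- in place of stringr (an assumption made only to keep the example free of dependencies); the string y and the helper underscore_in_matches are hypothetical.

```r
y <- "a: b c: d e, f"

# Helper: replace whitespace with "_" inside every match of `pattern`
underscore_in_matches <- function(pattern, s, ...) {
  m <- gregexpr(pattern, s, ...)
  regmatches(s, m) <- lapply(regmatches(s, m), function(z) gsub("\\s+", "_", z))
  s
}

# Closest pair: the match may not contain another ":" or ","
closest <- underscore_in_matches(":[^:,]+,", y)

# Lookaround version: starts right after the first ":" and runs to the next ","
first_colon <- underscore_in_matches("(?s)(?<=:).*?(?=,)", y, perl = TRUE)

print(closest)
# [1] "a: b c:_d_e, f"
print(first_colon)
# [1] "a:_b_c:_d_e, f"
```

The negated character class is what restricts the match to the last colon before the comma; the lazy lookaround pattern has no such restriction.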

Original answer (scales rather poorly)

You may use the following regex:

(?:\G(?!^)|:)[^,]*?\K\s(?=[^,]*,)

Replace with _. See the regex demo.

Details

  • (?:\G(?!^)|:) - the end of the previous match ( \G(?!^) , where (?!^) excludes the start of the string) or a colon
  • [^,]*? - any zero or more chars other than , , as few as possible
  • \K - match reset operator that discards the text matched so far
  • \s - a whitespace character
  • (?=[^,]*,) - a positive lookahead that requires a , after zero or more chars other than a comma

R demo:

re <- "(?:\\G(?!^)|:)[^,]*?\\K\\s(?=[^,]*,)"
x <- "This is an example: this is another one, and this is yet another, and other, and so on."
gsub(re, "_", x, perl=TRUE)
# => [1] "This is an example:_this_is_another_one, and this is yet another, and other, and so on."
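To see why the (?!^) guard after \G matters, the following sketch compares the pattern with and without it on a short made-up string (z and the two variable names are illustrative). Without the guard, \G also matches at the very start of the input, so spaces before the first colon get replaced too.

```r
z <- "a b: c d, e f"

with_guard    <- "(?:\\G(?!^)|:)[^,]*?\\K\\s(?=[^,]*,)"
without_guard <- "(?:\\G|:)[^,]*?\\K\\s(?=[^,]*,)"

# Matching can only start at a colon, then continues from match to match
guarded <- gsub(with_guard, "_", z, perl = TRUE)
print(guarded)
# [1] "a b:_c_d, e f"

# \G succeeds at position 0, so the chain starts before the colon
unguarded <- gsub(without_guard, "_", z, perl = TRUE)
print(unguarded)
# [1] "a_b:_c_d, e f"
```

In both cases the chain stops at the first comma, because [^,]*? cannot cross it and no later colon restarts the match.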

Here is a slightly gross answer:

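txt <- "This is an example: this is another one, and this is yet"

# Wrap the segment between the first ":" and the next "," in "$" markers, then split on them.
# Assumes txt itself contains no "$".
split_str <- unlist(strsplit(gsub("^([^:]*:)([^,]*)(,.*)", "\\1$\\2$\\3", txt), split = "$", fixed = TRUE))

# Replace spaces in the middle piece only, then glue the pieces back together.
paste0(split_str[1], gsub(" ", "_", split_str[2]), split_str[3])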
