
Regular expression: extract string between two characters/strings

I have a model formula (as a string) and want to extract the value of a specific argument, id in my case. So far I have only found a pattern that returns the string with that value removed. I want exactly the opposite: only the value that is missing from my result:

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub("(?=(id=|id =))([a-zA-Z].*)(?=,)", "\\1", xx, perl =T)
#> [1] "gee(formula = breaks ~ tension, id =, data = warpbreaks)"

wool is missing from the return value, but wool is the only part I want as the result. Can anyone help me find the correct regex pattern?

Instead of regex here, you could parse() the string and grab the id argument by name.

as.character(parse(text = xx)[[1]]$id)
# [1] "wool"
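
The same idea can be wrapped in a small helper. This is only a minimal sketch, assuming the string is syntactically valid R code containing a single call. The name get_call_arg is hypothetical and not part of any package. It uses deparse() instead of as.character() so that a value that is itself an expression (such as the formula) comes back as one string.

```r
# Hypothetical helper: return the value of a named argument in a call string.
get_call_arg <- function(call_string, arg_name) {
  call_obj <- parse(text = call_string)[[1]]  # first (only) expression: a call
  value <- as.list(call_obj)[[arg_name]]      # exact-name lookup; NULL if absent
  if (is.null(value)) return(NA_character_)   # argument not present in the call
  paste(deparse(value), collapse = "")        # turn the symbol/expression into text
}

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
get_call_arg(xx, "id")
# [1] "wool"
get_call_arg(xx, "formula")
# [1] "breaks ~ tension"
```

Unlike the regex approaches, this relies on R's own parser, so it should not be confused by commas inside a nested argument value, but it will fail with an error if the string is not parseable.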

You may use

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub(".*\\bid\\s*=\\s*(\\w+).*", "\\1", xx)
## or, if the value extracted may contain any chars but commas
sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", xx)

See the R demo and the regex demo.

Details

  • .* - any 0+ chars, as many as possible
  • \\bid - a whole word id (\\b is a word boundary)
  • \\s*=\\s* - a = enclosed with 0+ whitespaces
  • (\\w+) - Capturing group 1 (\\1 in the replacement pattern refers to this value): one or more letters, digits or underscores (or [^,]+ matches 1+ chars other than a comma)
  • .* - the rest of the string.
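
To illustrate the difference between the two capturing groups described above, here is a short sketch. The input strings yy and zz are made up for illustration and do not come from the question. The last call uses [^,)]+ as one possible tweak for the case where id is the final argument; treat it as an assumption about the input, since it would also stop at a parenthesis inside the value.

```r
# Made-up input: the id value contains a character that \\w does not match.
yy <- "gee(formula = breaks ~ tension, id = warpbreaks$wool, data = warpbreaks)"
word_only   <- sub(".*\\bid\\s*=\\s*(\\w+).*", "\\1", yy)    # stops at "$"
up_to_comma <- sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", yy)   # stops at ","
word_only
# [1] "warpbreaks"
up_to_comma
# [1] "warpbreaks$wool"

# Made-up input: id is the last argument, so no comma follows the value.
zz <- "gee(formula = breaks ~ tension, data = warpbreaks, id = wool)"
with_paren    <- sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", zz)   # keeps the ")"
without_paren <- sub(".*\\bid\\s*=\\s*([^,)]+).*", "\\1", zz)  # also excludes ")"
with_paren
# [1] "wool)"
without_paren
# [1] "wool"
```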

Alternative solutions:

> xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
> regmatches(xx, regexpr("\\bid\\s*=\\s*\\K[^,]+", xx, perl=TRUE))
[1] "wool"

The pattern matches id as a whole word, then = enclosed with 0+ whitespaces. \\K then drops the text matched so far from the match value, so only the 1+ chars other than a comma that follow end up in the result.
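
One practical difference between the sub() and regmatches() approaches shows up when the string has no id argument at all. The sketch below uses a made-up input, no_id, to show it: sub() returns its input unchanged when nothing matches, while regmatches() returns an empty character vector.

```r
# Made-up input without an id argument.
no_id <- "gee(formula = breaks ~ tension, data = warpbreaks)"

# sub(): no match means no replacement, so the whole input comes back.
sub_result <- sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", no_id)
identical(sub_result, no_id)
# [1] TRUE

# regmatches(): no match yields a zero-length character vector.
rm_result <- regmatches(no_id, regexpr("\\bid\\s*=\\s*\\K[^,]+", no_id, perl = TRUE))
length(rm_result)
# [1] 0
```

If the input may lack the argument, checking length() of the regmatches() result is a simple way to detect that, whereas the sub() result would have to be compared with the input.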

Or, a capturing approach with stringr::str_match is also valid here:

> library(stringr)
> str_match(xx, "\\bid\\s*=\\s*([^,]+)")[,2]
[1] "wool"
