Extract a substring whose neighboring characters vary, using regex in R

I have some strings like the ones below, and I need to extract the color part from each of them.

s1 <- 'color: red greenSize: 2 CountVerified Purchase'
s2 <- 'color: red greenVerified Purchase'
s3 <- 'color: red greenSize: 2 Count'
s4 <- 'color: red green'

I used str_replace as shown below. It works only for s1 and s3, not for s2 and s4, because the pattern requires "Size:" to be present.

str_replace(s1, 'color:\\s(.*)Size:\\s.*', '\\1')

Does anyone know how I can extract the colors in a way that works for all four cases?

These expressions might work:

color:\s(.*?)(Size.*|[A-Z].*|$)
color:\s(.*?)([A-Z].*|$)

and the code might look like this:

str_replace(s1, 'color:\\s(.*?)([A-Z].*|$)', '\\1')
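As a rough check, the second expression can be run over all four strings at once. This is only a sketch: it uses base R's sub() with perl = TRUE instead of str_replace(), so that it does not depend on stringr being loaded, and the vector name `strings` is made up for illustration. It also assumes the color names never contain an uppercase letter.

```r
strings <- c(
  "color: red greenSize: 2 CountVerified Purchase",
  "color: red greenVerified Purchase",
  "color: red greenSize: 2 Count",
  "color: red green"
)

# Lazily capture everything after "color: " up to the first
# uppercase letter, or up to the end of the string if there is none.
colors <- sub("color:\\s(.*?)([A-Z].*|$)", "\\1", strings, perl = TRUE)
print(colors)
```

Each element of `colors` should come back as "red green", including the cases without "Size:".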


Here is my attempt using regmatches, along with the following regex pattern:

color: (\\S+) (\\S+)(?=Size|Verified|$)

This isolates the first and second colors. The end of the second color is marked by either the word Size, the word Verified, or the end of the string.

x <- c("color: red greenSize: 2 CountVerified Purchase",
       "color: red greenVerified Purchase",
       "color: red greenSize: 2 Count",
       "color: red green")
# perl=TRUE is needed for the lookahead (?=...)
sapply(x, function(s) {
    # result[1] is the full match; result[2] and result[3] are the two capture groups
    result <- regmatches(s, regexec("color: (\\S+) (\\S+)(?=Size|Verified|$)", s, perl=TRUE))[[1]]
    c(result[2], result[3])
})

This outputs (a bit messy):

     color: red greenSize: 2 CountVerified Purchase
[1,] "red"
[2,] "green"
     color: red greenVerified Purchase color: red greenSize: 2 Count
[1,] "red"                             "red"
[2,] "green"                           "green"
     color: red green
[1,] "red"
[2,] "green"
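If one string per input is enough, a possibly tidier variant is to match the whole color phrase with lookarounds and regexpr(). This is a sketch rather than part of the answer above; it assumes the colors are always followed directly by Size, Verified, or the end of the string, and the name `colors` is just illustrative.

```r
x <- c("color: red greenSize: 2 CountVerified Purchase",
       "color: red greenVerified Purchase",
       "color: red greenSize: 2 Count",
       "color: red green")

# Match the text after "color: " up to, but not including,
# "Size", "Verified", or the end of the string.
pattern <- "(?<=color: ).*?(?=Size|Verified|$)"
colors <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(colors)

# Split each phrase into individual colors if they are needed separately.
color_list <- strsplit(colors, " ", fixed = TRUE)
print(color_list[[1]])
```

Note that regmatches() combined with regexpr() silently drops inputs that do not match, so the result can be shorter than `x` if some strings lack a "color: " prefix.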

Is it just me or are all those colors in lowercase? If this happens to be the case, you could simply do:

pattern <- "color:\\s*([a-z ]+).*"
gsub(pattern, "\\1", your_strings_here)
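A small sketch of how this behaves on the strings from the question, plus one made-up input with a space before "Size" to show a caveat: because the character class includes a space, a trailing space can end up in the capture, and trimws() removes it. Passing perl = TRUE is a precaution added here, not part of the original suggestion, since R's regex documentation warns that character ranges such as [a-z] can be locale-dependent.

```r
strings <- c(
  "color: red greenSize: 2 CountVerified Purchase",
  "color: red greenVerified Purchase",
  "color: red greenSize: 2 Count",
  "color: red green",
  "color: red green Size: 2 Count"  # hypothetical input with a space before "Size"
)

pattern <- "color:\\s*([a-z ]+).*"

# Keep only the run of lowercase letters and spaces after "color:".
raw <- gsub(pattern, "\\1", strings, perl = TRUE)

# Strip the trailing space that the last input leaves behind.
colors <- trimws(raw)
print(colors)
```

This approach would also swallow any lowercase text that follows the colors after a space, so it only fits data where the next field starts with an uppercase letter.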

See a demo on regex101.com.
