Extracting a substring with regex in R when the characters next to it vary

Question

I have some strings like the ones below. I need to extract the color part from each string.

s1 <- 'color: red greenSize: 2 CountVerified Purchase'
s2 <- 'color: red greenVerified Purchase'
s3 <- 'color: red greenSize: 2 Count'
s4 <- 'color: red green'

I used str_replace as shown below. It works only for s1 and s3, not for s2 and s4.

str_replace(s1, 'color:\\s(.*)Size:\\s.*', '\\1')

Does anyone know how I can extract the colors in a way that works for all 4 cases?

Answer 1

These expressions might work:

color:\s(.*?)(Size.*|[A-Z].*|$)
color:\s(.*?)([A-Z].*|$)

and our code might look like this:

str_replace(s1, 'color:\\s(.*?)([A-Z].*|$)', '\\1')
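
To check the second expression against all four strings from the question at once, here is a minimal sketch. It assumes base R's sub() with perl = TRUE as a stand-in for stringr's str_replace(), so that it runs without extra packages; the vector name s is only a placeholder.

```r
# The four sample strings from the question
s <- c("color: red greenSize: 2 CountVerified Purchase",
       "color: red greenVerified Purchase",
       "color: red greenSize: 2 Count",
       "color: red green")

# The lazy (.*?) stops at the first uppercase letter or at the end of the string
pattern <- "color:\\s(.*?)([A-Z].*|$)"

# Assumption: base R sub() with perl = TRUE stands in for stringr::str_replace()
colors <- sub(pattern, "\\1", s, perl = TRUE)
print(colors)
```

Each element comes back as "red green". Note that this pattern treats the first uppercase letter as the end of the color part, so a capitalized color name such as "Red" would be cut off.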


Answer 2

Here is my attempt using regmatches, along with the following regex pattern:

color: (\\S+) (\\S+)(?=Size|Verified|$)

This isolates the first and second colors. The end of the second color is marked by either the word Size, the word Verified, or the end of the string.

x <- c("color: red greenSize: 2 CountVerified Purchase",
       "color: red greenVerified Purchase",
       "color: red greenSize: 2 Count",
       "color: red green")
sapply(x, function(x) {
    result <- regmatches(x, regexec("color: (\\S+) (\\S+)(?=Size|Verified|$)", x, perl=TRUE))[[1]]
    c(result[2], result[3])
})

This outputs (a bit messy):

     color: red greenSize: 2 CountVerified Purchase
[1,] "red"
[2,] "green"
     color: red greenVerified Purchase color: red greenSize: 2 Count
[1,] "red"                             "red"
[2,] "green"                           "green"
     color: red green
[1,] "red"
[2,] "green"
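
The output is messy mainly because sapply uses the input strings as column names. One possible way to tidy it, sketched here with the same pattern, is to collect the captures with vapply and USE.NAMES = FALSE and then transpose, giving one row per string. The column names first and second are placeholders chosen for this sketch.

```r
x <- c("color: red greenSize: 2 CountVerified Purchase",
       "color: red greenVerified Purchase",
       "color: red greenSize: 2 Count",
       "color: red green")

# Same pattern as above; the lookahead needs perl = TRUE
pattern <- "color: (\\S+) (\\S+)(?=Size|Verified|$)"
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Element 1 of each match is the full match; elements 2 and 3 are the captures
colors <- t(vapply(m, function(r) r[2:3], character(2), USE.NAMES = FALSE))
colnames(colors) <- c("first", "second")  # placeholder column names
print(colors)
```

This gives a 4 x 2 character matrix with "red" in the first column and "green" in the second for every input string.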

Answer 3

Is it just me, or are all those colors in lowercase? If that is the case, you could simply do:

pattern <- "color:\\s*([a-z ]+).*"
gsub(pattern, "\\1", your_strings_here)

See a demo on regex101.com.
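
As a quick check, the lowercase-only pattern can be applied to the four strings from the question. This is only a sketch: the names strings and colors are placeholders, and the trimws() call is an addition for the case where a space precedes the next uppercase word, which does not occur in the sample data.

```r
strings <- c("color: red greenSize: 2 CountVerified Purchase",
             "color: red greenVerified Purchase",
             "color: red greenSize: 2 Count",
             "color: red green")

# Capture the run of lowercase letters and spaces that follows "color:"
pattern <- "color:\\s*([a-z ]+).*"

# trimws() is an extra safeguard against a trailing space in the capture
colors <- trimws(gsub(pattern, "\\1", strings))
print(colors)
```

All four elements come back as "red green". The approach relies on the color names being lowercase and the following text starting with an uppercase letter.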
