R: regular expression lookaround(s) to grab what's between two patterns
I have a vector with strings like:
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3', 'kjjwer_class-Z3(z)29_sample-318TT2X.4')
I wanted to use regular expressions to get what is between the substrings 'class-' and '_sample' (such as 'X1(z)20' and 'Z3(z)29' in x), and thought the lookaround regex ((?=...), (?!...), and so on) would do it. I cannot get it to work, though!
Sorry if this is similar to other SO questions (e.g. here or here).
This is a bit different from what you had in mind, but it will do the job.
gsub("(.*class-)|(.)|(_sample.*)", "\\2", x)
The logic is the following. The pattern has 3 alternative "sets" of characters:
1) any characters .* ending in class-
2) a single character .
3) characters starting with _sample and any characters afterwards .*
From those you want to keep the second "set", \\2. Every match is replaced by whatever the second group captured, which is empty whenever set 1 or set 3 matched.
Or another, maybe easier to understand:
gsub("(.*class-)|(_sample.*)", "", x)
Take any number of characters that end in class- and the string _sample followed by any number of characters, and substitute them with the empty string "".
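As a quick check of the substitution above, here is a minimal sketch that reuses the x from the question. The names stripped and unchanged and the test string 'no_delimiters_here' are made up for illustration. Note that a string containing neither delimiter comes back unchanged, so this approach does not tell you whether anything was actually found.

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')

# Delete everything up to and including "class-",
# and everything from "_sample" to the end of the string
stripped <- gsub("(.*class-)|(_sample.*)", "", x)
print(stripped)

# Hypothetical input with neither delimiter: nothing matches, nothing is removed
unchanged <- gsub("(.*class-)|(_sample.*)", "", "no_delimiters_here")
print(unchanged)
```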
We could use str_extract_all from library(stringr):
library(stringr)
unlist(str_extract_all(x, '(?<=class-)[^_]+(?=_sample)'))
#[1] "X1(z)20" "Z3(z)29"
This should also work if there are multiple instances of the pattern within a string:
x1 <- paste(x, x)
str_extract_all(x1, '(?<=class-)[^_]+(?=_sample)')
#[[1]]
#[1] "X1(z)20" "X1(z)20"
#[[2]]
#[1] "Z3(z)29" "Z3(z)29"
Basically, we are matching the characters that are between the two lookarounds ((?<=class-) and (?=_sample)). We extract characters that are not a _ (based on the example), preceded by class- and succeeded by _sample.
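If you would rather stay in base R, the same lookarounds should work with regmatches() together with regexpr() or gregexpr(), as long as perl = TRUE is set (R's default regex engine does not support lookarounds). A minimal sketch, reusing x from the question; pat, first_match and all_matches are just illustrative names:

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')
pat <- "(?<=class-)[^_]+(?=_sample)"

# regexpr() locates the first match in each string; perl = TRUE enables lookarounds
first_match <- regmatches(x, regexpr(pat, x, perl = TRUE))
print(first_match)

# gregexpr() locates every match in each string; the result is a list,
# one element per input string, similar to str_extract_all()
x1 <- paste(x, x)
all_matches <- regmatches(x1, gregexpr(pat, x1, perl = TRUE))
print(all_matches)
```

Keep in mind that regmatches() combined with regexpr() drops elements that have no match, so the result can be shorter than x.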
gsub('.*-([^-]+)_.*','\\1',x)
#[1] "X1(z)20" "Z3(z)29"
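The pattern above keys on the positions of '-' and '_' rather than on the delimiters themselves. A variant that spells out class- and _sample in the pattern might be easier to adapt to other data. This is only a sketch, and between is an illustrative name:

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')

# Capture (lazily) whatever sits between the literal delimiters
# and replace the whole string with just that captured group
between <- sub(".*class-(.*?)_sample.*", "\\1", x, perl = TRUE)
print(between)
```

As with the other substitution approaches, a string that does not match the pattern is returned unchanged.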