R: regular expression lookaround(s) to grab what's between two patterns

I have a vector with strings like:

x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3', 'kjjwer_class-Z3(z)29_sample-318TT2X.4')

I wanted to use a regular expression to get what is between the substrings 'class-' and '_sample' (such as 'X1(z)20' and 'Z3(z)29' in x), and thought lookaround regex ((?=...), (?!...), and so on) would do it. I cannot get it to work, though!

Sorry if this is similar to other SO questions.

This is a bit different than what you had in mind, but it will do the job.

gsub("(.*class-)|(.)|(_sample.*)", "\\2", x)

The logic is the following. The pattern has three alternatives, each in its own capture group:

1) (.*class-): any characters ending in class-

2) (.): any single character

3) (_sample.*): _sample followed by any characters

Of those, you want to keep only the second group, hence the replacement \\2.
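
A related sketch, not the pattern from the answer above: the same "keep only the middle" idea can be written with a single capture group, which avoids alternation altogether. It assumes each string contains class- and _sample exactly once, as in the question's x; the name middle is only illustrative.

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')

# Match the whole string, capturing only the text between
# "class-" and "_sample", then replace the match with that capture.
middle <- sub(".*class-(.*)_sample.*", "\\1", x)
print(middle)
#[1] "X1(z)20" "Z3(z)29"
```

Because the pattern consumes the entire string, sub is enough here; gsub would give the same result.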

Or another version that may be easier to understand:

gsub("(.*class-)|(_sample.*)", "", x)

This matches any number of characters ending in class-, as well as the string _sample followed by any number of characters, and replaces both with the empty string "".
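
To make the two deletions easier to see, here is a minimal sketch that performs them as two separate sub calls instead of one alternation. The intermediate names no_prefix and result are only illustrative, and the sketch assumes the strings look like those in the question.

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')

# Step 1: drop everything up to and including "class-".
no_prefix <- sub(".*class-", "", x)

# Step 2: drop "_sample" and everything after it.
result <- sub("_sample.*", "", no_prefix)
print(result)
#[1] "X1(z)20" "Z3(z)29"
```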

We could use str_extract_all from library(stringr).

 library(stringr)
 unlist(str_extract_all(x, '(?<=class-)[^_]+(?=_sample)'))
 #[1] "X1(z)20" "Z3(z)29"

This should also work if there are multiple instances of the pattern within a string.

 x1 <- paste(x, x)
 str_extract_all(x1, '(?<=class-)[^_]+(?=_sample)')
 #[[1]]
 #[1] "X1(z)20" "X1(z)20"

 #[[2]]
 #[1] "Z3(z)29" "Z3(z)29"

Basically, we are matching the characters that are between the two lookarounds ((?<=class-) and (?=_sample)). We extract characters that are not a _ (based on the example), preceded by class- and followed by _sample.
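
For readers without stringr, the same lookaround pattern can be used in base R. This is a sketch rather than part of the answer above; it relies on perl = TRUE, since R's default regex engine does not support lookarounds. The names pattern, first_match and all_matches are only illustrative.

```r
x <- c('kjsdf_class-X1(z)20_sample-318TT1X.3',
       'kjjwer_class-Z3(z)29_sample-318TT2X.4')
pattern <- "(?<=class-)[^_]+(?=_sample)"

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text.
first_match <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(first_match)
#[1] "X1(z)20" "Z3(z)29"

# gregexpr() locates every match in each string, so the result
# is a list with one character vector per input string.
x1 <- paste(x, x)
all_matches <- regmatches(x1, gregexpr(pattern, x1, perl = TRUE))
print(all_matches)
```

Note that regmatches with regexpr silently drops strings that have no match, so the result can be shorter than x.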

gsub('.*-([^-]+)_.*', '\\1', x)
#[1] "X1(z)20" "Z3(z)29"
