How to extract the part of a string after a specific word
I have this string:
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"
I need to extract only SOM_MT_ECT_CVE from it.
So the keyword for me is SOM (I need to identify its position).
I tried this:
d <-substr(gregexpr(pattern ='SOM',"DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"),
nchar("DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"),"DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE")
But it returns NA.
One option is sub: match any characters (.*) up to 'SOM', capture 'SOM' together with the rest of the string in a group ((...)), and use the backreference (\\1) to that captured group as the replacement.
sub(".*(SOM_.*)", "\\1", string)
#[1] "SOM_MT_ECT_CVE"
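Two edge cases of this pattern are worth checking before relying on it. The sketch below uses made-up inputs (twice and none are hypothetical, not from the question): because .* is greedy, the capture starts at the last 'SOM_' in the string, and when the pattern does not match, sub returns the input unchanged.

```r
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

# Hypothetical inputs, for illustration only
twice <- "A-SOM_X-SOM_Y"   # the keyword occurs twice
none  <- "A-B-C"           # the keyword is absent

# Same call as in the answer above
res_main  <- sub(".*(SOM_.*)", "\\1", string)

# Greedy .* consumes as much as possible, so the group starts at the last "SOM_"
res_twice <- sub(".*(SOM_.*)", "\\1", twice)

# No match: sub() leaves the input untouched rather than returning NA
res_none  <- sub(".*(SOM_.*)", "\\1", none)

print(res_main)
print(res_twice)
print(res_none)
```

If the first occurrence is wanted instead, a lazy quantifier (".*?(SOM_.*)") should give that behavior.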
Or using stringr:
library(stringr)
str_extract(string, "SOM.*")
#[1] "SOM_MT_ECT_CVE"
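If loading stringr is not an option, the same "extract the match" idea can be sketched in base R with regexpr and regmatches. This is only one possible equivalent, not the answerer's code; note that regmatches drops elements that do not match instead of returning NA.

```r
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

# regexpr() locates the first match of the pattern in each element
m <- regexpr("SOM.*", string)

# regmatches() pulls out the matched substrings
res <- regmatches(string, m)
print(res)

# A string without the keyword yields a zero-length result
no_match <- regmatches("A-B-C", regexpr("SOM.*", "A-B-C"))
print(length(no_match))
```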
You can split on the hyphen and take the last piece, i.e.
tail(strsplit(string, '-', fixed = TRUE)[[1]], 1)
#[1] "SOM_MT_ECT_CVE"
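The split approach depends on the wanted piece being the last hyphen-separated field, not on the keyword SOM itself. A minimal sketch of applying it to several strings at once follows; the second element of strings is a hypothetical example added for illustration.

```r
# The first element is the string from the question; the second is made up
strings <- c("DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE", "AB-CD_EF")

# strsplit() returns one character vector of pieces per input string
parts <- strsplit(strings, "-", fixed = TRUE)

# Take the last piece of each; vapply() guarantees a character vector result
last_parts <- vapply(parts, function(p) tail(p, 1), character(1))
print(last_parts)
```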
Or with word from stringr:
stringr::word(string, -1, sep = '-')
#[1] "SOM_MT_ECT_CVE"
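For completeness, the position-based idea from the question can also be made to work. The attempt there most likely returned NA because substr expects its arguments in the order (x, start, stop), and gregexpr returns a list rather than a single position. A hedged sketch using regexpr instead:

```r
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

# regexpr() gives the start of the first match, or -1 if there is none;
# as.integer() drops the match attributes so only the position remains
pos <- as.integer(regexpr("SOM", string, fixed = TRUE))

# substr(x, start, stop): the string comes first, then the positions
res <- if (pos > 0) substr(string, pos, nchar(string)) else NA_character_
print(res)
```

Returning NA_character_ when the keyword is absent is a choice made for this sketch; returning the original string would be equally reasonable.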