
How to extract the part of a string after a specific word

I have this string:

string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

I need to extract only SOM_MT_ECT_CVE from it.

So for me the keyword is SOM (I need to identify its position).

I tried this:

d <-substr(gregexpr(pattern ='SOM',"DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"),
           nchar("DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"),"DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE")

But it returns NA.

One option is sub: match any characters ( .* ) up to 'SOM', capture 'SOM' together with the rest of the string in a group ( (...) ), and use the backreference to that group ( \\1 ) as the replacement.

sub(".*(SOM_.*)", "\\1", string)
#[1] "SOM_MT_ECT_CVE"
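
One thing to keep in mind with this pattern is that .* is greedy, so if 'SOM_' occurs more than once the captured group starts at the last occurrence. Also, when nothing matches, sub returns the input unchanged rather than NA. The sketch below illustrates both points; the inputs x and "ABC-DEF" are made up for illustration and are not from the question.

```r
x <- "A-SOM_1-SOM_2"  # hypothetical input with two occurrences of SOM_

# Greedy: .* consumes as much as it can, so the group starts at the last SOM_
greedy <- sub(".*(SOM_.*)", "\\1", x)
greedy
#[1] "SOM_2"

# Lazy: .*? consumes as little as it can, so the group starts at the first SOM_
lazy <- sub(".*?(SOM_.*)", "\\1", x, perl = TRUE)
lazy
#[1] "SOM_1-SOM_2"

# No match: sub() has nothing to replace and returns the input as-is
no_match <- sub(".*(SOM_.*)", "\\1", "ABC-DEF")
no_match
#[1] "ABC-DEF"
```

If the keyword can appear more than once, it is worth deciding up front which occurrence should count.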

Or using stringr:

library(stringr)
str_extract(string, "SOM.*")
#[1] "SOM_MT_ECT_CVE"
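
The position-based idea from the question can also be made to work in base R. The original attempt likely returns NA because substr expects the string as its first argument, followed by the start and stop positions, and because gregexpr returns a list rather than a single position. A minimal sketch, assuming only the first occurrence of 'SOM' matters (the names pos and from_som are illustrative):

```r
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

# regexpr() gives the start of the first match, or -1 if there is none;
# as.integer() drops the match-length attributes we do not need here
pos <- as.integer(regexpr("SOM", string, fixed = TRUE))

# substring() runs to the end of the string when no end position is given
from_som <- if (pos > 0) substring(string, pos) else NA_character_
from_som
#[1] "SOM_MT_ECT_CVE"
```

Using fixed = TRUE treats 'SOM' as a literal string rather than a regular expression, which is enough for a plain keyword.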

You can also split on the hyphen and take the last element, i.e.

tail(strsplit(string, '-', fixed = TRUE)[[1]], 1)
#[1] "SOM_MT_ECT_CVE"

Or with word from stringr:

stringr::word(string, -1, sep = '-')
#[1] "SOM_MT_ECT_CVE"
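
Both splitting approaches rely on the wanted part being the last hyphen-separated piece. If that is not guaranteed, one possible variation is to split and then keep the piece that starts with the keyword. This is only a sketch: parts and som_part are illustrative names, and unlike the regex approaches it returns just that one piece, not anything that follows it after a further hyphen.

```r
string <- "DIS_S_CD_EFS-NO_PCI-CD_ACT_CG-SOM_MT_ECT_CVE"

# Split into hyphen-separated pieces
parts <- strsplit(string, "-", fixed = TRUE)[[1]]

# Keep the piece(s) beginning with the keyword, wherever they sit
som_part <- parts[startsWith(parts, "SOM")]
som_part
#[1] "SOM_MT_ECT_CVE"
```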
