
Extract a substring in R with no pattern

One of the strings in a column looks like this:

string = "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW"

Is there a str_extract call that returns

sub_string = "000tg82e-99c4-4h20-9ude-d95e15005a 3c"

from the original string?

We can use a lookbehind to match the characters that follow the patient/ substring, up to but not including the first _ character. The 0+ in the pattern additionally requires the match to start with at least one 0, as it does in this example:

library(stringr)
str_extract(string, "(?<=patient\\/)0+[^_]+")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
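The same lookbehind idea can be sketched in base R, in case stringr is not available. This is only an illustration under two assumptions that are not in the answer above: the leading-zero requirement is dropped, and the match also stops at a / so it cannot run into the next path segment.

```r
# The example string from the question.
string <- "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW"

# perl = TRUE is required for the lookbehind (?<=...).
# [^_/]+ takes everything after "patient/" up to the first "_" or "/".
m <- regexpr("(?<=patient/)[^_/]+", string, perl = TRUE)

# regmatches() returns the matched text for the elements that matched.
sub_string <- regmatches(string, m)
print(sub_string)
# [1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
```

Note that regmatches() silently drops elements with no match, so on a whole column the result can be shorter than the input.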

If there is no pattern and you want to extract the 7th element after splitting on the delimiter /, then strip everything from the first _ onward:

trimws(strsplit(string, "/")[[1]][7], whitespace = "_.*")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
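Since the question is about a column of strings, the split-by-position approach can be wrapped so it works on a whole vector. The helper below is a hypothetical sketch (the name segment_prefix and its arguments are made up for illustration); it uses sub() instead of trimws() to drop the part after the first _, and returns NA for strings with fewer than n segments.

```r
# Hypothetical helper: take the n-th sep-separated piece of each string
# and drop everything from its first "_" onward.
segment_prefix <- function(x, n, sep = "/") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  # Pick the n-th piece of each element; NA when a string has fewer pieces.
  nth <- vapply(
    pieces,
    function(p) if (length(p) >= n) p[n] else NA_character_,
    character(1)
  )
  # Remove the first "_" and everything after it (NA stays NA).
  sub("_.*$", "", nth)
}

string <- "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW"

result <- segment_prefix(c(string, "a/b"), n = 7)
print(result)
# [1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c" NA
```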

Or with str_replace, skipping the first six /-terminated segments and capturing the characters up to the next _:

str_replace(string, "([^/]+/){6}([^_]+)_.*", "\\2")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"

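For the new strings, skip the first four path segments (allowing for a leading /) and capture the second _-separated field of the file name: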

str_new <- c("/P/project/dlf_intro_aion/Y0793/Y0793_8665030498_T1_K1IJ2_ps20200918125614.htj.gz.pd5", 
"/P/project/dlf_intro_aion/H051/H0518946_032983_T1_K1ID2_ps20289239171246.par.gz"
)
str_replace(str_new, "^/?([^/]+/){4}[^_]+_([^_]+)_.*", "\\2")
[1] "8665030498" "032983"    
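An alternative that avoids counting path segments is to work on the file name alone. This base R sketch assumes the wanted value is always the second _-separated field of the file name, which holds for both sample strings but is not confirmed for other inputs.

```r
str_new <- c(
  "/P/project/dlf_intro_aion/Y0793/Y0793_8665030498_T1_K1IJ2_ps20200918125614.htj.gz.pd5",
  "/P/project/dlf_intro_aion/H051/H0518946_032983_T1_K1ID2_ps20289239171246.par.gz"
)

# basename() keeps only the part after the last "/".
file_names <- basename(str_new)

# Split each file name on "_" and keep the second field.
second_field <- vapply(
  strsplit(file_names, "_", fixed = TRUE),
  `[`,
  character(1),
  2
)
print(second_field)
# [1] "8665030498" "032983"
```

This version does not depend on how deep the file sits in the directory tree, only on the file naming scheme.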

