Extract a substring in R with no pattern
If one of the strings in a column looks like this:
string = "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW"
is there a str_extract call that returns
sub_string = "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
from the original string?
We can use a lookbehind to find the patient/ substring and then extract the characters after it that are not an underscore (_). The 0+ in the pattern additionally requires the match to start with one or more zeros:
library(stringr)
str_extract(string, "(?<=patient\\/)0+[^_]+")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
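If stringr is not available, the same lookbehind idea can be sketched in base R. This is only an illustration, not part of the original answer: it drops the 0+ requirement and relies on regexpr with perl = TRUE, since lookbehind needs the PCRE engine.

```r
string <- "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW"

# perl = TRUE enables the lookbehind (?<=...).
# [^_]+ takes everything after "patient/" up to the first underscore.
m <- regexpr("(?<=patient/)[^_]+", string, perl = TRUE)

# regmatches() returns only the matched text.
sub_string <- regmatches(string, m)
sub_string
```

Note that regmatches drops elements with no match, so on a vector input the result can be shorter than the input; str_extract returns NA in that case instead.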
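If there is no fixed pattern and you want the 7th element after splitting on the delimiter /, split the string and trim everything from the first underscore onward (the whitespace argument of trimws requires R 3.6.0 or later):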
trimws(strsplit(string, "/")[[1]][7], whitespace = "_.*")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
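Since the question mentions a whole column, the split approach could be wrapped in a small vectorized helper. The function name nth_segment and the second test path are hypothetical, introduced only for illustration; the sketch assumes the wanted piece is always the n-th /-separated segment, cut at its first underscore.

```r
# Hypothetical helper: n-th "/"-separated segment, truncated at the first "_".
nth_segment <- function(x, n, sep = "/") {
  parts <- strsplit(x, sep, fixed = TRUE)
  # p[n] is NA_character_ when a path has fewer than n segments.
  seg <- vapply(parts, function(p) p[n], character(1))
  # Remove the first underscore and everything after it.
  sub("_.*$", "", seg)
}

paths <- c(
  "P/project/dhi_intro_genomics/genomics/gene/pag-files-per-patient/000tg82e-99c4-4h20-9ude-d95e15005a 3c_KXgES5FtCpLhQce7mGkuMX/XML/JH_DN_S9_2000-12-27_MTW-29FEB1997UW",
  "a/b"  # too short: yields NA rather than an error
)
result <- nth_segment(paths, 7)
result
```

Returning NA for paths that are too short keeps the output the same length as the input, which matters when assigning the result back to a data frame column.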
Or with str_replace, skipping the first six /-terminated segments and keeping the second capture group:
str_replace(string, "([^/]+/){6}([^_]+)_.*", "\\2")
[1] "000tg82e-99c4-4h20-9ude-d95e15005a 3c"
For the new strings, skip the four leading directories and capture the second underscore-delimited field of the file name:
str_new <- c("/P/project/dlf_intro_aion/Y0793/Y0793_8665030498_T1_K1IJ2_ps20200918125614.htj.gz.pd5",
"/P/project/dlf_intro_aion/H051/H0518946_032983_T1_K1ID2_ps20289239171246.par.gz"
)
str_replace(str_new, "^/?([^/]+/){4}[^_]+_([^_]+)_.*", "\\2")
[1] "8665030498" "032983"
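An alternative that avoids counting directories is to work on the file name alone. This is a sketch under the assumption that the wanted ID is always the second underscore-delimited field of the file name, which holds for the two example paths but is not confirmed for other data.

```r
str_new <- c(
  "/P/project/dlf_intro_aion/Y0793/Y0793_8665030498_T1_K1IJ2_ps20200918125614.htj.gz.pd5",
  "/P/project/dlf_intro_aion/H051/H0518946_032983_T1_K1ID2_ps20289239171246.par.gz"
)

# basename() drops the directory part, so the directory depth no longer matters.
files <- basename(str_new)

# Split each file name on "_" and keep the second field.
ids <- vapply(strsplit(files, "_", fixed = TRUE), `[`, character(1), 2)
ids
```

The result is kept as character so that leading zeros, as in "032983", are preserved.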