gsub - trim a sequence of letters/numbers from the end of a word
I have a list of 900 names such as these:
I would like to know how many of the miRs have "0" before the last dot in the name. I have tried different combinations of grep and gsub (to remove the letters/numbers after the last dot), but I cannot work it out because the number of characters after the last dot varies. I will be very grateful for your help.
Expected output is either:
names <- c("miR.30a.5p.11TC.0.0.0",
"miR.30a.5p.0.G.0.ag",
"miR.21.5p.0.A.0.tga",
"miR.30a.3p.0.TA.c.c",
"miR.30a.5p.11TC.0.0",
"miR.30a.5p.0.G.0")
filt <- unlist(lapply(lapply(strsplit(names, ".", fixed = TRUE), rev), "[[", 2)) == "0" # logical vector, TRUE where the second-to-last element is "0"
sum(filt) # number of names with "0" as the second-to-last element
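A type-stable variant of the same idea, offered as a sketch: instead of reversing each split and taking element 2, it indexes the second-to-last field directly with length(p) - 1 and uses vapply(), which guarantees a character vector comes back. It assumes every name contains at least one dot (a name without a dot would make vapply() stop with an error). The variable name names_vec is hypothetical, chosen only to avoid masking base::names().

```r
# The six example names from the snippet above (names_vec is a hypothetical name)
names_vec <- c("miR.30a.5p.11TC.0.0.0", "miR.30a.5p.0.G.0.ag",
               "miR.21.5p.0.A.0.tga", "miR.30a.3p.0.TA.c.c",
               "miR.30a.5p.11TC.0.0", "miR.30a.5p.0.G.0")

# Split each name on literal dots (fixed = TRUE disables regex matching)
parts <- strsplit(names_vec, ".", fixed = TRUE)

# Second-to-last field of each name; assumes at least two fields per name
second_last <- vapply(parts, function(p) p[length(p) - 1], character(1))

filt <- second_last == "0"
sum(filt)
#[1] 4
```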
Best, Chris
An idea using base R:
sum(sapply(x, function(i) {
  i1 <- strsplit(i, '.', fixed = TRUE)[[1]]  # split on literal dots
  i1[length(i1) - 1] == "0"                  # is the second-to-last field "0"?
}))
#[1] 3
Or using the stringr package:
# For the count:
sum(stringr::word(x, -2, sep = '\\.') == 0)
#[1] 3
# For trimming (drop the last dot and everything after it):
stringr::word(x, 1, -2, sep = '\\.')
#[1] "miR.30a.5p.11TC.0.0" "miR.30a.5p.0.G.0" "miR.21.5p.0.A.0" "miR.30a.3p.0.TA.c"
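Without stringr, the trimming the question asked for can be sketched with base R's sub(), assuming (as in the examples) that the last field never contains a dot itself. The pattern "\\.[^.]*$" matches a dot followed by a run of non-dot characters up to the end of the string, which can only be the final dot; the four example names are redefined here so the snippet runs on its own.

```r
x <- c('miR.30a.5p.11TC.0.0.0', 'miR.30a.5p.0.G.0.ag',
       'miR.21.5p.0.A.0.tga', 'miR.30a.3p.0.TA.c.c')

# Remove the final dot and the last field after it
trimmed <- sub("\\.[^.]*$", "", x)
# gives the same four strings as the stringr::word(x, 1, -2, ...) call

# After trimming, the old second-to-last field is now the last field;
# "^.*\\." greedily strips everything up to the last remaining dot
sum(sub("^.*\\.", "", trimmed) == "0")
#[1] 3
```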
DATA
x <- c('miR.30a.5p.11TC.0.0.0',
'miR.30a.5p.0.G.0.ag',
'miR.21.5p.0.A.0.tga',
'miR.30a.3p.0.TA.c.c')
sum(gsub('.*\\.(.*)\\..*', '\\1', x) == 0)
#[1] 3
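The pattern '.*\\.(.*)\\..*' breaks down as:
.*     any number of characters (this may contain dots as well)
\\.    a literal dot
(.*)   a group of any number of characters; we get this group back using \\1
\\..*  a literal dot (the final dot) followed by any number of characters
Because the leading .* is greedy, it consumes as much as it can while still leaving two dots for the rest of the pattern, so the captured group is the text between the second-to-last dot and the final dot.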
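If only the count is needed, the extraction step can be skipped and the names tested directly with grepl(). This is a sketch under the same assumption as above (the last field contains no dot): "\\.0\\.[^.]*$" is TRUE only when a field that is exactly "0" sits immediately before the final dot.

```r
x <- c('miR.30a.5p.11TC.0.0.0', 'miR.30a.5p.0.G.0.ag',
       'miR.21.5p.0.A.0.tga', 'miR.30a.3p.0.TA.c.c')

# ".0." followed by a final field with no further dots, anchored at the end
has_zero <- grepl("\\.0\\.[^.]*$", x)
sum(has_zero)
#[1] 3
```

Anchoring with $ and excluding dots from the last field is what ties the match to the final dot; without the anchor, the ".0." in the middle of 'miR.30a.3p.0.TA.c.c' would also match.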