gsub - trimming a series of letters/numbers from the end of a word

Question

I have a list of 900 names such as these:

miR.30a.5p.11TC.0.0.0
miR.30a.5p.0.G.0.ag
miR.21.5p.0.A.0.tga
miR.30a.3p.0.TA.cc

I would like to know how many of the miRs have "0" before the last dot in the sequence. I have tried different combinations of grep and gsub (to remove the letters/numbers after the last dot), but I cannot work it out because the number of characters after the last dot varies. I will be very grateful for your help.

Expected output is either:

The number of miRs with 0 before the last dot (such as this one: miR.21.5p.0.A.0.tga, but not this one: miR.30a.3p.0.TA.cc).
OR the names with everything after the last dot trimmed:
miR.30a.5p.11TC.0.0
miR.30a.5p.0.G.0
etc.

Answer 1

Example data

names <- c("miR.30a.5p.11TC.0.0.0", 
       "miR.30a.5p.0.G.0.ag", 
       "miR.21.5p.0.A.0.tga", 
       "miR.30a.3p.0.TA.c.c", 
       "miR.30a.5p.11TC.0.0", 
       "miR.30a.5p.0.G.0")

Workflow

Split the strings on '.'
Reverse the split vectors
Take the second element

filt <- unlist(lapply(lapply(strsplit(names, ".", fixed = TRUE), rev), "[[", 2)) == "0" # logical vector, TRUE where the second-to-last element is "0"
sum(filt) # number of names with "0" as the second-to-last element
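The three workflow steps can also be written out one at a time, which makes the intermediate results easier to inspect. This is only a sketch of the same idea: the variable names (names_vec, parts, reversed, second_last) are made up for illustration, and names_vec is used instead of names simply to avoid masking the base R function of that name.

```r
# Same example data as above, stored under a hypothetical name
names_vec <- c("miR.30a.5p.11TC.0.0.0",
               "miR.30a.5p.0.G.0.ag",
               "miR.21.5p.0.A.0.tga",
               "miR.30a.3p.0.TA.c.c",
               "miR.30a.5p.11TC.0.0",
               "miR.30a.5p.0.G.0")

# 1. Split each name on literal dots (fixed = TRUE disables regex)
parts <- strsplit(names_vec, ".", fixed = TRUE)

# 2. Reverse each vector of pieces
reversed <- lapply(parts, rev)

# 3. Take the second element, i.e. the piece before the last dot
second_last <- vapply(reversed, `[[`, character(1), 2)
second_last

# Count the names whose second-to-last piece is exactly "0"
sum(second_last == "0")
```

With this example data, four of the six names have "0" before the last dot. Note that the extracted piece is compared against the string "0" rather than the number 0, since strsplit returns character vectors.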

Best, Chris

Answer 2

An idea via base R:

sum(sapply(x, function(i){i1 <- strsplit(i, '.', fixed = TRUE)[[1]]; 
                          i1[length(i1) - 1] == "0"}))

#[1] 3

Or using the stringr package:

#For the sum,
sum(stringr::word(x, -2, sep = '\\.') == "0")
#[1] 3

#For trimming
stringr::word(x, 1, -2, sep = '\\.')
#[1] "miR.30a.5p.11TC.0.0" "miR.30a.5p.0.G.0"    "miR.21.5p.0.A.0"   "miR.30a.3p.0.TA.c"

Data

x <- c('miR.30a.5p.11TC.0.0.0', 
       'miR.30a.5p.0.G.0.ag', 
       'miR.21.5p.0.A.0.tga', 
       'miR.30a.3p.0.TA.c.c')

Answer 3

sum(gsub('.*\\.(.*)\\..*', '\\1', x) == "0")
#[1] 3

.* any number of characters, which may include dots

\\. a literal dot

(.*) a group of any number of characters; we get this group back using \\1

\\..* a literal dot (the final dot) followed by any number of characters
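For the second option in the question (trimming everything after the last dot), a pattern anchored at the end of the string is one possible variation on the regex above. The following is a sketch, not part of the original answer; trimmed and has_zero are illustrative names, and [^.]* is used to mean "any run of characters that are not dots".

```r
x <- c("miR.30a.5p.11TC.0.0.0",
       "miR.30a.5p.0.G.0.ag",
       "miR.21.5p.0.A.0.tga",
       "miR.30a.3p.0.TA.c.c")

# Remove the last dot and everything after it:
# "\\." is a literal dot, "[^.]*" is any run of non-dot characters,
# and "$" anchors the match at the end of the string
trimmed <- sub("\\.[^.]*$", "", x)
trimmed

# Flag names whose piece before the last dot is exactly "0"
has_zero <- grepl("\\.0\\.[^.]*$", x)
sum(has_zero)
```

Because the pattern requires a dot immediately before the 0, a piece such as "10" would not be counted as "0".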
