
gsub - trim a sequence of letters/numbers from the end of a word

I have a list of 900 names like these:

  • miR.30a.5p.11TC.0.0.0
  • miR.30a.5p.0.G.0.ag
  • miR.21.5p.0.A.0.tga
  • miR.30a.3p.0.TA.cc

I would like to know how many of the miRs have "0" before the last dot in the sequence. I have tried different combinations of grep and gsub (to remove the letters/numbers after the last dot), but I cannot work it out because the number of characters after the last dot varies. I would be very grateful for your help.

Expected output is either:

  • The number of miRs with 0 before the last dot (such as this one: miR.21.5p.0.A.0.tga, but not this one: miR.30a.3p.0.TA.cc).
  • OR the names with everything after the last dot trimmed off, for example:
  • miR.30a.5p.11TC.0.0
  • miR.30a.5p.0.G.0
  • etc.

example data

names <- c("miR.30a.5p.11TC.0.0.0", 
       "miR.30a.5p.0.G.0.ag", 
       "miR.21.5p.0.A.0.tga", 
       "miR.30a.3p.0.TA.c.c", 
       "miR.30a.5p.11TC.0.0", 
       "miR.30a.5p.0.G.0")

workflow

  1. Split the strings by '.'
  2. Reverse the split vectors
  3. Take the second element
filt <- unlist(lapply(lapply(strsplit(names, ".", fixed = TRUE), rev), "[[", 2)) == "0" # logical vector, TRUE where the second-to-last element is "0"
sum(filt) # number of names with "0" as the second-to-last element

Best, Chris
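
The same split-based idea can also cover the trimming half of the question. The following is only a sketch: the variable names (mirs, parts, second_last, trimmed) are illustrative and not from the answer above, and it assumes every name contains at least one dot.

```r
# Example data from the answer above, stored as `mirs` to avoid masking base::names()
mirs <- c("miR.30a.5p.11TC.0.0.0",
          "miR.30a.5p.0.G.0.ag",
          "miR.21.5p.0.A.0.tga",
          "miR.30a.3p.0.TA.c.c",
          "miR.30a.5p.11TC.0.0",
          "miR.30a.5p.0.G.0")

# Split each name on literal dots
parts <- strsplit(mirs, ".", fixed = TRUE)

# Second-to-last field of each name (assumes at least two fields per name)
second_last <- vapply(parts, function(p) p[length(p) - 1], character(1))
n_zero <- sum(second_last == "0")

# Drop the last field and paste the remaining fields back together
trimmed <- vapply(parts, function(p) paste(p[-length(p)], collapse = "."),
                  character(1))

print(n_zero)
print(trimmed)
```

Using vapply with character(1) makes the call fail loudly if a name has no dot at all, rather than silently returning a list.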

An idea via base R,

sum(sapply(x, function(i){i1 <- strsplit(i, '.', fixed = TRUE)[[1]]; 
                          i1[length(i1) - 1] == '0'}))

#[1] 3

Or using stringr package,

#For the sum,
sum(stringr::word(x, -2, sep = '\\.') == '0')
#[1] 3

#For trimming
stringr::word(x, 1, -2, sep = '\\.')
#[1] "miR.30a.5p.11TC.0.0" "miR.30a.5p.0.G.0"    "miR.21.5p.0.A.0"   "miR.30a.3p.0.TA.c"

DATA

x <- c('miR.30a.5p.11TC.0.0.0', 
       'miR.30a.5p.0.G.0.ag', 
       'miR.21.5p.0.A.0.tga', 
       'miR.30a.3p.0.TA.c.c')

Using gsub with a capture group:

sum(gsub('.*\\.(.*)\\..*', '\\1', x) == '0')
#[1] 3


  • .* any number of characters, which may include dots
  • \\. a literal dot
  • (.*) a group of any number of characters; we get this group back using \\1
  • \\..* a literal dot (the final dot) followed by any number of characters
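
A variant that anchors the pattern at the end of the string with $ may be easier to read, since it does not depend on how the greedy .* backtracks. This is a sketch rather than part of the answer above; the names trimmed and has_zero are illustrative, and the counting pattern assumes the "0" field is preceded by a dot (so a field such as "10" does not count).

```r
x <- c("miR.30a.5p.11TC.0.0.0",
       "miR.30a.5p.0.G.0.ag",
       "miR.21.5p.0.A.0.tga",
       "miR.30a.3p.0.TA.c.c")

# Trim: remove the last dot and every non-dot character after it
trimmed <- sub("\\.[^.]*$", "", x)

# Count: names whose second-to-last field is exactly "0",
# i.e. ".0." followed only by non-dot characters up to the end
has_zero <- grepl("\\.0\\.[^.]*$", x)

print(trimmed)
print(sum(has_zero))
```

Here [^.]* matches the final field without crossing a dot, so both patterns work whatever the length of the trailing letters.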
