I have strings of DNA sequences such as: "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
Is there a way to remove the letters at specific positions in R, e.g. the letter at position 20?
I think I may be able to use regex but I don't think I am getting the expression right.
Thanks
One option is to capture the first 19 characters, match the 20th character without capturing it, capture the remaining characters, and rebuild the string from the two captured groups
str2 <- sub("^(.{1,19}).(.*)", "\\1\\2", str1)
Or with a single capture group
sub("^(.{1,19}).", "\\1", str1)
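The same idea can be wrapped up for an arbitrary position. This is only a sketch: remove_at_regex is a hypothetical helper name, not something from base R or from the answer above. It uses an exact quantifier ({n-1} instead of {1,19}), so a string shorter than n characters is returned unchanged rather than losing its last character, and it passes perl = TRUE so the pattern is handled by PCRE.

```r
# Hypothetical helper: drop the character at 1-based position n using a regex.
remove_at_regex <- function(x, n) {
  stopifnot(n >= 1)
  # Capture exactly n - 1 characters, then match (and discard) the n-th one.
  pattern <- sprintf("^(.{%d}).", n - 1L)
  sub(pattern, "\\1", x, perl = TRUE)
}

remove_at_regex("ABCDEFGHIJ", 4)
#[1] "ABCEFGHIJ"
```

Calling remove_at_regex(str1, 20) should give the same result as the sub() calls above for the sequence in the question.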
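Another option is str_sub from the stringr package, which here replaces the 20th character with an empty string and modifies str1 in place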
library(stringr)
nchar(str1)
#[1] 280
str_sub(str1, 20, 20) <- ""
nchar(str1)
#[1] 279
The data used above:
str1 <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
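If loading stringr is not an option, or the original string should stay untouched, the str_sub idea can be approximated in base R with substr. A minimal sketch, assuming a hypothetical helper called remove_at_substr: it returns a new string built from the part before position n and the part after it.

```r
# Hypothetical helper: drop the character at 1-based position n with base R only.
remove_at_substr <- function(x, n) {
  # substr() returns "" when the start is past the stop, so n = 1 and
  # n > nchar(x) are handled without special cases.
  paste0(substr(x, 1, n - 1), substr(x, n + 1, nchar(x)))
}

remove_at_substr("ABCDEFGHIJ", 4)
#[1] "ABCEFGHIJ"
```

Unlike the replacement form of str_sub, this does not modify its input, so the result has to be assigned, e.g. str2 <- remove_at_substr(str1, 20).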
Alternatively, without a regular expression (and probably less straightforward than @akrun's answer), you can use strsplit to split the string into individual characters, remove the 20th, and paste the rest back together.
seq <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
nchar(seq)
#[1] 280
seq2 <- paste(unlist(strsplit(seq,""))[-20], collapse = "")
nchar(seq2)
#[1] 279
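Since the question mentions several positions, the strsplit approach extends naturally to a vector of positions. The sketch below assumes a hypothetical helper named remove_positions. It drops all requested positions in one pass, so the indices always refer to the original string and do not shift as characters are removed. It selects the positions to keep with setdiff rather than a negative index, because indexing with an empty negative vector would return nothing instead of the whole string.

```r
# Hypothetical helper: drop the characters at several 1-based positions at once.
remove_positions <- function(x, positions) {
  chars <- strsplit(x, "")[[1]]                    # one element per character
  keep <- setdiff(seq_along(chars), positions)     # indices that survive
  paste(chars[keep], collapse = "")
}

remove_positions("ABCDEFGHIJ", c(2, 5, 9))
#[1] "ACDFGHJ"
```

This version works on a single string; for a vector of sequences it could be applied with sapply.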