
Is there a way to remove a character by index from a string in R?

I have strings of DNA sequences such as: "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

Is there a way I can remove the letters at specific positions, e.g. the letter at position 20, in R?

I think I may be able to use regex but I don't think I am getting the expression right.

Thanks

One option is to capture the first 19 characters, match the 20th character without capturing it, and capture the remaining characters

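# .{19} matches exactly 19 characters, so a string shorter than 20 characters is left unchanged
# (with .{1,19} the regex would backtrack and drop the last character of a short string)
str2 <- sub("^(.{19}).(.*)", "\\1\\2", str1)

Or with a single capture group

sub("^(.{19}).", "\\1", str1)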
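To remove a character at an arbitrary position rather than hard-coding 20, the same pattern can be built with sprintf. This is only a sketch: remove_char_regex is a hypothetical helper name (not a base R function), and using perl = TRUE is an assumption made here for predictable quantifier handling.

```r
# Hypothetical helper: drop the character at position `pos` with a regex.
remove_char_regex <- function(x, pos) {
  pos <- as.integer(pos)
  stopifnot(pos >= 1L)
  # Capture the first (pos - 1) characters, then match one more character
  # that is not captured and is therefore dropped by the replacement.
  pattern <- sprintf("^(.{%d}).", pos - 1L)
  sub(pattern, "\\1", x, perl = TRUE)
}

remove_char_regex("ACGTACGT", 3)
# [1] "ACTACGT"
```

If pos is larger than the length of the string, the pattern does not match and the string is returned unchanged. Regex engines cap the repetition count in a quantifier, so this may fail for very large positions.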

Another option is str_sub from the stringr package, assigning an empty string to position 20

library(stringr)
nchar(str1)
#[1] 280
str_sub(str1, 20, 20) <- ""
nchar(str1)
#[1] 279
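If loading a package is not desirable, the same idea can probably be expressed with base R's substr by pasting together the part before and the part after the position. A minimal sketch; remove_char_substr is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: drop the character at position `pos` using only base R.
remove_char_substr <- function(x, pos) {
  # substr() returns "" when start > stop, so pos = 1 and
  # pos > nchar(x) both work without special-casing.
  paste0(substr(x, 1, pos - 1), substr(x, pos + 1, nchar(x)))
}

remove_char_substr("ACGTACGT", 3)
# [1] "ACTACGT"
```

Unlike the replacement form of str_sub, this returns a new string and leaves the input untouched.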

data

str1 <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

Alternatively, without using a regular expression (and probably less straightforward than @akrun's answer), you can use strsplit to split your string into individual characters, remove the 20th, and paste them back together.

seq <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

nchar(seq)
#[1] 280

seq2 <- paste(unlist(strsplit(seq,""))[-20], collapse = "")
nchar(seq2)
#[1] 279
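The split-and-paste approach extends naturally to removing several positions at once, which is what the question hints at with "specific positions". A sketch under that assumption; remove_chars_at is a hypothetical helper name.

```r
# Hypothetical helper: drop the characters at all given positions.
remove_chars_at <- function(x, positions) {
  chars <- strsplit(x, "")[[1]]
  # setdiff() is used instead of chars[-positions] because negative
  # indexing with an empty vector would return nothing at all.
  keep <- setdiff(seq_along(chars), positions)
  paste(chars[keep], collapse = "")
}

remove_chars_at("ACGTACGT", c(2, 5))
# [1] "AGTCGT"
```

Positions beyond the end of the string are simply ignored, and all positions refer to the original string, so there is no need to adjust indices after each removal.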

