Suppose I have this DNAStringSet (as an example):
dataset1
A DNAStringSet instance of length 38874
        width seq names
    [1]  2617 GGC yellow
    [2]  4306 ACG blue
    [3]  1070 CTC red
    [4]  1870 CAC white
    [5]  3732 CAC brown
    ...   ... ...
[38870]   390 TGC black
[38871]  1970 CAG orange
And then I have a vector containing the names of some of these sequences:
dataset2 <- c("blue","black","red","brown")
How can I subset the sequences in dataset1 whose names appear in dataset2?
DNAStringSet objects can be subset by name just like a normal R list, using square-bracket notation:
library(Biostrings)
afastafile <- DNAStringSet(c("GCAAATGGG", "CCCGGGTT", "AAAGGGTT", "TTTGGGCC"))
names(afastafile) <- c("ABC1_1", "ABC2_1", "ABC3_1", "ABC4_1")
afastafile
A DNAStringSet instance of length 4
    width seq       names
[1]     9 GCAAATGGG ABC1_1
[2]     8 CCCGGGTT  ABC2_1
[3]     8 AAAGGGTT  ABC3_1
[4]     8 TTTGGGCC  ABC4_1
nms <- c('ABC1_1', 'ABC3_1')
afastafile[nms]
A DNAStringSet instance of length 2
    width seq       names
[1]     9 GCAAATGGG ABC1_1
[2]     8 AAAGGGTT  ABC3_1
In your case, you would just do:
dataset1[dataset2]
The result contains the matching sequences in the order the names appear in dataset2.
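One caveat, which I am stating as an expectation rather than something confirmed above: if dataset2 contains a name that does not exist in dataset1, plain dataset1[dataset2] is expected to stop with an error instead of skipping that name. The sketch below uses a small hypothetical stand-in for dataset1 (the sequences and the name "purple" are made up for illustration) and shows two base R workarounds: a logical subset with %in%, which keeps dataset1's order, and intersect(), which keeps dataset2's order.

```r
library(Biostrings)

# Hypothetical stand-in for dataset1 (made-up sequences and names)
dataset1 <- DNAStringSet(c("GGCAT", "ACGTT", "CTCGA", "TGCAA"))
names(dataset1) <- c("yellow", "blue", "red", "black")

# "purple" is a hypothetical name that is NOT present in dataset1
dataset2 <- c("blue", "black", "red", "purple")

# dataset1[dataset2] is expected to fail here because of "purple".

# Option 1: logical subsetting skips unknown names and keeps dataset1's order
subset_in_order <- dataset1[names(dataset1) %in% dataset2]
names(subset_in_order)  # returns c("blue", "red", "black")

# Option 2: drop unknown names first, then subset by name to keep dataset2's order
found <- intersect(dataset2, names(dataset1))
subset_by_query <- dataset1[found]
names(subset_by_query)  # returns c("blue", "black", "red")

# List the requested names that were not found in dataset1
missing_names <- setdiff(dataset2, names(dataset1))
missing_names  # returns "purple"
```

Checking setdiff() first is a cheap way to notice typos or mismatched naming conventions between the two objects before silently dropping them.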