Using seqinr package in R to count bases of a DNA sequence
I have an array which was extracted from a FASTA file:
> dat
[1] "t" "a" "t" "t" "t" "a" "c" "c" "g" "a" "c" "g" "a" "a" "a" "t" "t" "a" "a" "t" "a" "c" "c" "a" "t" "c" "a" "g" "g" "g" "t" "a" "t"
[34] "t" "a" "a" "g" "a" "t" "g" "c" "t" "a" "c" "c" "a" "a" "c" "g" "t" "g" "g" "t" "a" "t" "t" "a" "a" "a" "a" "t" "g" "t" "g" "c" "c"
[67] "c" "a" "a" "c" "c" "g" "c" "g" "a" "a" "a" "a" "a" "g" "a" "a" "a" "g" "t" "g" "g" "t" "a" "t" "a" "t" "a" "g" "g" "a" "a" "a" "a"
The sequence is much longer, but that is unimportant here. I wish to break up the first 100000 characters in this array into intervals of length 1000 and count the number of "g" bases in each interval. So far I've tried:
library(seqinr)
intervals = 1000*(0:99)
g_count = count(dat[intervals+1:intervals+1000], 1)[["g"]]
but this returns the error: numerical expression has 100 elements: only the first used
Any help is appreciated.
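The message in the question comes from operator precedence: in R, the colon operator binds more tightly than +, so intervals+1:intervals+1000 is read as intervals + (1:intervals) + 1000, and 1:intervals only uses the first element of intervals. A minimal base R sketch of the intended per-interval count, with explicit parentheses, might look like the following. The simulated vector sim_dat and the name starts are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical stand-in for the real sequence: 100000 random bases.
set.seed(42)
sim_dat <- sample(c("a", "c", "g", "t"), 100000, replace = TRUE)

# Zero-based start offsets of the 100 intervals: 0, 1000, ..., 99000.
starts <- 1000 * (0:99)

# Parenthesise both ends of the range so ':' sees single numbers,
# then count the "g" entries in each slice of length 1000.
g_count <- vapply(
  starts,
  function(s) sum(sim_dat[(s + 1):(s + 1000)] == "g"),
  integer(1)
)

length(g_count)  # one count per interval
```

Looping over the start offsets keeps each range a scalar-to-scalar sequence, which is what the colon operator expects.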
To count the number of 'g' in each interval, you could use this base R approach:
n <- 1000
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
For example, for this vector of length 33, we divide the data into intervals of length 11:
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
"c", "a", "g", "g", "g", "t", "a", "t")
n <- 11
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
result
#1 2 3
#1 1 3
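The question asks for only the first 100000 characters of a longer sequence. One way to apply the same tapply idea to that case is sketched below, assuming a simulated sequence (long_dat is a hypothetical stand-in, not the asker's data) and using head() to truncate before grouping.

```r
# Hypothetical stand-in for a sequence longer than 100000 bases.
set.seed(1)
long_dat <- sample(c("a", "c", "g", "t"), 100500, replace = TRUE)

n <- 1000
first_part <- head(long_dat, 100000)          # keep only the first 100000 bases
groups <- ceiling(seq_along(first_part) / n)  # interval index 1..100 for each base

# Count "g" within each interval of length n.
g_count <- tapply(first_part, groups, function(x) sum(x == "g"))

length(g_count)  # number of intervals
```

Truncating first means the trailing 500 bases never form a partial final interval.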
We can use rowsum with gl in base R:
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
"c", "a", "g", "g", "g", "t", "a", "t")
n <- 11
rowsum(+(dat == 'g'), as.integer(gl(length(dat), n, length(dat))))
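As a quick sanity check, the sketch below runs the tapply and rowsum approaches on the same 33-base example and compares them. The names by_tapply and by_rowsum are introduced here for illustration only. Note that rowsum returns a one-column matrix rather than a vector, so the column is extracted before comparing.

```r
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
         "a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
         "c", "a", "g", "g", "g", "t", "a", "t")
n <- 11

# Approach 1: group by interval index and count "g" in each group.
by_tapply <- tapply(dat, ceiling(seq_along(dat) / n), function(x) sum(x == "g"))

# Approach 2: turn the logical into 0/1 and sum it by group with rowsum().
# gl() builds the group labels: each level repeated n times, cut to length(dat).
grp <- as.integer(gl(length(dat), n, length(dat)))
by_rowsum <- rowsum(+(dat == "g"), grp)

# Strip names and dimensions before comparing the counts.
identical(as.vector(by_tapply), as.vector(by_rowsum[, 1]))
```

For a single base either approach works; rowsum becomes more convenient when several indicator columns need to be summed over the same grouping in one call.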