Using the seqinr package in R to count bases of a DNA sequence

I have an array which was extracted from a fasta file:

> dat
  [1] "t" "a" "t" "t" "t" "a" "c" "c" "g" "a" "c" "g" "a" "a" "a" "t" "t" "a" "a" "t" "a" "c" "c" "a" "t" "c" "a" "g" "g" "g" "t" "a" "t"
  [34] "t" "a" "a" "g" "a" "t" "g" "c" "t" "a" "c" "c" "a" "a" "c" "g" "t" "g" "g" "t" "a" "t" "t" "a" "a" "a" "a" "t" "g" "t" "g" "c" "c"
  [67] "c" "a" "a" "c" "c" "g" "c" "g" "a" "a" "a" "a" "a" "g" "a" "a" "a" "g" "t" "g" "g" "t" "a" "t" "a" "t" "a" "g" "g" "a" "a" "a" "a"

The sequence is much longer, but that is unimportant here. I wish to break up the first 100000 characters in this array into intervals of length 1000 and count the number of "g" bases in each interval. So far I've tried:

library(seqinr)
intervals = 1000*(0:99)
g_count = count(dat[intervals+1:intervals+1000], 1)[["g"]]

but this returns the error: numerical expression has 100 elements: only the first used. Any help is appreciated.
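For context, the message most likely comes from operator precedence: in R, `:` binds more tightly than `+`, so `intervals+1:intervals+1000` is read as `intervals + (1:intervals) + 1000`, and `:` only uses the first element of a vector operand. A small sketch of the precedence rule, and of how the indices for a single interval could be built with explicit parentheses (the name `idx` is only illustrative):

```r
intervals <- 1000 * (0:99)

# `:` is evaluated before `+`, so these two expressions are the same
identical(2 + 1:3 + 10, 2 + (1:3) + 10)

# Indices for the second interval, with explicit parentheses
idx <- (intervals[2] + 1):(intervals[2] + 1000)
range(idx)
```

This only covers one interval at a time; counting over all intervals still needs a loop or a grouping function.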

To count the number of 'g' in each interval you could use this base R approach:

n <- 1000
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))

For example, for this vector of length 33 we divide the data into intervals of length 11.

dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g", 
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t", 
"c", "a", "g", "g", "g", "t", "a", "t")

n <- 11
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
result

#1 2 3 
#1 1 3 
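Applied to the case in the question (first 100000 bases, intervals of 1000), the same idea might look like the sketch below. The vector `dat_sim` is simulated stand-in data, not the real sequence, and the names `first` and `grp` are illustrative.

```r
# Simulated stand-in for the real sequence (assumption: the real `dat`
# is a character vector of single lowercase bases, as in the question)
set.seed(1)
dat_sim <- sample(c("a", "c", "g", "t"), 120000, replace = TRUE)

n <- 1000
first <- dat_sim[1:100000]             # keep only the first 100000 bases
grp <- ceiling(seq_along(first) / n)   # interval index, 1 to 100
g_count <- tapply(first, grp, function(x) sum(x == "g"))

length(g_count)   # one count per interval
```

Subsetting before grouping keeps the result at exactly 100 intervals even when the full sequence is longer.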

We can use rowsum with gl in base R:

rowsum(+(dat == 'g'), as.integer(gl(length(dat), n, length(dat))))

Data

dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g", 
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t", 
"c", "a", "g", "g", "g", "t", "a", "t")

n <- 11
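As a quick sanity check, the two approaches can be compared on the sample data. The sketch below also adds a third variant using `matrix` and `colSums`; that variant is not from either answer and assumes the vector length is an exact multiple of `n`.

```r
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
         "a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
         "c", "a", "g", "g", "g", "t", "a", "t")
n <- 11

# tapply: group by interval index and count "g" in each group
by_tapply <- tapply(dat, ceiling(seq_along(dat) / n), function(x) sum(x == "g"))

# rowsum: sum a 0/1 indicator within each group built by gl()
by_rowsum <- rowsum(+(dat == "g"), as.integer(gl(length(dat), n, length(dat))))

# matrix variant (assumption: length(dat) is a multiple of n);
# each column holds one interval because matrices fill column by column
by_matrix <- colSums(matrix(dat == "g", nrow = n))

by_matrix   # 1 1 3
```

Note that `rowsum` returns a one-column matrix rather than a named vector, so `by_rowsum[, 1]` may be needed to get a plain vector.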
