Counting bases in a DNA sequence using the seqinr package in R

Question

I have an array that was extracted from a FASTA file:

> dat
  [1] "t" "a" "t" "t" "t" "a" "c" "c" "g" "a" "c" "g" "a" "a" "a" "t" "t" "a" "a" "t" "a" "c" "c" "a" "t" "c" "a" "g" "g" "g" "t" "a" "t"
  [34] "t" "a" "a" "g" "a" "t" "g" "c" "t" "a" "c" "c" "a" "a" "c" "g" "t" "g" "g" "t" "a" "t" "t" "a" "a" "a" "a" "t" "g" "t" "g" "c" "c"
  [67] "c" "a" "a" "c" "c" "g" "c" "g" "a" "a" "a" "a" "a" "g" "a" "a" "a" "g" "t" "g" "g" "t" "a" "t" "a" "t" "a" "g" "g" "a" "a" "a" "a"

The sequence is much longer, but that is unimportant here. I wish to break up the first 100000 characters in this array into intervals of length 1000 and count the number of "g" bases in each interval. So far I've tried:

library(seqinr)
intervals = 1000*(0:99)
g_count = count(dat[intervals+1:intervals+1000], 1)[["g"]]

but this returns the error: numerical expression has 100 elements: only the first used. Any help is appreciated.

Answer 1

To count the number of 'g' in each interval, you could use this base R approach:

n <- 1000
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
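The call above groups the whole vector, whereas the question asks only about the first 100000 characters. A minimal sketch of restricting the input first, assuming a randomly generated stand-in sequence (`dat_long`, `n_keep` and `first_part` are hypothetical names, and the sizes are scaled down so the snippet runs quickly):

```r
# Hypothetical stand-in for the real sequence: 2500 random bases.
set.seed(1)
dat_long <- sample(c("a", "c", "g", "t"), 2500, replace = TRUE)

n_keep <- 2000   # the question uses 100000
n <- 1000        # interval length, as in the question

# Keep only the leading n_keep characters, then group by interval index.
first_part <- head(dat_long, n_keep)
g_count <- tapply(first_part, ceiling(seq_along(first_part) / n),
                  function(x) sum(x == "g"))
g_count
```

With `n_keep <- 100000` and `n <- 1000` this should give 100 counts, one per interval, provided the sequence has at least 100000 characters.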

For example, for this vector of length 33, we divide the data into intervals of length 11.

dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g", 
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t", 
"c", "a", "g", "g", "g", "t", "a", "t")

n <- 11
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
result

#1 2 3 
#1 1 3
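As a side note on the message quoted in the question: in R the `:` operator binds more tightly than `+`, so `intervals+1:intervals+1000` is read as `intervals + (1:intervals) + 1000`, and `1:intervals` uses only the first element of `intervals`. A rough sketch of a loop-style alternative that stays close to the original attempt, assuming made-up data (`x` and `starts` are hypothetical names):

```r
# Hypothetical data: 3000 random bases.
set.seed(42)
x <- sample(c("a", "c", "g", "t"), 3000, replace = TRUE)
n <- 1000

# ':' is evaluated before '+', so this is 0 + (1:3) + 1000.
0 + 1:3 + 1000

# One start position per interval; the range is built for one start at a time.
starts <- seq(1, length(x), by = n)
g_by_loop <- sapply(starts, function(s) sum(x[s:(s + n - 1)] == "g"))
g_by_loop
```

This sketch assumes the length of `x` is a multiple of `n`; otherwise the last range would run past the end of the vector. With seqinr loaded, the `count(..., 1)[["g"]]` call from the question could presumably take the place of the `sum(...)` inside the function, though that variant is not run here.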

Answer 2

We can use rowsum with gl in base R:

rowsum(+(dat == 'g'), as.integer(gl(length(dat), n, length(dat))))

Data

dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g", 
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t", 
"c", "a", "g", "g", "g", "t", "a", "t")

n <- 11
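To make the shape of the rowsum result concrete, here is a small sketch that runs the same call on the example data and extracts a plain vector (`grp`, `res` and `counts` are names introduced here for illustration only):

```r
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
         "a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
         "c", "a", "g", "g", "g", "t", "a", "t")
n <- 11

# gl() repeats each level n times, so elements 1..11 get group 1, and so on.
grp <- as.integer(gl(length(dat), n, length(dat)))

# +(dat == "g") turns TRUE/FALSE into 1/0; rowsum() adds them up per group.
res <- rowsum(+(dat == "g"), grp)
res                 # a one-column matrix with rows named 1, 2, 3

counts <- res[, 1]  # the same counts as a named vector: 1, 1, 3
counts
```

The counts agree with the tapply result from Answer 1; the main difference is that rowsum returns a matrix rather than a one-dimensional array.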

Answer 1: score 1, accepted, 2020-12-16 07:22:43
Answer 2: score 1, 2020-12-16 17:57:43
