Collapsing a data frame by repeated elements in a column [R]

Question

I am stuck on a small thing. I have a data frame in R like this:

chrom exonCount
chr1         3
chr1         4
chr1         5
chr1         5
chr1         9
chr1        10
chr2         7
chr2        11
chr2        13
chr3         7
chr4         7

I just want the output to be:

chr1        36
chr2        31
chr3         7
chr4         7

I assume the aggregate function can do that, but I am lost on how to use it.

Thanks

Answer 1

I think the plyr package does this most clearly, but here it is using base R:

dat <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1", "chr2", "chr2", "chr2", "chr3", "chr4"), exonCount = c(3L, 
4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L)), .Names = c("chrom", 
"exonCount"), class = "data.frame", row.names = c(NA, -11L))

aggregate(exonCount ~ chrom, data = dat, FUN = sum)

  chrom exonCount
1  chr1        36
2  chr2        31
3  chr3         7
4  chr4         7
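
For anyone unsure how to read the formula, here is a minimal sketch: `exonCount ~ chrom` means "exonCount, grouped by chrom". The same sum can be written without a formula by passing the grouping column through `by`. The names `res_formula` and `res_by` are only illustrative and not part of the original answer.

```r
# Rebuild the example data so this block runs on its own.
dat <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1",
            "chr2", "chr2", "chr2", "chr3", "chr4"),
  exonCount = c(3L, 4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L),
  stringsAsFactors = FALSE
)

# Formula interface: the left side is summarised, the right side defines groups.
res_formula <- aggregate(exonCount ~ chrom, data = dat, FUN = sum)

# Non-formula call: pick the column to sum and pass the grouping
# variable as a named list so the output column is called "chrom".
res_by <- aggregate(dat["exonCount"], by = list(chrom = dat$chrom), FUN = sum)

print(res_formula)
print(res_by)
```

Both calls should return one row per chromosome with the summed exon counts.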

Answer 2

If you want to use the plyr package, try:

df <- read.table(header=TRUE, text="chrom exonCount
chr1         3
chr1         4
chr1         5
chr1         5
chr1         9
chr1        10
chr2         7
chr2        11
chr2        13
chr3         7
chr4         7
")
library(plyr)
# Name the summary column so the output keeps the exonCount header
ddply(df, .(chrom), summarise, exonCount = sum(exonCount))

Answer 3

Another approach using ddply would be:

ddply(df, .(chrom), numcolwise(sum))
  chrom exonCount
1  chr1        36
2  chr2        31
3  chr3         7
4  chr4         7
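
If plyr is not available, a rough base R counterpart to summing every numeric column per group is the dot form of the aggregate formula. This is only a sketch under one assumption: the extra column `geneCount` and the data frame `df2` are hypothetical, added here to show more than one column being summed.

```r
# Example data with a second numeric column; `geneCount` is hypothetical
# and only there to show that every remaining column gets summed.
df2 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  exonCount = c(3L, 4L, 7L, 11L, 7L),
  geneCount = c(1L, 2L, 1L, 1L, 3L),
  stringsAsFactors = FALSE
)

# The dot stands for "all columns not named on the right-hand side".
res_all <- aggregate(. ~ chrom, data = df2, FUN = sum)
print(res_all)
```

Unlike numcolwise, the dot does not skip non-numeric columns on its own, so this assumes every column other than chrom is numeric.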

Answer 4

This is the fastest method here, but it is less intuitive than the plyr functions or aggregate (using Justin's dat):

x <- data.frame(sort(unique(dat$chrom)), 
    unlist(lapply(split(dat$exonCount, dat$chrom), sum)))
colnames(x) <- colnames(dat)
rownames(x) <- NULL
x

This is the second fastest method here:

x <- tapply(dat$exonCount, dat$chrom, sum)
x <- data.frame(names(x), x)
names(x) <- names(dat); rownames(x) <- NULL
x

The data.table package is a little slower in benchmarking here, either because 1) I'm messing up the syntax or 2) it's designed for much larger problems and doesn't show how good it is on a fake data set like this:

library(data.table)  
dat2 <- data.table(dat)
dat2[,list(pop=sum(exonCount)), list(chrom)]
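
The timings behind the speed claims above are not shown, so here is a hedged sketch of how such a comparison could be reproduced with base R's `system.time`. The wrapper names (`f_aggregate`, `f_split`, `f_tapply`, `time_it`) and the repetition count are my own assumptions, and the numbers will differ between machines and R versions.

```r
# Rebuild the example data so this block runs on its own.
dat <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1",
            "chr2", "chr2", "chr2", "chr3", "chr4"),
  exonCount = c(3L, 4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L),
  stringsAsFactors = FALSE
)

# Three base R approaches from the answers, wrapped as functions.
f_aggregate <- function(d) aggregate(exonCount ~ chrom, data = d, FUN = sum)
f_split <- function(d) {
  s <- sapply(split(d$exonCount, d$chrom), sum)
  data.frame(chrom = names(s), exonCount = unname(s), stringsAsFactors = FALSE)
}
f_tapply <- function(d) {
  s <- tapply(d$exonCount, d$chrom, sum)
  data.frame(chrom = names(s), exonCount = as.vector(s), stringsAsFactors = FALSE)
}

# Run each function n times and keep the elapsed wall-clock seconds.
time_it <- function(f, n = 200) {
  system.time(for (i in seq_len(n)) f(dat))[["elapsed"]]
}

timings <- c(aggregate = time_it(f_aggregate),
             split = time_it(f_split),
             tapply = time_it(f_tapply))
print(timings)
```

The plyr and data.table calls could be wrapped and timed the same way once those packages are loaded. On an 11-row data set the differences are tiny, so a much larger input would be a fairer test.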

Answer 1: score 5, accepted, 2012-08-03 14:47:42

Answer 2: score 4, 2012-08-03 14:44:05

Answer 3: score 2, 2012-08-03 14:47:23

Answer 4: score 1, 2012-08-03 15:27:30
