Collapsing a data frame by recurring elements in a column [R]
I am stuck on a small thing. I have a data frame in R like this:
chrom exonCount
chr1 3
chr1 4
chr1 5
chr1 5
chr1 9
chr1 10
chr2 7
chr2 11
chr2 13
chr3 7
chr4 7
I just want the output as:
chr1 36
chr2 31
chr3 7
chr4 7
I assume the aggregate function can do that, but I am lost on how to use it.
Thanks
I think the plyr package does this most clearly, but here is a version using base R:
dat <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr1", "chr2", "chr2", "chr2", "chr3", "chr4"), exonCount = c(3L,
4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L)), .Names = c("chrom",
"exonCount"), class = "data.frame", row.names = c(NA, -11L))
aggregate(data=dat, exonCount ~ chrom, FUN=sum)
chrom exonCount
1 chr1 36
2 chr2 31
3 chr3 7
4 chr4 7
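Since the question was about how to call aggregate, here is a minimal sketch of its two calling styles on the same data. The variable names by_formula and by_list are made up for illustration. In the formula style, exonCount ~ chrom reads as "exonCount grouped by chrom"; in the list style, the grouping vectors are passed as a named list.

```r
# Same data as above, rebuilt so this snippet runs on its own.
dat <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1",
            "chr2", "chr2", "chr2", "chr3", "chr4"),
  exonCount = c(3L, 4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L),
  stringsAsFactors = FALSE
)

# Formula interface: left side is the value to summarise, right side the group.
by_formula <- aggregate(exonCount ~ chrom, data = dat, FUN = sum)

# List interface: pass the value column(s) and a named list of grouping vectors.
# The list name ("chrom") becomes the name of the grouping column in the result.
by_list <- aggregate(dat["exonCount"], by = list(chrom = dat$chrom), FUN = sum)

print(by_formula)
print(by_list)
```

Both calls should give one row per chrom with the summed exonCount. The formula style is usually easier to read, while the list style is handy when the grouping vector is not a column of the data frame.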
If you want to use the plyr package, try:
df <- read.table(header = TRUE, text = "chrom exonCount
chr1 3
chr1 4
chr1 5
chr1 5
chr1 9
chr1 10
chr2 7
chr2 11
chr2 13
chr3 7
chr4 7
")
library(plyr)
# Name the summary column explicitly; otherwise plyr assigns it a default name
ddply(df, .(chrom), summarise, exonCount = sum(exonCount))
Another approach using ddply would be:
ddply(df, .(chrom), numcolwise(sum))
chrom exonCount
1 chr1 36
2 chr2 31
3 chr3 7
4 chr4 7
This is the fastest method here, but it is less intuitive than the plyr functions or aggregate (using Justin's dat):
x <- data.frame(sort(unique(dat$chrom)),
unlist(lapply(split(dat$exonCount, dat$chrom), sum)))
colnames(x) <- colnames(dat)
rownames(x) <- NULL
x
This is the second fastest method here:
x <- tapply(dat$exonCount, dat$chrom, sum)
x <- data.frame(names(x), x)
names(x) <- names(dat); rownames(x) <- NULL
x
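The speed ranking above can be checked roughly with base R alone. The sketch below is an assumption-laden illustration, not the benchmark used in this answer: it confirms that the split and tapply approaches agree, then times each with system.time over a made-up repetition count n. Timings on an 11-row data frame are noisy and will differ between machines.

```r
# Same data as above, rebuilt so this snippet runs on its own.
dat <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1",
            "chr2", "chr2", "chr2", "chr3", "chr4"),
  exonCount = c(3L, 4L, 5L, 5L, 9L, 10L, 7L, 11L, 13L, 7L, 7L),
  stringsAsFactors = FALSE
)

# Both approaches return the sums keyed by chrom, in sorted group order.
via_split  <- sapply(split(dat$exonCount, dat$chrom), sum)
via_tapply <- tapply(dat$exonCount, dat$chrom, sum)
stopifnot(all(via_split == via_tapply))

# Rough timing: repeat each approach n times (n is an arbitrary choice).
n <- 1000
t_split <- system.time(
  for (i in seq_len(n)) sapply(split(dat$exonCount, dat$chrom), sum)
)
t_tapply <- system.time(
  for (i in seq_len(n)) tapply(dat$exonCount, dat$chrom, sum)
)

# Elapsed seconds for each approach.
print(c(split = t_split[["elapsed"]], tapply = t_tapply[["elapsed"]]))
```

For a more careful comparison, a dedicated benchmarking package and a much larger data set would give more trustworthy numbers.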
The data.table package is a little slower in benchmarking here, either because 1) I'm messing up the syntax, or 2) it's designed for much larger problems and a toy data set like this doesn't show how good it is:
library(data.table)
dat2 <- data.table(dat)
dat2[,list(pop=sum(exonCount)), list(chrom)]