Summing values in R table by 2 factors
I have a single big text file which looks as follows:
tag colony diff
1035 03 498
1035 03 -44365
1035 03 -66652
1035 04 234234
1035 04 -343
1035 04 -23423
1037 10 234234
1037 10 -343
1037 10 -23423
Most 'tags' only have a single colony, such as 1037 in the above example. However, some have 2, such as 1035 having both 03 and 04. What I would like to do is sum the diff column for each tag, but separately for each colony, so the output would be something like this:
tag colony total
1035 03 -110519
1035 04 210468
1037 10 210468
So far (I've been working in R), I have been using aggregate:
x2 = aggregate(x$diff, by=list(tag=x$tag), FUN=sum)
But this would count all tags together, irrespective of colony. Is there a way of 'adding another level', so to speak, into the aggregate function, so that it counts the colonies separately?
Thanks
We can use dplyr:
library(dplyr)
df1 %>%
group_by(tag, colony) %>%
summarise(total = sum(diff))
Or data.table:
library(data.table)
setDT(df1)[, .(total = sum(diff)), .(tag, colony)]
x2 <- aggregate(x$diff, by = list(tag = x$tag, colony = x$colony), FUN = sum)
Or, equivalently, as a formula:
x2 <- aggregate(diff ~ tag + colony, data = x, FUN = sum)
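To check the formula form of aggregate against the sample data from the question, here is a minimal, self-contained sketch in base R. It assumes the file is whitespace-delimited with a header row; reading colony as character (via colClasses) and renaming the result column to total are choices made here for illustration, not something the original posts specify.

```r
# Rebuild the sample data from the question. colony is read as character
# so that the leading zero in "03" and "04" is preserved (an assumption
# about how the real file should be read).
x <- read.table(text = "
tag colony diff
1035 03 498
1035 03 -44365
1035 03 -66652
1035 04 234234
1035 04 -343
1035 04 -23423
1037 10 234234
1037 10 -343
1037 10 -23423
", header = TRUE, colClasses = c("integer", "character", "integer"))

# Sum diff within each tag/colony combination.
x2 <- aggregate(diff ~ tag + colony, data = x, FUN = sum)

# Rename the summed column to match the desired output, then sort by
# tag and colony so the row order is predictable.
names(x2)[names(x2) == "diff"] <- "total"
x2 <- x2[order(x2$tag, x2$colony), ]

print(x2)
```

For the real data, the same read.table call with a file path in place of the text argument should work, provided the file has the same three columns.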