Summarizing data by a column in R

Question

I have a data set that looks like this (the actual data is 10K by 5K, so I really need a shortcut):

Cluster	Item1	Item2	Item3
1	1	2	2
1	3	1	1
1	1	3	0
2	3	2	0
2	0	0	2
2	4	2	2
3	0	1	1
3	1	1	2

I want to sum the columns within each cluster, so that the result looks like this:

Cluster	Item1	Item2	Item3
1	5	6	3
2	7	4	4
3	1	2	3

In other words, I want to sum the rows grouped by a certain column.

Answer 1

You can use aggregate (dat is the name of your data frame):

aggregate(dat[-1], dat["Cluster"], sum)

#   Cluster Item1 Item2 Item3
# 1       1     5     6     3
# 2       2     7     4     4
# 3       3     1     2     3
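
To reproduce this, the sample table from the question can be rebuilt by hand. The following is only a minimal sketch: the data.frame construction is an assumption about how the data is stored (plain numeric columns named as in the output above), and res is just an illustrative name.

```r
# Sample data from the question; column names assumed to match the output above
dat <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 3, 3),
  Item1   = c(1, 3, 1, 3, 0, 4, 0, 1),
  Item2   = c(2, 1, 3, 2, 0, 2, 1, 1),
  Item3   = c(2, 1, 0, 0, 2, 2, 1, 2)
)

# dat[-1] drops the first column (Cluster); dat["Cluster"] keeps it as a
# one-column data frame, so the grouping column keeps its name in the result
res <- aggregate(dat[-1], dat["Cluster"], sum)
print(res)
```

Passing the grouping column as dat["Cluster"] rather than dat$Cluster is what gives the first result column the name Cluster.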

With data.table:

library(data.table)
setDT(dat)[ , lapply(.SD, sum), by = Cluster]
#    Cluster Item1 Item2 Item3
# 1:       1     5     6     3
# 2:       2     7     4     4
# 3:       3     1     2     3

With dplyr:

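library(dplyr)

# Note: summarise_each() and funs() are deprecated in later dplyr releases;
# summarise(across(...)) is the current form.
dat %>%
  group_by(Cluster) %>%
  summarise_each(funs(sum))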
#   Cluster Item1 Item2 Item3
# 1       1     5     6     3
# 2       2     7     4     4
# 3       3     1     2     3

Answer 2

Thanks for your answer. I also used this, and it worked perfectly:

aggregate(. ~ Cluster, data = dat, FUN = sum)
#   Cluster Item1 Item2 Item3
# 1       1     5     6     3
# 2       2     7     4     4
# 3       3     1     2     3
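
One difference between the two aggregate forms is worth checking on real data: the formula interface has na.action = na.omit by default, so a row with a missing value in any column is dropped before summing, while the data-frame interface keeps the row and leaves NA handling to the function. The sketch below illustrates this; the NA is a hypothetical value injected for the demonstration and is not part of the question's data.

```r
dat <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 3, 3),
  Item1   = c(1, 3, 1, 3, 0, 4, 0, 1),
  Item2   = c(2, 1, 3, 2, 0, 2, 1, 1),
  Item3   = c(2, 1, 0, 0, 2, 2, 1, 2)
)
dat$Item1[1] <- NA  # hypothetical missing value, for illustration only

# Formula interface: the whole first row is omitted, so its Item2 and Item3
# values are left out of the cluster 1 totals as well
by_formula <- aggregate(. ~ Cluster, data = dat, FUN = sum)

# Data-frame interface: the row is kept; na.rm = TRUE is passed on to sum()
by_frame <- aggregate(dat[-1], dat["Cluster"], sum, na.rm = TRUE)

print(by_formula)
print(by_frame)
```

If the data has no missing values, the two forms give the same totals.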

Answer 3

Try this (ddf is the name of your data frame):

> sapply(ddf[-1], function(x) tapply(x,ddf$Cluster,sum))
  Item1 Item2 Item3
1     5     6     3
2     7     4     4
3     1     2     3
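
Note that this call returns a matrix whose row names are the cluster labels, not a data frame with a Cluster column. If a data frame is needed, the labels can be moved into a column. A minimal sketch, assuming numeric cluster labels; sums and res are illustrative names:

```r
ddf <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 3, 3),
  Item1   = c(1, 3, 1, 3, 0, 4, 0, 1),
  Item2   = c(2, 1, 3, 2, 0, 2, 1, 1),
  Item3   = c(2, 1, 0, 0, 2, 2, 1, 2)
)

# One tapply() per item column; sapply() binds the results into a matrix
sums <- sapply(ddf[-1], function(x) tapply(x, ddf$Cluster, sum))

# Move the row names (cluster labels) into a regular column
res <- data.frame(Cluster = as.numeric(rownames(sums)), sums)
rownames(res) <- NULL
print(res)
```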

Answer 4

If you want to sum all variables except the grouping variable, use across in dplyr:

df <- read.table(text = "Cluster    Item1   Item2   Item3
1   1   2   2
1   3   1   1
1   1   3   0
2   3   2   0
2   0   0   2
2   4   2   2
3   0   1   1
3   1   1   2", header = TRUE)

library(dplyr)

df %>% group_by(Cluster) %>% summarise(across(everything(), ~sum(.)))

# A tibble: 3 x 4
  Cluster Item1 Item2 Item3
    <int> <int> <int> <int>
1       1     5     6     3
2       2     7     4     4
3       3     1     2     3

4 answers

Answer 1
5 2014-11-08 12:22:38

Answer 2
1 2014-11-08 12:27:55

Answer 3
0 2014-11-08 13:35:28

Answer 4
0 2021-04-05 09:14:45