
dplyr - summarise_each grouping by factor equality against multiple columns

I want to sum each numeric column of gg, grouping rows by the value that appears in either column A or column B:

> gg
  A  B a1 a2 a3
1 c2 c1  1  5  9
2 c1 c3  2  6 10
3 c4 c2  3  7 11
4 c3 c2  4  8 12

to get

> test 
   AB a1 a2 a3
1  c1  3 11 19
2  c2  8 20 32
3  c3  6 14 22
4  c4  3  7 11

I know how to do it for column A:

test<-gg %>%
  group_by(A) %>%
  summarise_each(funs(sum(., na.rm=TRUE)),a1:a3)

Could you help me do it for both A and B?

Thanks for your help.

Consider changing the shape of your dataset to a longer format. For example, you can use gather from package tidyr to gather A and B into a single column before summing.

Here is how you could use gather with your dataset, showing the longer output dataset with the new AB column.

library(tidyr)
gather(gg, group, AB, A:B)

  a1 a2 a3 group AB
1  1  5  9     A c2
2  2  6 10     A c1
3  3  7 11     A c4
4  4  8 12     A c3
5  1  5  9     B c1
6  2  6 10     B c3
7  3  7 11     B c2
8  4  8 12     B c2

You can add the gather step into your code chain before grouping. Then group_by your new AB variable and use the rest of your code as you have it.

library(dplyr)
gg %>%
    gather(group, AB, A:B) %>%
    group_by(AB) %>%
    summarise_each(funs(sum(., na.rm = TRUE)), a1:a3)

Source: local data frame [4 x 4]

  AB a1 a2 a3
1 c1  3 11 19
2 c2  8 20 32
3 c3  6 14 22
4 c4  3  7 11
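
In more recent versions of these packages, gather, summarise_each and funs have been superseded or deprecated. A minimal sketch of the same chain using pivot_longer and across follows, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; the gg data frame is rebuilt here from the printed values in the question, and the name res is only illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (values copied from the printout)
gg <- data.frame(
  A  = c("c2", "c1", "c4", "c3"),
  B  = c("c1", "c3", "c2", "c2"),
  a1 = 1:4,
  a2 = 5:8,
  a3 = 9:12,
  stringsAsFactors = FALSE
)

res <- gg %>%
  # Stack A and B into one key column AB; "group" records the source column
  pivot_longer(c(A, B), names_to = "group", values_to = "AB") %>%
  group_by(AB) %>%
  # Sum a1..a3 within each AB value
  summarise(across(a1:a3, ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(res)
```

Each original row appears twice in the long table (once for A, once for B), so a row contributes to the totals of both of its keys.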

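Is there a reason you need to use dplyr? Base R can do this as well:

# gg is the data frame from the question
# Collect every distinct value found in A or B
AB <- unique(c(as.character(gg$A), as.character(gg$B)))
# For each value, sum a1:a3 over the rows where A or B matches it
data.frame(AB, do.call("rbind", lapply(AB, function(x) {
  colSums(gg[gg$A == x | gg$B == x, c("a1", "a2", "a3")])
})))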

##   AB a1 a2 a3
## 1 c2  8 20 32
## 2 c1  3 11 19
## 3 c4  3  7 11
## 4 c3  6 14 22
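
Another base R route, closer in spirit to the reshape-then-sum idea, is to stack the numeric columns once per key column and let rowsum do the grouping. This is only a sketch: gg is rebuilt from the question's printout, and the names cols, stacked, keys, sums and res are illustrative.

```r
# Rebuild the example data from the question (values copied from the printout)
gg <- data.frame(
  A  = c("c2", "c1", "c4", "c3"),
  B  = c("c1", "c3", "c2", "c2"),
  a1 = 1:4,
  a2 = 5:8,
  a3 = 9:12,
  stringsAsFactors = FALSE
)

cols <- c("a1", "a2", "a3")

# One copy of the numeric columns per key column (A first, then B)
stacked <- rbind(gg[cols], gg[cols])
keys <- c(gg$A, gg$B)

# rowsum() adds up rows sharing the same key; groups come back sorted
sums <- rowsum(stacked, group = keys, na.rm = TRUE)

# Move the keys from the row names into a regular AB column
res <- data.frame(AB = rownames(sums), sums, row.names = NULL)
print(res)
```

One difference worth keeping in mind: with stacking, a row whose A and B hold the same value would be counted twice, whereas the `A == x | B == x` filter counts it once. The example data has no such row, so both approaches give the same totals here.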
