简体   繁体   English

按另一列的值汇总多列

[英]Summarise multiple columns by the value of another column

say I have a dataframe like:说我有一个dataframe像:

 
country   tot.subcat cat                region_ar6_5               region_ar6_10              region_ar6_22 region_ar6_dev
  ABW     2.713934   2 Latin America and Caribbean Latin America and Caribbean                  Caribbean     developing
  ABW   955.775809   1 Latin America and Caribbean Latin America and Caribbean                  Caribbean     developing
  AFG  9508.812918   1 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AFG    47.666700   3 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AFG   236.350040   2 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AGO 30832.514020   1      Africa and Middle East                      Africa Southern and middle Africa            ldc

and I want to calculate the sum of tot.subcat for each group defined by cat and by the regions region_ar6_5, region_ar6_10,region_ar6_22, region_ar6_dev .我想计算由cat和区域region_ar6_5, region_ar6_10,region_ar6_22, region_ar6_dev tot.subcat的每个组的总和。 The result should be the same as when calculating each summation independently结果应与独立计算每个总和时相同

dataframe %>% group_by(cat,region_ar6_5) %>% summarise(sum_region_ar6_5=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_10) %>% summarise(sum_region_ar6_10=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_22) %>% summarise(sum_region_ar6_22=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_dev) %>% summarise(sum_region_ar6_dev=sum(tot.subcat))

but I was looking for a more efficient solution.但我一直在寻找更有效的解决方案。 I tried with group_by_at(vars(matches('^region')), cat) %>% summarise(across(everything(), list(sum(tot.subcat)))) but doesn't work.我尝试使用group_by_at(vars(matches('^region')), cat) %>% summarise(across(everything(), list(sum(tot.subcat))))但不起作用。 the output of dput for the first 200 lines is here .前 200 行的 dput 的 output 在这里 any help appreciated任何帮助表示赞赏

As we are using different grouping variables, an option is to map over those variables由于我们使用不同的分组变量,一个选项是map超过这些变量

library(purrr)
library(dplyr)
library(stringr)
region_vars <- grep('region', names(dataframe), value = TRUE)
lst_out <- map(region_vars, ~ dataframe %>%
                  group_by(across(all_of(c('cat', .x)))) %>%
                  summarise(!! str_c("sum_", .x) := 
         sum(tot.subcat, na.rm = TRUE), .groups = 'drop'))

The output is a list of summarised output. output 是汇总的 output 的list If we want a single dataset output, the list can be reduce d to a single dataset by joining如果我们想要单个数据集 output,可以通过加入将list reduce为单个数据集

lst_out %>%
     reduce(full_join, by = 'cat')

data数据

dataframe <- structure(list(country = c("ABW", "ABW", "AFG", "AFG", "AFG", 
"AGO"), tot.subcat = c(2.713934, 955.775809, 9508.812918, 47.6667, 
236.35004, 30832.51402), cat = c(2L, 1L, 1L, 3L, 2L, 1L), 
region_ar6_5 = c("Latin America and Caribbean", 
"Latin America and Caribbean", "Asia and Developing Pacific", 
"Asia and Developing Pacific", "Asia and Developing Pacific", 
"Africa and Middle East"), region_ar6_10 = c("Latin America and Caribbean", 
"Latin America and Caribbean", "Southern Asia", "Southern Asia", 
"Southern Asia", "Africa"), region_ar6_22 = c("Caribbean", "Caribbean", 
"Rest of Southern Asia", "Rest of Southern Asia", "Rest of Southern Asia", 
"Southern and middle Africa"), region_ar6_dev = c("developing", 
"developing", "ldc", "ldc", "ldc", "ldc")),
class = "data.frame", row.names = c(NA, 
-6L))

Try something like this using reshaping:使用重塑尝试这样的事情:

library(tidyverse)
#Code
new <- dataframe %>% select(c(cat,starts_with('region'),tot.subcat)) %>%
  pivot_longer(-c(cat,tot.subcat)) %>%
  group_by(cat,name,value) %>%
  summarise(Total=sum(tot.subcat,na.rm=T)) %>%
  pivot_wider(names_from = name,values_from=Total,names_prefix = 'Total_')

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM