
R dplyr: perform different aggregations by group

I have a data frame dat which looks like this:

   dat <- structure(list(cell.ID = c(329574L, 329574L, 329574L, 329574L, 
    329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 
    329574L), Year = c("2010", "2010", "2010", "2010", "2010", "2010", 
    "2010", "2010", "2010", "2010", "2010", "2010"), month_name = c("June", 
    "July", "June", "July", "June", "July", "June", "July", "June", 
    "July", "June", "July"), value = c(459.860986624053, 398.94083733151, 
    16, 23, 111.69, 453.333, 71.55, 30.38, 31.928, 30.13355, 17.587, 
    19.7938709677419), variable_name = c("ETo", "ETo", "Rday", "Rday", 
    "Rsum", "Rsum", "Thdd", "Thdd", "Tmax", "Tmax", "Tmin", "Tmin"
    ), monthID = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L
    )), row.names = c(NA, -12L), class = "data.frame")


library(dplyr)

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(variable = sum(value))

I want to average Tmax and Tmin and sum the rest of the variables, so I tried this:

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), mean(value), sum(value)))

Error: Column `variable` must be length 1 (a summary value), not 2  

How do I correct this?

Another way to do this in dplyr is to use if and else instead of ifelse, testing only the first variable_name in each group so that the condition has length 1:

dat %>%        
  group_by(Year, variable_name) %>% 
  summarise(variable = if (variable_name[1] %in% c('Tmax', 'Tmin')) mean(value) else sum(value))

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
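
To see why if / else behaves differently from ifelse here, the two can be compared outside dplyr. The following is a minimal base R sketch, assuming a single group (the two Tmax rows from dat); the names vec_result and scalar_result are illustrative only and do not appear in the original code.

```r
# One group as summarise() would see it: the two Tmax rows from dat
variable_name <- c("Tmax", "Tmax")
value <- c(31.928, 30.13355)

# ifelse() is vectorised over its test, so a length-2 test
# produces a length-2 result (the mean, repeated once per row)
vec_result <- ifelse(variable_name %in% c("Tmax", "Tmin"), mean(value), sum(value))
length(vec_result)

# if / else evaluates one condition and returns whichever branch
# is chosen, here a single number
scalar_result <- if (variable_name[1] %in% c("Tmax", "Tmin")) mean(value) else sum(value)
length(scalar_result)
```

The first length is 2 and the second is 1, which matches the "must be length 1 ... not 2" error in the question.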

I think the problem is that ifelse is vectorised: in this context it operates row-wise and returns one value per row of the group (two here), not a single value at the level of the group. If that's right, then you could work around the problem by computing both summary statistics and then conditionally selecting the one you want by variable name, like this:

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(var_mean = mean(value), var_sum = sum(value)) %>%
  dplyr::mutate(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), var_mean, var_sum)) %>%
  dplyr::select(-var_mean, -var_sum)

Result:

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
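
The same compute-both-then-select idea can be cross-checked without dplyr. This is only a base R sketch, assuming a single Year so that grouping by variable_name alone is enough; the value and variable_name vectors are copied from dat, and tapply stands in for group_by plus summarise.

```r
# Columns copied from dat (all rows belong to Year 2010)
value <- c(459.860986624053, 398.94083733151, 16, 23, 111.69, 453.333,
           71.55, 30.38, 31.928, 30.13355, 17.587, 19.7938709677419)
variable_name <- rep(c("ETo", "Rday", "Rsum", "Thdd", "Tmax", "Tmin"), each = 2)

# Compute both summaries per group
var_mean <- tapply(value, variable_name, mean)
var_sum  <- tapply(value, variable_name, sum)

# With one entry per group, ifelse() selecting element by element
# is exactly the behaviour that is wanted
pick_mean <- names(var_sum) %in% c("Tmax", "Tmin")
variable <- setNames(ifelse(pick_mean, var_mean, var_sum), names(var_sum))
variable
```

Here ifelse is appropriate because, after summarising, each group has been reduced to a single element.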
