
R dplyr perform different aggregation by group

I have a data frame dat that looks like this:

   dat <- structure(list(cell.ID = c(329574L, 329574L, 329574L, 329574L, 
    329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 
    329574L), Year = c("2010", "2010", "2010", "2010", "2010", "2010", 
    "2010", "2010", "2010", "2010", "2010", "2010"), month_name = c("June", 
    "July", "June", "July", "June", "July", "June", "July", "June", 
    "July", "June", "July"), value = c(459.860986624053, 398.94083733151, 
    16, 23, 111.69, 453.333, 71.55, 30.38, 31.928, 30.13355, 17.587, 
    19.7938709677419), variable_name = c("ETo", "ETo", "Rday", "Rday", 
    "Rsum", "Rsum", "Thdd", "Thdd", "Tmax", "Tmax", "Tmin", "Tmin"
    ), monthID = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L
    )), row.names = c(NA, -12L), class = "data.frame")


library(dplyr)

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(variable = sum(value))

I want to average Tmax and Tmin and sum the rest of the variables, so I tried this:

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), mean(value), sum(value)))

Error: Column `variable` must be length 1 (a summary value), not 2  

How can I fix this?

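Another option with dplyr is to use if/else instead of ifelse. Because variable_name is constant within each group, testing its first element gives a single TRUE or FALSE, so the expression returns exactly one value per group: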

dat %>%        
  group_by(Year, variable_name) %>% 
  summarise(variable = if (variable_name[1] %in% c('Tmax', 'Tmin')) mean(value) else sum(value))

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
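
To see why if/else succeeds where ifelse fails, here is a minimal base R sketch that mimics what summarise sees inside a single group. The two-row vectors are copied from the Tmax rows of dat; the names vec_result and scalar_result are made up for illustration.

```r
# One group's worth of data: the two Tmax rows from dat
variable_name <- c("Tmax", "Tmax")
value <- c(31.928, 30.13355)

# ifelse() is vectorised over its test, so the result is as long as the group
vec_result <- ifelse(variable_name %in% c("Tmax", "Tmin"), mean(value), sum(value))
length(vec_result)     # 2, which is what summarise() rejects

# if/else evaluates one condition and returns one value
scalar_result <- if (variable_name[1] %in% c("Tmax", "Tmin")) mean(value) else sum(value)
length(scalar_result)  # 1
```

Both branches compute the same mean here; the only difference is how many copies of it come back.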

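I think the problem is that ifelse operates row by row here rather than at the group level, so it returns one value per row in the group instead of a single summary value. If that's right, you can work around it by computing both summary statistics and then conditionally picking the one you want based on the variable name, like so: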

dat %>%
  dplyr::group_by(Year, variable_name) %>%
  dplyr::summarise(var_mean = mean(value), var_sum = sum(value)) %>%
  dplyr::mutate(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), var_mean, var_sum)) %>%
  dplyr::select(-var_mean, -var_sum)

Result:

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
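
As a sanity check on the numbers above, the same per-group rule can be sketched in base R without dplyr. This uses only the Rday and Tmax rows from dat; dat_small, mean_vars, by_var and result are hypothetical names chosen for this example.

```r
# Subset of the question's data: Rday should be summed, Tmax averaged
dat_small <- data.frame(
  variable_name = c("Rday", "Rday", "Tmax", "Tmax"),
  value = c(16, 23, 31.928, 30.13355)
)

mean_vars <- c("Tmax", "Tmin")  # variables to average; everything else is summed

# Split the values by variable, then apply the matching aggregate to each group
by_var <- split(dat_small$value, dat_small$variable_name)
result <- sapply(names(by_var), function(nm) {
  if (nm %in% mean_vars) mean(by_var[[nm]]) else sum(by_var[[nm]])
})
result
```

Rday comes out as 39 and Tmax as about 31.03, which agrees with the rounded values in the tibble.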

