
R dplyr perform different aggregation by group

I have a data frame, dat, as follows:

   dat <- structure(list(cell.ID = c(329574L, 329574L, 329574L, 329574L, 
    329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 
    329574L), Year = c("2010", "2010", "2010", "2010", "2010", "2010", 
    "2010", "2010", "2010", "2010", "2010", "2010"), month_name = c("June", 
    "July", "June", "July", "June", "July", "June", "July", "June", 
    "July", "June", "July"), value = c(459.860986624053, 398.94083733151, 
    16, 23, 111.69, 453.333, 71.55, 30.38, 31.928, 30.13355, 17.587, 
    19.7938709677419), variable_name = c("ETo", "ETo", "Rday", "Rday", 
    "Rsum", "Rsum", "Thdd", "Thdd", "Tmax", "Tmax", "Tmin", "Tmin"
    ), monthID = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L
    )), row.names = c(NA, -12L), class = "data.frame")


library(dplyr)

dat  %>%
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(variable = sum(value))

If I want to average Tmax and Tmin and sum the rest of the variables, I tried this:

dat %>%        
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), mean(value), sum(value)))

Error: Column `variable` must be length 1 (a summary value), not 2  

How can I fix this?

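Another option in dplyr is to use if/else instead of ifelse, testing only the first variable_name of each group so that the condition has length 1: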

dat %>%        
  group_by(Year, variable_name) %>% 
  summarise(variable = if (variable_name[1] %in% c('Tmax', 'Tmin')) mean(value) else sum(value))

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
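
When the aggregation depends only on the group's name, the same if/else idea can be moved into a small helper. The following is a sketch rather than part of the answer above: agg_for is a hypothetical helper name, the data frame is a trimmed copy of dat with only the columns used here, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Trimmed copy of `dat`: only the columns needed for the aggregation
dat <- data.frame(
  Year = "2010",
  variable_name = rep(c("ETo", "Rday", "Rsum", "Thdd", "Tmax", "Tmin"), each = 2),
  value = c(459.860986624053, 398.94083733151, 16, 23, 111.69, 453.333,
            71.55, 30.38, 31.928, 30.13355, 17.587, 19.7938709677419)
)

# Hypothetical helper: return the aggregation function for one variable name
agg_for <- function(name) {
  if (name %in% c("Tmax", "Tmin")) mean else sum
}

res <- dat %>%
  group_by(Year, variable_name) %>%
  # first(variable_name) is a single string, so agg_for() gets a length-1 input
  summarise(variable = agg_for(first(variable_name))(value), .groups = "drop")

res
```

Keeping the rule in one function means a new averaged variable only has to be added to the vector inside agg_for.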

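I think the problem is that ifelse operates element-wise (row by row) here rather than at the group level, so each two-row group yields two values instead of one. If that is right, you can work around it by computing both summary statistics and then conditionally selecting the one you want by variable name, like so: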

dat %>%        
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(var_mean = mean(value), var_sum = sum(value)) %>%
dplyr::mutate(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), var_mean, var_sum)) %>%
dplyr::select(-var_mean, -var_sum)

Result:

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
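
The guess about ifelse can be checked outside dplyr. Here is a minimal base R sketch, using the two Tmax rows from dat as a stand-in for what a single group looks like inside summarise. The names vec_result and scalar_result are only for illustration.

```r
# Values of one group (Tmax) as they appear in `dat`
variable_name <- c("Tmax", "Tmax")
value <- c(31.928, 30.13355)

# ifelse() is vectorised: its result is as long as the test,
# so a two-row group produces two values
vec_result <- ifelse(variable_name %in% c("Tmax", "Tmin"), mean(value), sum(value))
length(vec_result)

# if/else takes a single TRUE/FALSE, so the result is a single value
scalar_result <- if (variable_name[1] %in% c("Tmax", "Tmin")) mean(value) else sum(value)
length(scalar_result)
```

The first length is 2, which is what the error message complains about, and the second is 1. In the workaround above, ifelse is safe because it runs after summarise, when each variable already has exactly one row per year.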
