Calculating grouped means while keeping single-row groups in R (dplyr)

Question

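I am trying to calculate the mean and standard deviation for a dataset. I have a list of organizations, but one organization has only a single row of "cpue". When I compute the grouped mean for each organization and another variable (scientific name), that organization is dropped and the result is NA. However, I would like to keep the single-row value and place it in the "mean" column so that I can plot it (without an sd). Is there a way to tell dplyr to keep single-row groups when calculating the mean? The data look like this: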

  library(dplyr)

  # Sample data: organization C has only one row
  l <- data.frame(organization = c("A", "B", "B", "A", "B", "A", "C"),
                  species = c("turtle", "shark", "turtle", "bird", "turtle", "shark", "bird"),
                  cpue = c(1, 2, 1, 5, 6, 1, 3))

  # Mean and sd of cpue per organization/species combination
  l2 <- l %>%
    group_by(organization, species) %>%
    summarize(mean = mean(cpue),
              sd = sd(cpue))

Any help would be greatly appreciated!

Answer 1

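We can put an if/else condition inside the sd calculation that checks the number of rows in the group: if n() == 1, return 'cpue' itself; otherwise compute the sd of 'cpue'. For single-row groups the sd column then holds the raw 'cpue' value rather than a standard deviation.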

library(dplyr)
l1 <- l %>%
   group_by(organization, species) %>%
   summarize(mean = mean(cpue),
             sd = if (n() == 1) cpue else sd(cpue), .groups = 'drop')

Output:

l1
# A tibble: 6 x 4
#  organization species  mean    sd
#* <chr>        <chr>   <dbl> <dbl>
#1 A            bird      5    5   
#2 A            shark     1    1   
#3 A            turtle    1    1   
#4 B            shark     2    2   
#5 B            turtle    3.5  3.54
#6 C            bird      3    3
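
For comparison, here is a minimal sketch of a variant that does not copy the raw value into the sd column. It relies on two facts: mean() of a single value is that value, and sd() of a single value is NA. The names `res_na` and `res_zero`, and the choice of 0 as a placeholder, are assumptions for illustration and are not part of the original answer.

```r
library(dplyr)

# Sample data from the question
l <- data.frame(
  organization = c("A", "B", "B", "A", "B", "A", "C"),
  species      = c("turtle", "shark", "turtle", "bird", "turtle", "shark", "bird"),
  cpue         = c(1, 2, 1, 5, 6, 1, 3)
)

# Plain summarise: single-row groups keep their mean, and sd is NA
res_na <- l %>%
  group_by(organization, species) %>%
  summarise(mean = mean(cpue), sd = sd(cpue), .groups = "drop")

# Hypothetical variant: report 0 instead of NA for single-row groups
res_zero <- l %>%
  group_by(organization, species) %>%
  summarise(mean = mean(cpue),
            sd = if (n() == 1) 0 else sd(cpue),
            .groups = "drop")
```

Which placeholder fits best probably depends on how the plot should treat missing error bars; NA states most directly that no sd could be estimated.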

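If the condition should instead depend on the value of the grouping variable 'organization', extract the grouping values with cur_group() and build the condition inside the if/else: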

l %>%
   group_by(organization, species) %>%
   summarise(mean = mean(cpue),
             sd = if (cur_group()$organization == 'A') cpue else sd(cpue),
             .groups = 'drop')
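
The cur_group() condition returns 'cpue' as-is, so it yields one row per group only when the matching groups have a single row, as they do in the sample data. A minimal sketch that combines both checks is shown here; the name `res` and the decision to require both conditions are assumptions, not part of the original answer.

```r
library(dplyr)

# Sample data from the question
l <- data.frame(
  organization = c("A", "B", "B", "A", "B", "A", "C"),
  species      = c("turtle", "shark", "turtle", "bird", "turtle", "shark", "bird"),
  cpue         = c(1, 2, 1, 5, 6, 1, 3)
)

# Assumption: use the raw cpue only when the group belongs to
# organization "A" AND has exactly one row; otherwise compute sd(cpue).
res <- l %>%
  group_by(organization, species) %>%
  summarise(mean = mean(cpue),
            sd = if (cur_group()$organization == "A" && n() == 1) cpue else sd(cpue),
            .groups = "drop")
```

With this guard, a multi-row group in organization "A" would fall through to sd(cpue) instead of returning several values for one group.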
