How to use dplyr to calculate a weighted mean of two grouped variables

I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then count the rows in each group. For this we simply have:

mtcars %>% group_by(cyl, mpg) %>% summarize(Count = n())

If I have understood you correctly, you need weighted.mean:

library(dplyr)
mtcars %>%
  group_by(cyl, mpg) %>%
  summarize(Count = n()) %>%      # number of rows for each (cyl, mpg) pair
  group_by(cyl) %>%
  summarise(avg_mpg = weighted.mean(mpg, Count))  # mpg weighted by that count

# A tibble: 3 x 2
#    cyl   avg_mpg
#  <dbl>   <dbl>
#1  4.00    26.7
#2  6.00    19.7
#3  8.00    15.1

which is equivalent to writing the weighted mean out explicitly:

mtcars %>% 
  group_by(cyl, mpg) %>% 
  summarize(Count = n()) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = sum(mpg * Count)/sum(Count))
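As a cross-check outside dplyr, here is a minimal base-R sketch of the same sum(w * x) / sum(w) formula. The names counts, avg_weighted and avg_simple are illustrative only. It rebuilds the Count column with aggregate() and compares the result with a plain per-cyl mean of mpg.

```r
# Base-R sketch (no dplyr): rebuild the Count column, then apply sum(w * x) / sum(w).

# Count rows for every (cyl, mpg) combination, like group_by(cyl, mpg) + n()
counts <- aggregate(
  list(Count = rep(1, nrow(mtcars))),
  by = list(cyl = mtcars$cyl, mpg = mtcars$mpg),
  FUN = sum
)

# Weighted mean of the distinct mpg values within each cyl, weights = Count
avg_weighted <- sapply(
  split(counts, counts$cyl),
  function(d) sum(d$mpg * d$Count) / sum(d$Count)
)

# Plain mean of mpg over all rows of each cyl group, for comparison
avg_simple <- tapply(mtcars$mpg, mtcars$cyl, mean)

print(avg_weighted)
print(avg_simple)
```

Both vectors should agree up to floating-point rounding.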

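You are effectively computing a simple mean: the weights are just the number of times each mpg value occurs within its cyl group, so weighting by them gives the same result as averaging mpg over all rows of that group: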

library(dplyr)
options(pillar.sigfig = 10) # print more significant digits so the results can be compared exactly

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

Output, identical to the results proposed above:

# A tibble: 3 x 2
    cyl     avg_mpg
  <dbl>       <dbl>
1     4 26.66363636
2     6 19.74285714
3     8 15.1  
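Why the two agree: within one cyl group, let x_1, ..., x_N be the mpg values of its N rows, v_1, ..., v_K the distinct mpg values, and n_k the number of rows whose value is v_k (the Count column). Then:

```latex
\bar{x}_w
  = \frac{\sum_{k=1}^{K} n_k v_k}{\sum_{k=1}^{K} n_k}
  = \frac{\sum_{j=1}^{N} x_j}{N}
  = \bar{x}
```

The numerators match because summing n_k v_k adds each distinct value once per row in which it occurs, and the denominators match because the counts n_k add up to N.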

If you need a weighted mean based on another variable (here disp):

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = weighted.mean(mpg, disp))

# A tibble: 3 x 2
    cyl     avg_mpg
  <dbl>       <dbl>
1     4 25.81985300
2     6 19.77197631
3     8 14.86285148
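When the weights come from another column, weighted.mean(x, w) within each group is simply sum(w * x) / sum(w) over that group's rows. A minimal base-R sketch of that check, using the illustrative names by_cyl, explicit and builtin:

```r
# Split mtcars into one data frame per cylinder count ("4", "6", "8")
by_cyl <- split(mtcars, mtcars$cyl)

# mpg weighted by engine displacement, written out by hand
explicit <- sapply(by_cyl, function(d) sum(d$mpg * d$disp) / sum(d$disp))

# The same quantity via stats::weighted.mean
builtin <- sapply(by_cyl, function(d) weighted.mean(d$mpg, d$disp))

print(cbind(explicit, builtin))
```

Because the weights are no longer equal counts, cars with larger displacement pull each group's average toward their own mpg, which is why these values differ from the plain means.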
