How to calculate a weighted average over two grouping variables with dplyr

Question

I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables and then count the rows in each group. For this we simply have:

mtcars %>% group_by(cyl, mpg) %>% summarize(Count = n())

Answer 1

If I have understood you correctly, you need weighted.mean:

library(dplyr)
mtcars %>%
  group_by(cyl, mpg) %>%
  summarise(Count = n()) %>%                       # rows per (cyl, mpg) combination
  group_by(cyl) %>%
  summarise(avg_mpg = weighted.mean(mpg, Count))   # mpg weighted by Count, per cyl

# A tibble: 3 x 2
#    cyl   avg_mpg
#  <dbl>   <dbl>
#1  4.00    26.7
#2  6.00    19.7
#3  8.00    15.1

which is equivalent to:

mtcars %>% 
  group_by(cyl, mpg) %>% 
  summarize(Count = n()) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = sum(mpg * Count)/sum(Count))
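
The equivalence of the two pipelines above can be checked directly. The following is a minimal sketch, not part of the original answer; the object names `counts`, `via_weighted_mean`, and `via_formula` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Rows per (cyl, mpg) combination, as in the answer above
counts <- mtcars %>%
  group_by(cyl, mpg) %>%
  summarise(Count = n())

# Count-weighted mean of mpg within each cyl group
via_weighted_mean <- counts %>%
  group_by(cyl) %>%
  summarise(avg_mpg = weighted.mean(mpg, Count))

# The same quantity written out as sum(value * weight) / sum(weight)
via_formula <- counts %>%
  group_by(cyl) %>%
  summarise(avg_mpg = sum(mpg * Count) / sum(Count))

# Stops with an error if the two columns differ beyond floating-point tolerance
stopifnot(isTRUE(all.equal(via_weighted_mean$avg_mpg, via_formula$avg_mpg)))
```

Comparing with `all.equal()` rather than `==` avoids spurious mismatches caused by floating-point rounding.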

Answer 2

You are effectively computing a simple mean, because the weights are just the counts of the variable being averaged, so every row still contributes exactly once:

library(dplyr)
options(pillar.sigfig=10) # To check they are identical results
    
mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

Output (identical to the results proposed above):

# A tibble: 3 x 2
    cyl     avg_mpg
  <dbl>       <dbl>
1     4 26.66363636
2     6 19.74285714
3     8 15.1
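
Why the count-weighted mean and the simple mean coincide can be written out in one line. The notation is introduced here for illustration and does not appear in the original answers: within a single cyl group, let $v_1, \dots, v_K$ be the distinct mpg values, $n_k$ the number of rows with value $v_k$ (the Count column), and $x_1, \dots, x_N$ the individual mpg values, with $N = \sum_k n_k$.

```latex
\[
\frac{\sum_{k=1}^{K} n_k \, v_k}{\sum_{k=1}^{K} n_k}
  \;=\; \frac{1}{N} \sum_{i=1}^{N} x_i
\]
```

Each distinct value $v_k$ occurs $n_k$ times among the $x_i$, so the numerators are the same sum grouped differently, and the denominators are both $N$.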

If you need a weighted mean based on another variable:

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = weighted.mean(mpg, disp))

# A tibble: 3 x 2
    cyl     avg_mpg
  <dbl>       <dbl>
1     4 25.81985300
2     6 19.77197631
3     8 14.86285148

Answer 1
4 2018-04-24 01:20:27

Answer 2
0 2022-02-04 10:24:39
