Using dplyr to compute a weighted mean by group (and replicating other approaches)

Question

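I am confused by dplyr's syntax when trying to compute a weighted mean.

I am following David's suggestion here. The syntax is transparent, which makes it appealing, but it does not seem to work as I expected: in the call below, the weighted mean is computed over the whole data set rather than within the groups defined by the variable B.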

head(df)
# A tibble: 4 × 3
      A     B     P
  <dbl> <dbl> <dbl>
1     1    10   0.4
2     2    10   0.6
3     1    20   0.2
4     2    20   0.8

library(dplyr)
df %>% group_by(B) %>%
    summarise(wm = weighted.mean(A, P))
# wm
# 1 1.7

I can get the expected result in several other ways. How can I use dplyr to replicate the calculations below?

# with a split/apply routine:
sapply(split(df, df$B), function(x) weighted.mean(x$A, x$P))
#  10  20 
# 1.6 1.8 

# with data.table
library(data.table)
setDT(df)[, .(wm = weighted.mean(A, P)), B]
#     B  wm
# 1: 10 1.6
# 2: 20 1.8

# with plyr:
library(plyr)
ddply(df, .(B), summarise, wm = weighted.mean(A, P))
#    B  wm
# 1 10 1.6
# 2 20 1.8

# with aggregate | the formula approach is mysterious
df$wm <- 1:nrow(df)
aggregate(wm ~ B, data=df, function(x) weighted.mean(df$A[x], df$P[x]))
#    B  wm
# 1 10 1.6
# 2 20 1.8
df$wm <- NULL  # no longer needed

Here is the toy data (a tibble, not a standard data frame):

library(tidyverse)
df = structure(list(A = c(1, 2, 1, 2), B = c(10, 10, 20, 20), P = c(0.4, 0.6, 0.2, 0.8)), 
    row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

Here is one post, and another, about computing means by group with dplyr, but I did not see how they shed light on my problem.

Answer 1

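This is a very common problem when the plyr package is loaded, because plyr::summarise can mask the dplyr::summarise function (the package attached last wins). Just call dplyr::summarise explicitly. Masking is the first thing to check whenever summarise returns unexpected output.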
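As a minimal sketch of the namespace-qualified call, assuming the toy data from the question (rebuilt here as a plain data frame so the snippet is self-contained; the name res is only illustrative):

```r
library(dplyr)

# Toy data from the question, rebuilt as a plain data frame
df <- data.frame(A = c(1, 2, 1, 2),
                 B = c(10, 10, 20, 20),
                 P = c(0.4, 0.6, 0.2, 0.8))

# The dplyr:: prefix resolves to dplyr's functions regardless of
# which other packages are attached, so the grouping is respected
# even if plyr masks the unqualified name summarise.
res <- df %>%
  dplyr::group_by(B) %>%
  dplyr::summarise(wm = weighted.mean(A, P))

print(res)

# To see which package an unqualified summarise currently resolves to:
environmentName(environment(summarise))
```

The prefix is only strictly needed on summarise, since that is the masked function here; qualifying group_by as well is a matter of taste.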

Another option is to detach plyr before using dplyr:

detach("package:plyr")
library("dplyr")
df %>% group_by(B) %>%
    summarise(wm = weighted.mean(A, P))
#       B    wm
#    <dbl> <dbl>
# 1    10   1.6
# 2    20   1.8
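To see where the 1.7 in the question comes from, here is a rough check in base R, assuming the same toy data (the names overall and by_group are only illustrative). The ungrouped weighted mean over all four rows is 1.7, which is what you get when the grouping is ignored, while the per-group values match the expected result:

```r
A <- c(1, 2, 1, 2)
B <- c(10, 10, 20, 20)
P <- c(0.4, 0.6, 0.2, 0.8)

# Weighted mean over all rows, ignoring B: sum(A * P) / sum(P)
overall <- sum(A * P) / sum(P)

# Weighted mean within each level of B, computed the same way per group
by_group <- tapply(A * P, B, sum) / tapply(P, B, sum)

print(overall)
print(by_group)
```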

Solution 1
4 Accepted 2022-02-19 08:15:56
