
Compute weighted mean/median with dplyr as with plyr

I'm trying to reproduce in dplyr something I already have working with ddply.

This is what works:

library(plyr)
library(dplyr)
library(matrixStats)


# tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
mtcars2 = as_tibble(mtcars) %>% 
  mutate(car = rownames(mtcars))

# compute the weighted mean of mpg per car (cyl is used as the weights just to provide an example)
ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))

# compute the weighted median of mpg per car, with the same weights
ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE))
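For comparison, the two ddply() calls can be sketched in dplyr as a single group_by() + summarise() pipeline. This is a minimal sketch assuming dplyr >= 1.0 plus the tibble and matrixStats packages; the output column names `weighted_mean_mpg` and `weighted_median_mpg` are illustrative choices, not anything from the original. Because every car is its own group here, each group holds a single value, so both statistics just return that car's `mpg`.

```r
library(dplyr)
library(tibble)

# One row per car; the car name comes from mtcars' row names
# (moved into a column before as_tibble(), which would otherwise drop them).
mtcars2 <- mtcars %>%
  rownames_to_column(var = "car") %>%
  as_tibble()

# Equivalent of the two ddply(..., summarise, ...) calls: within each group,
# summarise() can hand both columns to a function (mpg as x, cyl as w).
res <- mtcars2 %>%
  group_by(car) %>%
  summarise(
    weighted_mean_mpg   = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE),
    weighted_median_mpg = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE)
  )

print(res)
```

Both statistics are computed in one summarise() call, so a single grouped pass replaces the two separate ddply() calls.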

The output of the weighted-mean call is:

> ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))
                   car  FUN
1          AMC Javelin 15.2
2   Cadillac Fleetwood 10.4
3           Camaro Z28 13.3
4    Chrysler Imperial 14.7
5           Datsun 710 22.8
6     Dodge Challenger 15.5
7           Duster 360 14.3
8         Ferrari Dino 19.7
9             Fiat 128 32.4
10           Fiat X1-9 27.3
11      Ford Pantera L 15.8
12         Honda Civic 30.4
13      Hornet 4 Drive 21.4
14   Hornet Sportabout 18.7
15 Lincoln Continental 10.4
16        Lotus Europa 30.4
17       Maserati Bora 15.0
18           Mazda RX4 21.0
19       Mazda RX4 Wag 21.0
20            Merc 230 22.8
21           Merc 240D 24.4
22            Merc 280 19.2
23           Merc 280C 17.8
24          Merc 450SE 16.4
25          Merc 450SL 17.3
26         Merc 450SLC 15.2
27    Pontiac Firebird 19.2
28       Porsche 914-2 26.0
29      Toyota Corolla 33.9
30       Toyota Corona 21.5
31             Valiant 18.1
32          Volvo 142E 21.4

This works as expected.

I need something like the following (it doesn't work as written):

mtcars3 = tbl_df(mtcars) %>% 
  mutate(car = rownames(mtcars)) %>% 
  mutate(weighted_mean_mpg = ddply(mtcars, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))) %>% 
  mutate(weighted_median_mpg = ddply(mtcars, .(car), summarise, FUN = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE)))

In other words, I want to pass two variables to a function inside a dplyr verb: both the values `x` and a vector of weights `w`.
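If the goal is to add the statistics as new columns while keeping every row, as the mutate() attempt above intends, one possible sketch is to group first and call the weighted functions directly inside mutate(). Inside a grouped mutate(), `mpg` and `cyl` refer to the current group's rows, so both `x` and `w` can be passed as plain column names. This assumes dplyr >= 1.0, tibble and matrixStats; the new column names are illustrative. With `car` as the grouping (as in the question) each value equals `mpg`; grouping by a coarser column would give values that differ from `mpg`.

```r
library(dplyr)
library(tibble)

mtcars3 <- mtcars %>%
  rownames_to_column(var = "car") %>%
  as_tibble() %>%
  group_by(car) %>%
  # Each function returns one value per group, which mutate() recycles
  # to every row of that group; row order is left unchanged.
  mutate(
    weighted_mean_mpg   = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE),
    weighted_median_mpg = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE)
  ) %>%
  ungroup()

print(mtcars3)
```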

Many thanks in advance!

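library(dplyr)
library(tibble)  # for rownames_to_column()
# move the row names into `car` before converting: as_tibble() drops row names
x <- mtcars %>% rownames_to_column(var = 'car') %>% as_tibble()
# mean() has no weights argument (wt = is silently ignored), so use weighted.mean()
x %>% group_by(car) %>% summarise(m = weighted.mean(mpg, w = cyl)) %>% knitr::kable()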
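Base R's mean() has no weights argument: an extra `wt = ...` is absorbed by its `...` and ignored without any warning, so it silently returns the unweighted mean. stats::weighted.mean() (or matrixStats::weightedMean()) computes sum(w * x) / sum(w) instead. A small check on made-up numbers (the vectors below are purely illustrative):

```r
x <- c(10, 20, 30)
w <- c(1, 1, 2)

unweighted <- mean(x, wt = w)          # `wt` is not an argument of mean(): ignored
weighted   <- weighted.mean(x, w = w)  # (10*1 + 20*1 + 30*2) / (1 + 1 + 2)

print(c(unweighted = unweighted, weighted = weighted))
# unweighted: 20, weighted: 22.5
```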

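Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.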

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM