
Compute weighted mean/median with dplyr as with plyr

I'm trying to reproduce in dplyr something I already have working with plyr's ddply.

This is what works:

library(plyr)
library(dplyr)
library(matrixStats)


mtcars2 = tbl_df(mtcars) %>% 
  mutate(car = rownames(mtcars))

# compute the weighted mean (I use cyl just to provide an example)
ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))

# compute the weighted median
ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE))

The output of the first call is:

> ddply(mtcars2, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))
                   car  FUN
1          AMC Javelin 15.2
2   Cadillac Fleetwood 10.4
3           Camaro Z28 13.3
4    Chrysler Imperial 14.7
5           Datsun 710 22.8
6     Dodge Challenger 15.5
7           Duster 360 14.3
8         Ferrari Dino 19.7
9             Fiat 128 32.4
10           Fiat X1-9 27.3
11      Ford Pantera L 15.8
12         Honda Civic 30.4
13      Hornet 4 Drive 21.4
14   Hornet Sportabout 18.7
15 Lincoln Continental 10.4
16        Lotus Europa 30.4
17       Maserati Bora 15.0
18           Mazda RX4 21.0
19       Mazda RX4 Wag 21.0
20            Merc 230 22.8
21           Merc 240D 24.4
22            Merc 280 19.2
23           Merc 280C 17.8
24          Merc 450SE 16.4
25          Merc 450SL 17.3
26         Merc 450SLC 15.2
27    Pontiac Firebird 19.2
28       Porsche 914-2 26.0
29      Toyota Corolla 33.9
30       Toyota Corona 21.5
31             Valiant 18.1
32          Volvo 142E 21.4

which is what I expect.

I need to do something like this (it doesn't work, because the code is not correct):

mtcars3 = tbl_df(mtcars) %>% 
  mutate(car = rownames(mtcars)) %>% 
  mutate(weighted_mean_mpg = ddply(mtcars, .(car), summarise, FUN = matrixStats::weightedMean(mpg, w = cyl, na.rm = TRUE))) %>% 
  mutate(weighted_median_mpg = ddply(mtcars, .(car), summarise, FUN = matrixStats::weightedMedian(mpg, w = cyl, na.rm = TRUE)))

In other words, I want to pass two variables to a function inside a dplyr statement: both the values x and a vector of weights w.

Many thanks in advance !!

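# as_tibble() drops row names, so move them into a column first
x <- mtcars %>% tibble::rownames_to_column(var = 'car') %>% as_tibble()

# mean() silently ignores `wt`; weighted.mean() takes the weights as `w`
x %>% group_by(car) %>% summarise(m = weighted.mean(mpg, w = cyl)) %>% knitr::kable()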
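
To get both statistics from the question in one dplyr pipeline, something along these lines should work. It is only a sketch: it assumes matrixStats is installed and reuses the question's column names (weighted_mean_mpg, weighted_median_mpg). Inside summarise() the columns mpg and cyl can be passed by name to any function, so no ddply call is needed.

```r
library(dplyr)
library(tibble)
library(matrixStats)

# Keep the car names as a column before converting to a tibble
mtcars2 <- mtcars %>%
  rownames_to_column(var = "car") %>%
  as_tibble()

# Both summaries in one summarise() call; mpg and cyl are passed as bare column names
mtcars3 <- mtcars2 %>%
  group_by(car) %>%
  summarise(
    weighted_mean_mpg   = weightedMean(mpg, w = cyl, na.rm = TRUE),
    weighted_median_mpg = weightedMedian(mpg, w = cyl, na.rm = TRUE)
  )

mtcars3
```

Because every car is its own group here, each group has a single row and both statistics simply equal that car's mpg, which matches the ddply output in the question. Grouping by a column with repeated values (for example cyl, with w = wt as the weights) would give a more informative result.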
