
R ddply and weighted mean

I am trying to calculate an aggregation with a weighted mean (AverageViewingTime weighted by the number of Views) on a table called DFA_CMO. DFA_CMO has 5 dimensions, Site among them.

Here is the code:

ddply(DFA_CMO,.(Site),summarize, wm = weighted.mean(DFA_CMO$AverageViewingTime, DFA_CMO$Views, ,na.rm=TRUE))

But the result is disappointing, as it shows the same value for every site:

              Site       wm
1         Advideum 21.17633
2          bbc.com 21.17633
3       Boursorama 21.17633
4       Canal Plus 21.17633
5     CNN  Network 21.17633
6       EuronewsFR 21.17633
7  invitemedo.com 21.17633
8         Lfddfdse 21.17633
9         Le Monde 21.17633
10     Les Echos 1 21.17633
11     lopinion.fr 21.17633
12          TF1.fr 21.17633
13        ViadeoFR 21.17633
14 WSJ UK - IBM PE 21.17633

It seems that the average over everything is displayed here, whereas it should be different per site. Any idea how to get the right values?

Don't pass DFA_CMO$<var_name> in the call to ddply. Just pass the variable names themselves.

ddply(DFA_CMO, .(Site), summarize,
      wm = weighted.mean(AverageViewingTime, Views, na.rm = TRUE))

The reason is that DFA_CMO$AverageViewingTime and DFA_CMO$Views always refer to the full columns, so every group passes the same fixed vectors (the values for all rows in your data frame) to weighted.mean and gets the overall average. If you pass only the column names, ddply evaluates them in the context of the row subset corresponding to each group.
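To see the difference on something small, here is a minimal sketch using a made-up data frame. The name toy, the site labels and the numbers are all hypothetical and only reuse the column names from the question. It assumes the plyr package is installed.

```r
library(plyr)

# Hypothetical toy data: two sites, two rows each (values are made up).
toy <- data.frame(
  Site               = c("A", "A", "B", "B"),
  AverageViewingTime = c(10, 20, 30, 50),
  Views              = c(1, 3, 1, 1)
)

# toy$... always refers to the full columns, so every group computes
# the weighted mean over all four rows.
wrong <- ddply(toy, .(Site), summarize,
               wm = weighted.mean(toy$AverageViewingTime, toy$Views, na.rm = TRUE))

# Bare column names are looked up in the per-site subset of rows.
right <- ddply(toy, .(Site), summarize,
               wm = weighted.mean(AverageViewingTime, Views, na.rm = TRUE))

print(wrong)  # wm is 25 for both A and B: (10*1 + 20*3 + 30*1 + 50*1) / 6
print(right)  # wm is 17.5 for A (70 / 4) and 40 for B (80 / 2)
```

The first call reproduces the symptom from the question (one identical value per site), while the second gives a separate weighted mean for each site.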
