I am trying to compute a weighted mean per group (AverageViewingTime weighted by the number of Views) on a data frame called DFA_CMO. DFA_CMO has 5 dimensions, Site among them.
Here is the code:
ddply(DFA_CMO,.(Site),summarize, wm = weighted.mean(DFA_CMO$AverageViewingTime, DFA_CMO$Views, ,na.rm=TRUE))
But the result is disappointing, as it shows the same value for every site:
Site wm
1 Advideum 21.17633
2 bbc.com 21.17633
3 Boursorama 21.17633
4 Canal Plus 21.17633
5 CNN Network 21.17633
6 EuronewsFR 21.17633
7 invitemedo.com 21.17633
8 Lfddfdse 21.17633
9 Le Monde 21.17633
10 Les Echos 1 21.17633
11 lopinion.fr 21.17633
12 TF1.fr 21.17633
13 ViadeoFR 21.17633
14 WSJ UK - IBM PE 21.17633
It seems that the average over the whole table is displayed for every site, whereas it should be different per site. Any idea how to get the right values?
Don't pass DFA_CMO$<var_name> in the call to ddply. Just pass the variable names themselves.
ddply(DFA_CMO, .(Site), summarize,
      wm = weighted.mean(AverageViewingTime, Views, na.rm = TRUE))
The reason is that by giving the data frame name, you are effectively passing fixed vectors to the weighted mean function, namely the values for all rows in your data frame. If you pass only the column names, ddply will evaluate them in the context of the row subset corresponding to each group.
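To make the difference concrete, here is a minimal sketch on a toy data frame. The data frame name `toy` and its values are made up for illustration and are not taken from DFA_CMO; it assumes the plyr package is installed.

```r
library(plyr)

# Hypothetical toy data standing in for DFA_CMO (values are invented)
toy <- data.frame(
  Site = c("A", "A", "B", "B"),
  AverageViewingTime = c(10, 20, 30, 50),
  Views = c(1, 3, 1, 1)
)

# Wrong: toy$... always refers to the full columns,
# so every group receives the overall weighted mean (25 here)
wrong <- ddply(toy, .(Site), summarize,
               wm = weighted.mean(toy$AverageViewingTime, toy$Views, na.rm = TRUE))

# Right: bare column names are evaluated within each Site subset,
# giving 17.5 for A and 40 for B
right <- ddply(toy, .(Site), summarize,
               wm = weighted.mean(AverageViewingTime, Views, na.rm = TRUE))

print(wrong)
print(right)
```

In the first call the result is identical for both sites because the grouping never reaches the vectors passed to weighted.mean; in the second call each site gets its own value.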