R - Vectorized mean with weighting

Question

I can currently calculate the mean of a dataset with several million entries quickly, using the following code:

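# Mean of the positive scores greater than 1
PosAvg = mean(curTweets$posScore[curTweets$posScore > 1])
# Rows whose positive score exceeds the magnitude of the negative score
uniqPosTweets = curTweets[curTweets$posScore > abs(curTweets$negScore), ]
UniqPosAvg = mean(uniqPosTweets$posScore)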

However, I want to weight these values while keeping the same efficiency, by doing it in the same style as above.

curTweets$posScore / curTweets$negScore can take a value of 1, 2, 3, 4, or 5.

Let's say I want to give these values the weights 6, 7, 8, 9, 10 respectively. I'm using these numbers just to differentiate them from the possible values of posScore. The actual weights are calculated in my algorithm.

Is there a way to do this? I can't figure out how to apply the weights while maintaining this efficiency. Am I stuck having to loop through each entry and calculate the contributions individually?

Thank you!

Answer 1

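foo <- seq(5)                    # values to average: 1, 2, 3, 4, 5
weights <- c(1, 1, 1, 1, 100)    # one weight per value
# Weighted mean: sum of value * weight, divided by the total weight
vectorized_weighted_mean <- sum(foo * weights) / sum(weights)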
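
The same idea can be applied to the filtered means from the question. The following is a minimal sketch, not the asker's actual algorithm: the sample `curTweets` data, the `scoreWeights` vector, and the result names are hypothetical, and it assumes every score is an integer from 1 to 5 so that the score itself can serve as an index into the weight vector. It uses base R's `weighted.mean(x, w)`, which computes `sum(x * w) / sum(w)`.

```r
# Hypothetical sample data standing in for the asker's curTweets data frame.
curTweets <- data.frame(
  posScore = c(1, 2, 3, 4, 5, 5),
  negScore = c(1, 3, 1, 2, 5, 1)
)

# Placeholder weights for scores 1..5, taken from the question's example.
scoreWeights <- c(6, 7, 8, 9, 10)

# Indexing by score looks up one weight per row without an explicit loop.
# Assumption: posScore only contains the integers 1 through 5.
w <- scoreWeights[curTweets$posScore]

# Weighted counterpart of PosAvg: only scores greater than 1.
keep <- curTweets$posScore > 1
weightedPosAvg <- weighted.mean(curTweets$posScore[keep], w[keep])

# Weighted counterpart of UniqPosAvg: positive score exceeds |negScore|.
uniq <- curTweets$posScore > abs(curTweets$negScore)
weightedUniqPosAvg <- weighted.mean(curTweets$posScore[uniq], w[uniq])
```

Because the lookup, the filtering, and the averaging are all whole-vector operations, this should scale to millions of rows in much the same way as the unweighted version.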

Solution 1
0 2016-05-01 18:27:44