
R - Vectorized Mean with Weighting

I can currently calculate the mean of a dataset with several million entries quickly, using the following code:

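# Mean of posScore over rows where posScore > 1
PosAvg = mean(curTweets$posScore[curTweets$posScore > 1])
# Mean of posScore over rows where posScore exceeds |negScore|
uniqPosTweets = curTweets[curTweets$posScore > abs(curTweets$negScore), ]
UniqPosAvg = mean(uniqPosTweets$posScore)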

However, I want to weight these values and keep the same efficiency by doing it in the same vectorized style as above.

curTweets$posScore and curTweets$negScore can each take a value of 1, 2, 3, 4, or 5.

Let's say I want to give these values the following weights: 6, 7, 8, 9, 10 respectively. I'm using these numbers just to differentiate them from the potential values of posScore. The actual weights are calculated in my algorithm.

Is there a way to do this? I can't figure out how to apply the weights while maintaining this efficiency. Am I stuck having to loop through each entry and calculate the contributions individually?

Thank you!

foo <- seq(5)
weights <- c(1, 1, 1, 1, 100)
vectorized_weighted_mean <- sum(foo * weights) / sum(weights)
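
The same formula can be applied to the data from the question without a loop. The following is a minimal sketch, not a definitive implementation: the small curTweets data frame is made-up sample data, the names scoreWeights, w and keep are hypothetical, and it assumes posScore holds only the integers 1 to 5 so that it can index directly into the weight vector. The weights 6 to 10 are the illustrative ones from the question.

```r
# Made-up sample data standing in for the real curTweets (assumption)
curTweets <- data.frame(
  posScore = c(1, 2, 3, 4, 5, 2, 5),
  negScore = c(1, 3, 1, 2, 4, 1, 5)
)

# Illustrative weights for scores 1..5, as given in the question
scoreWeights <- c(6, 7, 8, 9, 10)

# Vectorized lookup: each posScore (1..5) picks its weight by position
w <- scoreWeights[curTweets$posScore]

# Weighted mean over all rows: sum(x * w) / sum(w)
weightedPosAvg <- sum(curTweets$posScore * w) / sum(w)

# Weighted mean restricted to rows where posScore exceeds |negScore|,
# this time using base R's weighted.mean(), which computes the same ratio
keep <- curTweets$posScore > abs(curTweets$negScore)
uniqWeightedPosAvg <- weighted.mean(curTweets$posScore[keep], w[keep])
```

Both the lookup and the sums operate on whole vectors, so no explicit loop over the entries is needed. If the real weights come from a function rather than a fixed table, that function would need to accept a vector of scores for the same approach to apply.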
