
"Running" weighted average

Original post 2015-01-28 05:32:35 7 2 python/ algorithm/ list/ moving-average

I'm constantly adding and removing tuples to and from a list in Python, and I'm interested in the weighted average rather than the list itself. Since this part is computationally quite expensive compared to the rest, I want to optimise it. What's the best way of keeping track of the weighted average? I can think of two methods:

1. Keep the list and recalculate the weighted average every time it is accessed or changed (my current approach).
2. Keep track of only the current weighted average and the sum of all weights, and update both on every add/remove action.

I would prefer the 2nd option, but I am worried about floating-point errors induced by constant addition and subtraction. What's the best way of dealing with this?
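For reference, the 2nd option could look roughly like the sketch below. It assumes each tuple is a (value, weight) pair, which the question does not actually state, and it stores the running weighted sum instead of the average itself so that the division happens only when the average is read. The class name and the `exact_average` helper are made up for illustration. The helper uses `math.fsum` to recompute from the stored tuples, which is one way to check how far the running totals have drifted.

```python
import math


class RunningWeightedAverage:
    """Tracks sum(value * weight) and sum(weight) across add/remove calls."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.weight_sum = 0.0

    def add(self, value, weight):
        self.weighted_sum += value * weight
        self.weight_sum += weight

    def remove(self, value, weight):
        # Assumes this exact (value, weight) pair was added earlier.
        self.weighted_sum -= value * weight
        self.weight_sum -= weight

    @property
    def average(self):
        return self.weighted_sum / self.weight_sum


def exact_average(pairs):
    """Recompute from scratch with accurately rounded sums (option 1)."""
    pairs = list(pairs)
    return math.fsum(v * w for v, w in pairs) / math.fsum(w for _, w in pairs)


running = RunningWeightedAverage()
running.add(2.0, 1.0)
running.add(4.0, 3.0)
print(running.average)      # 3.5
running.remove(2.0, 1.0)
print(running.average)      # 4.0
```

Add and remove are both O(1) here. Whether the accumulated rounding error matters depends on the data, so comparing against `exact_average` now and then is a cheap sanity check.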

2 answers

Try doing it in integers? Python bignums should make a rational argument for rational numbers (sorry, it's late... really sorry, actually).
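One way to read the integer suggestion is to use Python's `fractions.Fraction`, which is built on arbitrary-precision integers and so never rounds. The sketch below is only an illustration under that assumption. The class name is made up, and the values and weights are passed as decimal strings so that they are exact from the start.

```python
from fractions import Fraction


class ExactWeightedAverage:
    """Running weighted average kept as exact rationals."""

    def __init__(self):
        self.weighted_sum = Fraction(0)
        self.weight_sum = Fraction(0)

    def add(self, value, weight):
        value, weight = Fraction(value), Fraction(weight)
        self.weighted_sum += value * weight
        self.weight_sum += weight

    def remove(self, value, weight):
        value, weight = Fraction(value), Fraction(weight)
        self.weighted_sum -= value * weight
        self.weight_sum -= weight

    @property
    def average(self):
        return self.weighted_sum / self.weight_sum


exact = ExactWeightedAverage()
exact.add("0.1", "0.6")
exact.add("0.2", "0.3")
exact.remove("0.1", "0.6")
print(exact.average)          # 1/5
print(float(exact.average))   # 0.2
```

The trade-off is speed: numerators and denominators can grow with every operation, so this may be slower than plain floats.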

Whether you will experience much floating-point drift really depends on how many terms you are using and what your weighting coefficient is. You only get 53 bits of precision, but you might not need that much.

If your weighting factor is less than 1, then your error should be bounded, since you are constantly decreasing it. Let's say your weight is 0.6 (horrible, because you cannot represent that exactly in binary). That is 0.100110011001... in binary, which is stored rounded in the last bit of the 53-bit significand. Any error introduced by that rounding is then decreased each time you multiply again, so the error in the most recent term will dominate.
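To see the rounding of 0.6 directly, you can ask Python for the stored bits. This is just a quick check using the standard `float.hex` method and `fractions.Fraction`. The variable name `w` is arbitrary.

```python
from fractions import Fraction

w = 0.6
print(w.hex())                        # 0x1.3333333333333p-1
print(Fraction(w) == Fraction(3, 5))  # False: the stored value is not exactly 3/5
print(Fraction(w) < Fraction(3, 5))   # True: this particular rounding went downward
```

The repeating hex digit 3 (binary 0011) is the same repeating pattern as in the binary expansion, cut off after 52 fraction bits.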

Don't do the final division until you need to. Once again, given 0.6 as your weight and 10 terms, your term weights will be 99.22903012752124 for the first term all the way down to 1 for the last term (0.6**-t). Multiply your new term by 99.22..., add it to your running sum, subtract the trailing term out, and then divide by 246.5725753188031 (sum([0.6**-x for x in range(0,10)])).
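A rough sketch of that deferred division over a sliding window of 10 terms follows. The names are made up, and multiplying the running sum by 0.6 on each step is my reading of how the older weights shift down by one position, not something spelled out above, so treat it as an assumption.

```python
from collections import deque

DECAY = 0.6
WINDOW = 10
NEWEST_WEIGHT = DECAY ** -(WINDOW - 1)                   # about 99.229
WEIGHT_TOTAL = sum(DECAY ** -t for t in range(WINDOW))   # about 246.573


class SlidingWeightedSum:
    """Keeps the undivided weighted sum; divides only when read."""

    def __init__(self):
        self.terms = deque()    # oldest term on the left
        self.numerator = 0.0    # running sum of weight * term

    def push(self, x):
        if len(self.terms) == WINDOW:
            # The trailing term has decayed to weight 1, so subtract it as is.
            self.numerator -= self.terms.popleft()
        # Assumed step: every remaining weight drops by one factor of DECAY.
        self.numerator = self.numerator * DECAY + NEWEST_WEIGHT * x
        self.terms.append(x)

    def average(self):
        if len(self.terms) < WINDOW:
            raise ValueError("window is not full yet")
        return self.numerator / WEIGHT_TOTAL


window = SlidingWeightedSum()
for value in range(1, 13):
    window.push(float(value))
print(window.average())
```

Comparing `average()` against a direct recomputation over `window.terms` from time to time shows whether any drift is building up.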

If you really want to adjust for that, you can add a ULP (unit in the last place) to the term you are about to remove, but I think this will just intentionally underestimate.

Here is an answer that retains floating point for keeping a running total. I think a weighted average requires only two running totals:

Allocate an array to store your numbers in. Inserting a number means finding an empty space in the array and setting it to that value; deleting a number means setting its value in the array to zero and declaring that space empty. You can use a linked list of free entries to find an empty entry in O(1) time.

Now you need to work out the sum of an array of size N. Treat the array as a full binary tree, as in heapsort: offset 0 is the root, 1 and 2 are its children, 3 and 4 are the children of 1, 5 and 6 are the children of 2, and so on, so the children of i are at 2i+1 and 2i+2.

For each internal node, keep the sum of all entries at or below that node in the tree. When you modify an entry, you can recalculate the sum of the values in the array by working your way from that entry up to the root of the tree, correcting the partial sums as you go. This costs O(log N), where N is the length of the array.
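The scheme above might be sketched like this. The class and method names are made up, a plain Python list used as a stack stands in for the linked list of free entries, and the capacity is assumed to be fixed and at least 1. For a weighted average you would keep two of these, one for value * weight and one for the weights.

```python
class SumTree:
    """Fixed-capacity slots whose total is maintained through subtree sums."""

    def __init__(self, capacity):
        self.values = [0.0] * capacity   # slot i holds an entry, or 0.0 if free
        self.sums = [0.0] * capacity     # sums[i] = total of the subtree rooted at i
        self.free = list(range(capacity - 1, -1, -1))  # stack of free slots

    def _refresh(self, i):
        # Recompute partial sums from slot i up to the root: O(log N).
        n = len(self.values)
        while True:
            total = self.values[i]
            left, right = 2 * i + 1, 2 * i + 2
            if left < n:
                total += self.sums[left]
            if right < n:
                total += self.sums[right]
            self.sums[i] = total
            if i == 0:
                break
            i = (i - 1) // 2             # move to the parent

    def insert(self, value):
        slot = self.free.pop()           # raises IndexError when the array is full
        self.values[slot] = value
        self._refresh(slot)
        return slot                      # keep this handle to delete the entry later

    def delete(self, slot):
        self.values[slot] = 0.0
        self.free.append(slot)
        self._refresh(slot)

    def total(self):
        return self.sums[0]


num, den = SumTree(8), SumTree(8)
a = (num.insert(2.0 * 1.0), den.insert(1.0))
b = (num.insert(4.0 * 3.0), den.insert(3.0))
print(num.total() / den.total())   # 3.5
num.delete(a[0])
den.delete(a[1])
print(num.total() / den.total())   # 4.0
```

Because the partial sums on the path to the root are rebuilt from the values still present, a deleted entry leaves no rounding residue behind, which, as far as I can tell, is the point of avoiding a plain subtract from a single running total.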