“Running” weighted average

I'm constantly adding/removing tuples to a list in Python and am interested in the weighted average (not the list itself). Since this part is computationally quite expensive compared to the rest, I want to optimise it. What's the best way of keeping track of the weighted average? I can think of two methods:

  • keeping the list and calculating the weighted average every time it gets accessed/changed (my current approach)
  • keeping track of only the current weighted average and the sum of all weights, and updating both on every add/remove action

I would prefer the 2nd option, but I am worried about "floating point errors" induced by constant addition/subtraction. What's the best way of dealing with this?
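A minimal sketch of the second option, assuming the tuples are (value, weight) pairs. Rather than storing the average itself, it keeps two running totals, the sum of weight * value and the sum of weights, and divides only on access. The class and method names are made up for illustration.

```python
class RunningWeightedAverage:
    """Keeps sum(weight * value) and sum(weight) instead of the list itself."""

    def __init__(self):
        self.weighted_sum = 0.0  # running sum of weight * value
        self.weight_sum = 0.0    # running sum of weights

    def add(self, value, weight):
        self.weighted_sum += weight * value
        self.weight_sum += weight

    def remove(self, value, weight):
        # The caller must pass the same (value, weight) pair that was added.
        self.weighted_sum -= weight * value
        self.weight_sum -= weight

    @property
    def average(self):
        if self.weight_sum == 0:
            raise ZeroDivisionError("no weight in the running total")
        return self.weighted_sum / self.weight_sum


avg = RunningWeightedAverage()
avg.add(10.0, 1.0)
avg.add(20.0, 3.0)
avg.remove(10.0, 1.0)
print(avg.average)  # 20.0
```

Each add/remove is O(1), but every update rounds, so the totals can drift slowly over a very long run of operations.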

Try doing it in integers? Python's bignums make a rational argument for rational numbers (sorry, it's late... really sorry, actually).
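One way to read that suggestion is to keep the running totals as exact rationals with the standard library's `fractions.Fraction`. This is only a sketch: the class name is hypothetical, and the price of exactness is that the numerators and denominators can grow large, so each update gets slower than with floats.

```python
from fractions import Fraction


class ExactWeightedAverage:
    """Running totals kept as exact rationals, so add/remove never rounds."""

    def __init__(self):
        self.weighted_sum = Fraction(0)
        self.weight_sum = Fraction(0)

    def add(self, value, weight):
        # Fraction(float) converts the float's exact binary value.
        self.weighted_sum += Fraction(weight) * Fraction(value)
        self.weight_sum += Fraction(weight)

    def remove(self, value, weight):
        self.weighted_sum -= Fraction(weight) * Fraction(value)
        self.weight_sum -= Fraction(weight)

    def average(self):
        return self.weighted_sum / self.weight_sum  # still an exact Fraction


exact = ExactWeightedAverage()
exact.add(0.1, 0.6)
exact.add(0.2, 0.3)
exact.add(0.7, 0.1)
exact.remove(0.1, 0.6)
exact.remove(0.2, 0.3)
print(float(exact.average()))  # 0.7, with no leftover from the removed terms
```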

Whether you will experience much floating point drift really depends on how many terms you are using and what your weighting coefficient is. You only get 53 bits of precision, but you might not need that much.
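To make the 53-bit limit concrete, here is a small illustration (not part of the original answer) of where a Python float stops resolving individual integers, and of the kind of residue that repeated add/remove can leave behind.

```python
import sys

print(sys.float_info.mant_dig)  # 53 bits of significand in a Python float

big = float(2 ** 53)
print(big + 1.0 == big)  # True: the added 1.0 is lost to rounding

# Adding two terms and then removing them again does not get back to zero.
total = 0.0
total += 0.1
total += 0.2
total -= 0.1
total -= 0.2
print(total == 0.0)  # False: a tiny nonzero residue remains
```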

If your weighting factor is less than 1, then your error should be bounded, since you are constantly shrinking it. Let's say your weight is 0.6 (horrible, because you cannot represent that exactly in binary). That is 0.10011001... in binary, stored as something like 0.1001100110011001100110 (rounded in the last bit). Any error you introduce from that rounding is then decreased each time you multiply by the weight again, so the error in the most recent term will dominate.
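The rounding of 0.6 can be inspected directly. This short check uses only the standard library and shows the value that is actually stored for the literal 0.6 in a 64-bit float.

```python
from decimal import Decimal

w = 0.6
print(w.hex())     # 0x1.3333333333333p-1
print(Decimal(w))  # 0.59999999999999997779553950749686919152736663818359375

# The stored value sits slightly below the decimal 0.6.
print(Decimal(w) < Decimal("0.6"))  # True
```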

Don't do the final division until you need to. Again taking 0.6 as your weight and 10 terms, your term weights run from 99.22903012752124 for the first term all the way down to 1 for the last term ( 0.6**-t for t from 9 down to 0 ). Multiply your new term by 99.22..., add it to your running sum, subtract the trailing term out, and only then divide by 246.5725753188031 ( sum([0.6**-x for x in range(0,10)]) ).
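The figures above can be reproduced with a few lines. This sketch follows the answer's wording literally (first term weighted by 0.6**-9, last term by 1) and performs a single division at the end; the function name and the sample data are assumptions for illustration.

```python
import math

w = 0.6
n = 10

# Weights for the first term down to the last term: 0.6**-9, ..., 0.6**0.
weights = [w ** -t for t in range(n - 1, -1, -1)]
total_weight = sum(weights)  # roughly 246.57


def weighted_average(terms):
    """Accumulate the weighted sum first and divide only once at the end."""
    running = 0.0
    for weight, term in zip(weights, terms):
        running += weight * term
    return running / total_weight


print(weights[0])    # roughly 99.229
print(total_weight)  # roughly 246.57
print(weighted_average([5.0] * n))  # very close to 5.0
```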

If you really want to adjust for that, you can add a ULP to the term you are about to remove, but this will just underestimate intentionally, I think.

Here is an answer that retains floating point for keeping a running total - I think a weighted average requires only two running totals (the sum of weight * value and the sum of the weights):

Allocate an array to store your numbers in. Inserting a number means finding an empty space in the array and setting it to that value; deleting a number means setting its value in the array to zero and declaring that space empty. You can use a linked list of free entries to find an empty entry in O(1) time.

Now you need to work out the sum of an array of size N. Treat the array as a full binary tree, as in heapsort, so offset 0 is the root, 1 and 2 are its children, 3 and 4 are the children of 1, 5 and 6 are the children of 2, and so on - the children of i are at 2i+1 and 2i+2.

For each internal node, keep the sum of all entries at or below that node in the tree. Now when you modify an entry you can recalculate the sum of the values in the array by working your way from that entry up to the root of the tree, correcting the partial sums as you go - this costs you O(log N) where N is the length of the array.
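The scheme described in this answer could be sketched as follows. Several details are assumptions rather than the answerer's code: the capacity is fixed and at least 1, a Python list used as a stack stands in for the linked list of free entries, partial sums live in a separate array (in the heap layout every slot also holds an entry), and removing the same slot twice is not guarded against.

```python
class SumTree:
    """Fixed-size array viewed as a binary tree (children of i: 2i+1, 2i+2).

    values[i] is the entry in slot i; subtotal[i] is values[i] plus the
    subtotals of both children, so subtotal[0] is the sum of the whole array.
    """

    def __init__(self, capacity):
        self.values = [0.0] * capacity
        self.subtotal = [0.0] * capacity

    def set(self, i, value):
        self.values[i] = value
        n = len(self.values)
        while True:
            # Recompute this node from its entry and children: O(log N) steps.
            s = self.values[i]
            left, right = 2 * i + 1, 2 * i + 2
            if left < n:
                s += self.subtotal[left]
            if right < n:
                s += self.subtotal[right]
            self.subtotal[i] = s
            if i == 0:
                break
            i = (i - 1) // 2  # move up to the parent

    def total(self):
        return self.subtotal[0]


class TreeWeightedAverage:
    def __init__(self, capacity):
        self.num = SumTree(capacity)  # holds weight * value per slot
        self.den = SumTree(capacity)  # holds weight per slot
        self.free = list(range(capacity - 1, -1, -1))  # stack of empty slots

    def add(self, value, weight):
        slot = self.free.pop()  # raises IndexError when the array is full
        self.num.set(slot, weight * value)
        self.den.set(slot, weight)
        return slot  # keep this handle to remove the entry later

    def remove(self, slot):
        self.num.set(slot, 0.0)
        self.den.set(slot, 0.0)
        self.free.append(slot)

    def average(self):
        return self.num.total() / self.den.total()


tree = TreeWeightedAverage(8)
a = tree.add(10.0, 1.0)
b = tree.add(20.0, 3.0)
tree.remove(a)
print(tree.average())  # 20.0
```

Because the partial sums are recomputed from the stored entries instead of being adjusted by subtraction, a removed entry leaves no rounding residue behind in the totals.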
