I'm not sure how to proceed with this. I have a list of numbers (a list of lists of numbers, to be exact), but these numbers have an ambiguity: x, x+1 and x-1 are exactly the same thing for me. However, I'd like to minimize the variance of the list by changing the elements. Here's what I have so far (with a sample list for which I know it doesn't work):
import numpy as np
from scipy import stats

lst = [0.474, 0.122, 0.0867, 0.896, 0.979]

def min_var(lst):
    mean = np.mean(lst)
    var = np.var(lst)
    result = []
    for item in list(lst):
        if item < mean:  # not sure this is a good test
            new_item = item + 1
        elif item > mean:
            new_item = item - 1
        else:
            new_item = item
        # replace this item and keep the change only if the variance drops
        new_list = [new_item if x == item else x for x in lst]
        new_var = np.var(new_list)
        if new_var < var:
            var = new_var
            lst = new_list
    return lst
What the function does is add 1 to the 3rd element. However, the minimum variance occurs when you subtract 1 from the 4th and 5th. This happens because I'm minimizing the variance after each item, not allowing for multiple changes. How could I implement multiple changes, preferably without looking at all possible solutions (3**n if I'm not mistaken)? Thanks a lot
You can treat this as the problem of finding the delta that minimizes var((x - delta) % 1), where x is your array of values. You then add or subtract integers from your values until they lie in the range delta - 1 <= x[i] < delta. This isn't a continuous function of delta, so you can't use solvers like those in scipy.optimize. But we can use the fact that the value of var((x - delta) % 1) only changes when delta crosses one of the values in x, which means we only need to test each value in x as a possible delta and keep the one that minimizes the variance.
import numpy as np

x = np.array([0.474, 0.122, 0.0867, 0.896, 0.979])

# find the value of delta
delta = x[0]
min_var = np.var((x - delta) % 1)
for val in x:
    current_var = np.var((x - val) % 1)
    if current_var < min_var:
        min_var = current_var
        delta = val
print(delta)

# use `delta` to subtract and add the right integer from each value
# we want values in the range delta - 1 <= val < delta
for i, val in enumerate(x):
    while val >= delta:
        val -= 1.
    while val < delta - 1.:
        val += 1.
    x[i] = val
print(x)
For this example, it finds your desired solution of [ 0.474 0.122 0.0867 -0.104 -0.021 ] with a variance of 0.0392.
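As a sanity check on short lists, the result can be compared against the exhaustive 3**n search mentioned in the question. The sketch below is not part of the original answer: `wrap_min_var` and `brute_force_min_var` are hypothetical helper names, the first is just a compact restatement of the delta search, and the brute force is only practical for small n.

```python
import itertools
import numpy as np

def wrap_min_var(x):
    """Shift each value by an integer so the variance is minimal,
    testing every element of x as the cut point delta."""
    x = np.asarray(x, dtype=float)
    # variance of the wrapped values for each candidate delta
    variances = [np.var((x - d) % 1) for d in x]
    delta = x[int(np.argmin(variances))]
    # map every value into the range delta - 1 <= val < delta
    return (x - delta) % 1 + delta - 1

def brute_force_min_var(x):
    """Smallest variance over every combination of -1, 0, +1 shifts."""
    x = np.asarray(x, dtype=float)
    return min(
        np.var(x + np.array(shifts))
        for shifts in itertools.product((-1, 0, 1), repeat=len(x))
    )

x = [0.474, 0.122, 0.0867, 0.896, 0.979]
shifted = wrap_min_var(x)
print(shifted, np.var(shifted))
print(brute_force_min_var(x))
```

For the sample list, both approaches should report a variance of about 0.0392, which suggests that testing only the values in x as delta does not miss a better combination here.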
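To avoid recalculating the variance from scratch each time (O(n²) overall), note that when you change an item from x to x+u, the variance changes by (2*u/n)*(u/2 + x - m - u/(2*n)), where m is the mean before the change. The change therefore has the same sign as u*(u/2 + x - m - u/(2*n)).
So here is a quasi-linear time solution:
import numpy as np

l = np.array([0.474, 0.122, 0.0867, 0.896, 0.979])
l.sort()
n = len(l)
m = np.mean(l)
print(l, np.var(l))
u = 1  # first try to increase the small terms
for i in range(n):
    if u * (u / 2 + l[i] - m - u / (2 * n)) < 0:  # this shift lowers the variance
        l[i] = l[i] + u
        m = m + u / n  # update the running mean
    else:
        u = -1  # from here on, try to decrease the big terms
print(l, np.var(l))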
and the run:
[ 0.0867 0.122 0.474 0.896 0.979 ] 0.1399936064
[ 1.0867 1.122 1.474 0.896 0.979 ] 0.0392256064
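As a quick numerical check, not part of the original answer, the sketch below compares a directly recomputed variance with the closed-form change obtained by expanding var = mean(x**2) - mean(x)**2 for a single item going from x to x+u, namely (2*u/n)*(u/2 + x - m - u/(2*n)). The helper name `var_change` is hypothetical, and the population variance (the `np.var` default) is assumed.

```python
import numpy as np

def var_change(x_k, u, m, n):
    """Change in population variance when one item goes from x_k to x_k + u,
    given the mean m and the length n before the change."""
    return (2 * u / n) * (u / 2 + x_k - m - u / (2 * n))

l = np.array([0.0867, 0.122, 0.474, 0.896, 0.979])
n, m = len(l), l.mean()

# try a +1 and a -1 shift on every item and compare with np.var
for k in range(n):
    for u in (1, -1):
        shifted = l.copy()
        shifted[k] += u
        direct = np.var(shifted) - np.var(l)
        assert np.isclose(direct, var_change(l[k], u, m, n))
print("closed form matches direct recomputation")
```

Because each update costs O(1) once the mean is tracked, the sort is the only step that is not linear in n.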