Applying a function across a numpy array gives different answers in different runs

Question

I've been experimenting with vectorizing a function of mine and have hit a strange bug in my code that I haven't been able to figure out.

Here's the numpy array in question ( https://filebin.net/c14dcklwakrv1hw8 )

import numpy as np
example = np.load("example_array.npy") #Shape (2, 5, 5)

The problem I was trying to solve was to normalize the values in each row so that they sum to 1, except of course for rows that are entirely 0's. Since np.divide has an option to skip selected elements when dividing, the function I used was

f = lambda x: np.divide(x, np.sum(x, axis=1)[:, np.newaxis], where=np.sum(x, axis=1)[:, np.newaxis]!=0)

What happens, however, is that the value of f(example[1]) changes depending on whether example[1] or example[0] was displayed in the Python terminal just before it. So if you display example[0] and then call f(example[1]), the last row of example[0] replaces the first row of the answer.

The execution can be seen here: https://imgur.com/lzyHV8n

Python version - 3.6.6, numpy - 1.15.3

Edit: Adding 1 to all elements of the matrix and repeating the same operation without the where condition in np.divide works without issue. I guess the where condition is the source of the error, but I don't know why it occurs.

Answer 1

The problem in your function lies in the where argument. numpy.divide() executes the underlying ufunc (the actual elementwise operation being vectorized) ONLY at the positions where your where condition evaluates to True.

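At the other positions, the output array it creates is left uninitialized, so those entries hold whatever happened to be in that memory. That would explain why values from a previously displayed array show up in your result. To get a well-defined output, you need to pass the out argument to np.divide, with an array already filled with the values you want at the skipped positions (see the doc here: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.divide.html ). An example implementation using a defined function is below (your initial function is also there for reference):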

import numpy as np
e = np.load("example_array.npy")

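def normalize_badversion(x):
    # Without out=, positions where the mask is False are left uninitialized.
    idx = np.sum(x, axis=1)[:, np.newaxis]
    return np.divide(x, idx, where=idx!=0)

def normalize(x):
    # out= pre-fills the result with zeros, so all-zero rows stay zero.
    idx = np.sum(x, axis=1)[:, np.newaxis]
    return np.divide(x, idx, where=idx!=0, out=np.zeros_like(x))

# Fixed version: the result does not depend on what was printed before.
print(e[0])
a = normalize(e[1])
print(e[1])
b = normalize(e[1])
print(np.allclose(a, b))


# Original version: the two results may differ.
print(e[0])
a = normalize_badversion(e[1])
print(e[1])
b = normalize_badversion(e[1])
print(np.allclose(a, b))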
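If you don't have the .npy file at hand, the same idea can be tried on a small made-up array. This is only a minimal sketch: the name normalize_rows and the sample values are mine, not from the question, and I pass dtype=float to np.zeros_like on the assumption that the input might be an integer array (in which case an integer out buffer could not hold the quotients).

```python
import numpy as np

def normalize_rows(x):
    # Hypothetical helper name; same technique as normalize() above.
    # keepdims=True keeps the sums as a column so they broadcast against x.
    row_sums = x.sum(axis=1, keepdims=True)
    # Pre-fill the output: positions where the mask is False keep these zeros.
    out = np.zeros_like(x, dtype=float)
    return np.divide(x, row_sums, out=out, where=row_sums != 0)

# Made-up sample with one all-zero row.
sample = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]])
result = normalize_rows(sample)
print(result)
```

Because every position of the output is either computed or taken from the pre-filled zeros, repeated calls should give the same array regardless of what ran before them.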

Final note: I agree the current doc of numpy.divide is not really clear on that matter. A fix was recently pushed to the numpy doc to clarify this, see https://github.com/numpy/numpy/commit/9a82c53c8a2b9bd4798e515544de4a701cbfba3f
