
Why does numpy.ndarray allow for a “None” array?

I was wondering what is the rationale for the following functionality of numpy.ndarray :

>>> import numpy as np
>>> a = None
>>> a = np.asarray(a)
>>> a
array(None, dtype=object)

>>> type(a)
<class 'numpy.ndarray'>

>>> a == None
True

>>> a is None
False

So in this case Python seems to actually create a None array (not array of Nones), which seems to enforce a type over variable a . But the documentation states that the positional argument needs to be "array_like":

a : array_like

Input data, in any form that can be converted to an array. This includes lists, lists of tuples, tuples, tuples of tuples, tuples of lists and ndarrays.

So why is None accepted as "array-like" since it is not any of the listed above?

By analogy, list(None) raises a TypeError, because None is not an "iterable" as the documentation requires.

Furthermore, some functions seem to actually return seemingly incorrect values. For example np.ndarray.argmax() or np.ndarray.argmin() actually return 0 for a "None array", but result in an error for an empty array which intuitively seems like the expected behaviour.

>>> a
array(None, dtype=object)
>>> b
array([], dtype=object)
>>> a.argmax()
0
>>> b.argmax()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: attempt to get argmax of an empty sequence

Is there actually any advantage to having a "None array" ( array(None, dtype=object) ) as opposed to an empty array ( array([], dtype=object) )?

Is this an intended functionality, or accidental consequence of Nones being actual objects? Could someone explain what's going on under the bonnet here and why?

Thanks a lot!

What you get from np.asarray(None) is an array with shape () , that is, a 0-dimensional array (loosely, a scalar ) with dtype object . You get something similar from np.asarray(2) or np.asarray('abc') , just with a different dtype. A 0-dimensional array cannot be iterated, but it can be compared with non-NumPy values. It still supports the usual NumPy operations, so you can reshape it into a one-element array and iterate over that:

list(np.asarray(None).reshape((1,)))

And it works.
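A minimal sketch of the same point, assuming only that NumPy is installed. It shows that None, 2 and 'abc' all become 0-dimensional arrays, that iterating one directly raises a TypeError (much as list(None) does), and that the reshaped version iterates fine. The variable names are made up for illustration.

```python
import numpy as np

# Each of these inputs becomes a 0-dimensional array; only the dtype differs.
for value in (None, 2, "abc"):
    arr = np.asarray(value)
    print(repr(value), arr.shape, arr.ndim, arr.dtype)

zero_d = np.asarray(None)

# Iterating a 0-d array directly is an error, much like list(None).
iteration_error = None
try:
    list(zero_d)
except TypeError as exc:
    iteration_error = exc

# Reshaping to one dimension yields a one-element array that can be iterated.
as_list = list(zero_d.reshape((1,)))
print(as_list)
```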

As for functions like argmin and argmax : note that a scalar is not empty. An array with shape () has one element but zero dimensions, while an array with shape (0,) has no elements but one dimension. This may seem counterintuitive, but it is consistent and it is what makes things work. As documented, argmin and argmax work on the flattened array when no axis value is given. The flattened array for a scalar (e.g. np.asarray(None).ravel() ) is an array with shape (1,) . Since you are asking for the index of the smallest or largest value and there is only one value, the answer is 0 in both cases, and no comparison is ever needed. Interestingly, np.argmin(np.asarray([None, None])) fails, because now there are two elements that must be compared to find the smaller one, and None values cannot be ordered.
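The difference between shape () and shape (0,) can be checked directly. This is only an illustrative sketch with made-up variable names; it compares the two arrays from the question and calls argmax / argmin on each.

```python
import numpy as np

zero_d = np.asarray(None)              # shape (): zero dimensions, one element
empty = np.asarray([], dtype=object)   # shape (0,): one dimension, no elements

print(zero_d.shape, zero_d.ndim, zero_d.size)
print(empty.shape, empty.ndim, empty.size)

# With no axis given, argmax/argmin operate on the flattened array.
flat = zero_d.ravel()
idx_max = zero_d.argmax()
idx_min = zero_d.argmin()
print(flat.shape, idx_max, idx_min)

# The empty array has no element whose index could be returned.
empty_error = None
try:
    empty.argmax()
except ValueError as exc:
    empty_error = exc
print(empty_error)
```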

> I was wondering what is the rationale for the following functionality of numpy.ndarray:

NumPy allows 0-dimensional arrays, and it allows arrays of object dtype. Together, these facts mean that any object can be interpreted as a 0-dimensional array-like of object dtype, and that's how numpy.array will interpret any argument it can't find some other way to interpret. That's what's happening here.
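To illustrate that this fallback is not special to None, here is a small sketch using a hypothetical Widget class (not part of NumPy or of the question) that has no sequence or array behaviour at all. NumPy still wraps an instance in a 0-dimensional object array.

```python
import numpy as np


class Widget:
    """Hypothetical class with no sequence or array protocol."""


w = Widget()
arr = np.array(w)

# NumPy finds no other interpretation, so it stores the object itself
# as the single element of a 0-dimensional object array.
print(arr.shape, arr.dtype)
print(arr.item() is w)
```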

What you have is a 0-dimensional, 1-element array whose 1 element is None.

In [11]: import numpy
In [12]: x = numpy.array(None)
In [13]: x.shape
Out[13]: ()
In [14]: x.size
Out[14]: 1
In [15]: print(x.item())
None

> So in this case Python seems to actually create a None array (not array of Nones)

No, it is an array of Nones. It is an array of exactly one None. You can access the None by providing a tuple of no indices, or by calling the item() method, or in a bunch of other ways.

In [15]: print(x.item())
None
In [16]: print(x[()])
None
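For reference, a short sketch collecting a few of those access routes in one place. The variable names are only for illustration. Note that indexing with an Ellipsis is different: it gives back a 0-dimensional array rather than the element.

```python
import numpy as np

x = np.array(None)

via_item = x.item()        # the stored Python object
via_empty_tuple = x[()]    # indexing with a tuple of no indices
via_tolist = x.tolist()    # for a 0-d array this is the element, not a list

# x[...] does not unwrap the element; it returns a 0-d array.
via_ellipsis = x[...]

print(via_item, via_empty_tuple, via_tolist)
print(type(via_ellipsis), via_ellipsis.shape)
```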

> So why is None accepted as "array-like" since it is not any of the listed above?

The list is not intended to be exhaustive.

> Furthermore, some functions seem to actually return seemingly incorrect values. For example np.ndarray.argmax() or np.ndarray.argmin() actually return 0 for a "None array", but result in an error for an empty array which intuitively seems like the expected behaviour.

If you don't provide an axis argument, argmax and argmin default to working over a flattened form of the input. 0 is the index of the only element of the flattened form of your 0-dimensional array.

In [23]: y = x.ravel()
In [24]: y
Out[24]: array([None], dtype=object)
In [25]: y.argmin()
Out[25]: 0
In [26]: y.argmax()
Out[26]: 0
In [27]: print(y[0])
None
