In pure Python, None or True returns True.
However, with pandas, when I apply | between two Series containing None values, the results are not what I expected:
>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
buybox buybox_y
0 None True
>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
buybox buybox_y
0 False True
Expected result:
>>> df
buybox buybox_y
0 True True
I get the result I want by applying the OR operation twice, but I don't get why I should do this.
I'm not looking for a workaround (I have one: applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but for an explanation, thus the 'why' in the title.
The pandas | operator does not rely on the Python or expression, and behaves differently.
If both operands are boolean, the result is mathematically defined, and the same for Python and Pandas.
But in your case, the series "buybox" is of type object, and "buybox_y" is bool. In this case the pandas | operator is not commutative: with "buybox" on the left, a bitwise or is attempted, and None | True is an invalid operation, resulting in None, which then shows up as False in the boolean result.
Thus,
>>> df['buybox'] | df['buybox_y']
0 False
>>> df['buybox_y'] | df['buybox']
0 True
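The invalid operation behind this asymmetry can be checked without pandas. The following is a minimal sketch in plain Python; the helper name try_bitwise_or is made up for illustration. It shows that or works on truthiness, while | is simply not defined when the left operand is None.

```python
import operator


def try_bitwise_or(left, right):
    """Return left | right, or the exception type name if | is undefined."""
    try:
        return operator.or_(left, right)
    except TypeError as exc:
        return type(exc).__name__


# `or` evaluates truthiness and returns the first truthy operand.
logical = None or True

# `|` is defined for two bools...
bitwise_bools = try_bitwise_or(False, True)

# ...but not for None, so Python raises TypeError.
bitwise_none = try_bitwise_or(None, True)

print(logical, bitwise_bools, bitwise_none)
```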
For predictable results, you can clean up the data and cast it to boolean type with pandas astype before attempting boolean operations.
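One possible way to do that cleanup is sketched below. It assumes that a missing value in "buybox" should count as False, which is a decision about the data and not something pandas makes for you. The frame is rebuilt from the to_dict() output shown in the question.

```python
import pandas as pd

# Same data as in the question: an object column holding None, and a bool column.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})

# Assumption: a missing value means False.
# astype(bool) turns None into False but would turn NaN into True,
# so the notna() mask forces every missing entry to False.
buybox = df["buybox"].astype(bool) & df["buybox"].notna()

# Both operands now have bool dtype, so the order no longer matters.
left_first = buybox | df["buybox_y"]
right_first = df["buybox_y"] | buybox

print(left_first.tolist(), right_first.tolist())
```

With both columns cast to bool, the two orderings give the same result, so a single assignment is enough.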
For Boolean objects (i.e. Py_True and Py_False), the code takes a fast path; for other objects, PyObject_IsTrue() is used to compute a value of type int.
During that computation, PyObject_IsTrue() checks the nb_bool, mp_length, and sq_length slots in turn, which correspond to the return values of the two magic methods __bool__() and __len__().
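The lookup order described above can be observed from Python code. This is a small sketch with made-up classes (HasBoth, OnlyLen, and Plain are illustrative names only): __bool__ is consulted first, __len__ is the fallback, and an object that defines neither is truthy.

```python
class HasBoth:
    """Defines both methods; __bool__ wins over __len__."""

    def __bool__(self):
        return False

    def __len__(self):
        return 3


class OnlyLen:
    """Defines only __len__; a length of 0 makes the object falsy."""

    def __len__(self):
        return 0


class Plain:
    """Defines neither method; instances are truthy by default."""


results = {
    "HasBoth": bool(HasBoth()),
    "OnlyLen": bool(OnlyLen()),
    "Plain": bool(Plain()),
}

# `or` uses the same truthiness test, so a falsy left operand
# makes the expression evaluate to the right operand.
fallback = OnlyLen() or "fallback"

print(results, fallback)
```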