Pandas Boolean operation inconsistent between one comparison and many comparisons

Question

I am trying to filter out some rows in my dataframe (with > 400000 rows) where values in one column have the None type. The goal is to leave my dataframe with only rows that have float values in the 'Column' column. I plan on doing this by passing in an array of booleans, except that I can't construct my array of booleans properly (they all come back True).

When I run the following operation, given a value of i within the df range, the comparison works:

df.loc[i, 'Column'] != None

The rows that have a value of None in 'Column' give the result False.

But when I run this operation:

df.loc[0:len(df), 'Column'] != None

The boolean array comes back as all True.

Why is this? Is this a pandas bug? An edge case? Intended behaviour for reasons I don't understand?

I can think of other ways to construct my boolean array, though this seems the most efficient. But it bothers me that this is the result I am getting.

Answer 1

Here's a reproducible example of what you're seeing:

import pandas as pd

x = pd.Series([1, None, 3, None, None])

print(x != None)

0    True
1    True
2    True
3    True
4    True
dtype: bool

What's not obvious is that, behind the scenes, Pandas converts your series to numeric and converts those None values to np.nan:

print(x)

0    1.0
1    NaN
2    3.0
3    NaN
4    NaN
dtype: float64

The NumPy array underlying the series can then be held in a contiguous memory block and support vectorised operations. Since np.nan != np.nan by design, your Boolean series will contain only True values, even if you were to test against np.nan instead of None.
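As a quick check of the NaN behaviour described above, here is a minimal sketch. The variable names nan_ne_nan and all_true are introduced only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, None, 3, None, None])

# NaN never compares equal to anything, including itself.
nan_ne_nan = np.nan != np.nan

# So an elementwise != against NaN is True in every position,
# whether the element is a real number or a NaN.
all_true = bool((x != np.nan).all())

print(nan_ne_nan, all_true)
```

This is why a != comparison cannot be used to locate missing values in a numeric series.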

For efficiency and correctness, you should use pd.to_numeric with isnull / notnull for checking null values:

print(pd.to_numeric(x, errors='coerce').notnull())

0     True
1    False
2     True
3    False
4    False
dtype: bool
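To connect this back to the original goal of filtering rows, the mask can be passed straight to the dataframe. The sketch below assumes a small made-up dataframe; the 'Other' column and its values are hypothetical stand-ins for the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
df = pd.DataFrame({
    'Column': [1.5, None, 3.0, None],
    'Other': ['a', 'b', 'c', 'd'],
})

# Boolean mask: True where 'Column' holds a usable number.
mask = pd.to_numeric(df['Column'], errors='coerce').notnull()

# Keep only the rows where the mask is True.
filtered = df[mask]

print(filtered)
```

If the column is known to hold only numbers and missing values, df.dropna(subset=['Column']) should select the same rows in one step.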

Answer 1 · 3 · accepted · 2018-11-12 17:26:54