Why does testing `NaN == NaN` not work for dropping rows from a pandas DataFrame?

Question

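Please explain how NaN is treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My DataFrame, which I load from a CSV file using read_csv, has a column comments that is empty most of the time.

The column marked_results.comments looks like this; the rest of the column is all NaN, so pandas loads the empty entries as NaN. So far so good: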

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries. Only this works:

marked_results.comments.isnull()

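None of these work:

marked_results.comments.dropna() only gives back the same column; nothing gets dropped, which is confusing.
marked_results.comments == NaN only gives a Series of all False. Nothing matches NaN, which is also confusing.
The same goes for marked_results.comments == nan

I also tried: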

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!

Answer 1

You should use isnull and notnull to test for NaN (with pandas dtypes these are more robust than the numpy equivalents); see "missing values" in the docs.
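As a minimal sketch of that advice, the following rebuilds a small frame shaped like the one in the question (the data is a hypothetical stand-in, not the asker's real file) and compares an equality test against isnull and notnull:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's marked_results frame.
marked_results = pd.DataFrame(
    {"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]}
)

# Equality never matches NaN, because NaN compares unequal to everything,
# including itself.
eq_mask = marked_results.comments == np.nan

# isnull() flags the missing entries; notnull() is its complement.
null_mask = marked_results.comments.isnull()
kept = marked_results[marked_results.comments.notnull()]

print(eq_mask.any())           # False
print(null_mask.tolist())      # [False, False, False, False, True, True]
print(kept.comments.tolist())  # ['VP', 'VP', 'VP', 'TEST']
```

Indexing with the notnull mask keeps whole rows, so it can serve as an alternative to dropna when the mask is needed for something else as well.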

Calling the Series method dropna on a column does not modify the original DataFrame, but it does return what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object

The DataFrame method dropna has a subset argument (to drop rows that have NaN in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
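A short sketch of the two points above, assuming a frame with a second column (the score column is invented purely for illustration): dropna returns a new object rather than changing df, and subset restricts the NaN check to the named columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the "score" column is made up for this example.
df = pd.DataFrame({
    "comments": ["VP", np.nan, "TEST", np.nan],
    "score": [1.0, 2.0, np.nan, 4.0],
})

# dropna returns a new DataFrame; df itself is left untouched.
dropped = df.dropna(subset=["comments"])

print(len(df))                 # 4
print(len(dropped))            # 2
print(dropped.index.tolist())  # [0, 2]
```

Row 2 survives even though its score is NaN, because only comments is listed in subset. To keep the result, assign it back as in In [14].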

Answer 2

You need to test for NaN with the math.isnan() function (or numpy.isnan). NaN cannot be checked with the equality operator.

>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False

Help on the function:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
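The sketch below ties this back to the question's column. It assumes an object-dtype array like the one unique() returned there. On scalars, math.isnan and numpy.isnan agree, but numpy.isnan raises a TypeError on an object array containing strings, whereas pandas' isnull handles it.

```python
import math
import numpy as np
import pandas as pd

nan = float("nan")

print(nan == nan)       # False
print(math.isnan(nan))  # True
print(np.isnan(nan))    # True

# An object-dtype array shaped like the question's unique() result.
values = np.array(["VP", "TEST", nan], dtype=object)

# np.isnan is not defined for object arrays holding strings.
try:
    np.isnan(values)
    numpy_ok = True
except TypeError:
    numpy_ok = False

print(numpy_ok)                    # False
print(pd.isnull(values).tolist())  # [False, False, True]
```

For a mixed string/NaN column like comments, isnull is therefore the safer test.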


Answer 1: score 15, accepted, 2013-07-31 12:18:21

Answer 2: score 7, 2013-07-31 12:04:38
