
Why does testing `NaN == NaN` not work for dropping from a pandas DataFrame?

Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My dataframe, which I load from a CSV file using read_csv, has a column comments, which is empty most of the time.

The column marked_results.comments looks like this; all the rest of the column is NaN, so pandas loads empty entries as NaNs. So far so good:

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries; only this works:

  • marked_results.comments.isnull()

All these don't work:

  • marked_results.comments.dropna() only gives the same column, nothing gets dropped, confusing.
  • marked_results.comments == NaN only gives a series of all False values. Nothing was NaN... confusing.
  • likewise marked_results.comments == nan

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!!!

You should use isnull and notnull to test for NaN (with pandas dtypes these are more robust than the numpy equivalents); see "values considered missing" in the docs.
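As a minimal sketch of the point above: the frame below is a hand-built stand-in for the asker's marked_results (an assumption, since the original CSV is not shown). It compares the equality test with the isnull/notnull masks.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (the real frame comes from a CSV)
marked_results = pd.DataFrame(
    {"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]}
)

# Comparing against NaN is False for every row, because NaN != NaN
eq_mask = marked_results.comments == np.nan

# isnull() flags the missing entries; notnull() is its complement
null_mask = marked_results.comments.isnull()

# Boolean indexing with notnull() keeps only the rows that have a comment
kept = marked_results[marked_results.comments.notnull()]
print(kept)
```

Filtering with a notnull() mask is handy when the condition has to be combined with other row filters; for plain dropping, dropna is shorter.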

Using the Series method dropna on a column won't affect the original dataframe, but it does do what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object
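To make the "won't affect the original dataframe" part concrete, here is a small sketch using a hand-built df that mirrors the one shown above (an assumption about its contents). dropna returns a new Series, and the DataFrame keeps all of its rows.

```python
import numpy as np
import pandas as pd

# Hand-built frame mirroring the example above
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# dropna() returns a new Series without the NaN entries
cleaned = df.comments.dropna()

n_original = len(df)      # the DataFrame still holds every row
n_cleaned = len(cleaned)  # only the non-missing comments remain
print(n_original, n_cleaned)
```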

The dropna DataFrame method has a subset argument (to drop rows which have NaNs in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
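The subset argument matters most when the frame has more than one column. The sketch below adds a hypothetical score column (not part of the original question) to show how subset differs from the default behaviour and from how="all".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "comments": ["VP", np.nan, "TEST", np.nan],
    "score": [1.0, 2.0, np.nan, np.nan],  # hypothetical second column
})

# Drop rows only when 'comments' is missing; NaNs in 'score' are ignored
by_comments = df.dropna(subset=["comments"])

# Default: drop rows that have a NaN in any column
any_nan = df.dropna()

# how="all": drop rows only when every column is NaN
all_nan = df.dropna(how="all")
```

As in the answer above, each call returns a new DataFrame, so the result has to be assigned back if the original name should refer to the cleaned data.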

You need to test for NaN with the math.isnan() function (or numpy.isnan). NaNs cannot be checked with the equality operator.

>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
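A short sketch that extends the session above: NaN compares unequal even to itself, and math.isnan only accepts numbers, so it raises on the strings in an object column like comments. Using pd.isna for the mixed case is a suggestion here, not something this answer states.

```python
import math

import numpy as np
import pandas as pd

a = float("nan")

self_equal = (a == a)    # False: NaN is never equal to anything, itself included
self_unequal = (a != a)  # True: the only float value for which this holds

# All three checks recognise a float NaN
checks = (math.isnan(a), bool(np.isnan(a)), bool(pd.isna(a)))

# math.isnan rejects non-numeric input such as the strings in 'comments'
try:
    math.isnan("VP")
    raised = False
except TypeError:
    raised = True

# pd.isna works element-wise on mixed object data
mixed = pd.isna(pd.Series(["VP", None, np.nan], dtype=object)).tolist()
```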

Help for the function:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
