简体   繁体   中英

Why does testing `NaN == NaN` not work for dropping from a pandas dataFrame?

Please explain how NaN's are treated in pandas because the following logic seems "broken" to me, I tried various ways (shown below) to drop the empty values.

My dataframe, which I load from a CSV file using read.csv , has a column comments , which is empty most of the time.

The column marked_results.comments looks like this; all the rest of the column is NaN, so pandas loads empty entries as NaNs, so far so good:

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries, only this works:

  • marked_results.comments.isnull()

All these don't work:

  • marked_results.comments.dropna() only gives the same column, nothing gets dropped, confusing.
  • marked_results.comments == NaN only gives a series of all False s. Nothing was NaNs... confusing.
  • likewise marked_results.comments == nan

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotya! so now ive tried:
marked_results.comments == comments_values[2]
# but still all the results are Falses!!!

You should use isnull and notnull to test for NaN (these are more robust using pandas dtypes than numpy), see "values considered missing" in the docs .

Using the Series method dropna on a column won't affect the original dataframe, but do what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object

The dropna DataFrame method has a subset argument (to drop rows which have NaNs in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])

You need to test NaN with math.isnan() function (Or numpy.isnan ). NaNs cannot be checked with the equality operator.

>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False

Help Function ->

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM