
Why is dropna() not working as I expect it to?

I already asked this question once but deleted it because it did not frame the issue correctly.

I want to drop all rows that contain NaN. I am quite sure I would need to apply

df.dropna(how='all', inplace=True)

to achieve what I need, but for some unknown reason it simply does not work. I even suspect it is a software/version-related issue. I am working with Anaconda: pandas 0.18.0, conda 4.1.2, conda-build 1.19.0, Python 3.5.1.final.0, and requests 2.9.1.

I create a DataFrame from two CSV files as follows:

import pandas as pd

df1 = pd.read_csv('Vols.csv', sep=',', parse_dates=True,
                  index_col="Date", usecols=['Date', '60DAY_IMPVOL'])
df2 = pd.read_csv('DAX02072016.csv', sep=',', index_col="Date", parse_dates=True,
                  usecols=['Date', 'Close'])
df = pd.concat([df1, df2], axis=1)
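The NaN values come from the concat step itself: pd.concat(..., axis=1) aligns both frames on their Date index, so any date present in only one file gets NaN in the other file's column. A minimal, self-contained reproduction, assuming invented stand-ins for Vols.csv and DAX02072016.csv (the real files are not shown, so the rows below only mirror the printed output):

```python
import io

import pandas as pd

# Hypothetical stand-ins for Vols.csv and DAX02072016.csv (invented rows).
vols_csv = io.StringIO(
    "Date,60DAY_IMPVOL\n"
    "2004-02-06,18.54\n"
    "2004-02-09,17.76\n"
)
dax_csv = io.StringIO(
    "Date,Close\n"
    "2004-02-03,4057.510010\n"
    "2004-02-04,4028.370117\n"
    "2004-02-05,4014.790039\n"
    "2004-02-06,4044.989990\n"
    "2004-02-09,4098.970215\n"
    "2004-02-10,4077.635363\n"
)

df1 = pd.read_csv(vols_csv, index_col="Date", parse_dates=True,
                  usecols=["Date", "60DAY_IMPVOL"])
df2 = pd.read_csv(dax_csv, index_col="Date", parse_dates=True,
                  usecols=["Date", "Close"])

# axis=1 performs an outer join on the Date index by default, so dates
# that exist only in df2 end up with NaN in the 60DAY_IMPVOL column.
df = pd.concat([df1, df2], axis=1)
print(df)
```

No row in this frame is NaN in every column, which matters for the how='all' call in the question.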

This gives me the following DataFrame:

            60DAY_IMPVOL        Close
Date
2004-02-03           NaN  4057.510010
2004-02-04           NaN  4028.370117
2004-02-05           NaN  4014.790039
2004-02-06         18.54  4044.989990
2004-02-09         17.76  4098.970215
2004-02-10           NaN  4077.635363

Applying dropna() does nothing, whether I use axis=1 or axis=0. Does anyone have a suggestion as to why it is not working?
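For reference, here is a small sketch of what axis changes when combined with how. It uses a few rows abbreviated from the frame above (values rounded, chosen for illustration): with how='all', neither axis drops anything, because no single row or column is entirely NaN.

```python
import numpy as np
import pandas as pd

# Abbreviated version of the question's frame: no row and no column is all NaN.
df = pd.DataFrame(
    {"60DAY_IMPVOL": [np.nan, 18.54, np.nan],
     "Close": [4057.51, 4044.99, 4077.64]},
    index=pd.to_datetime(["2004-02-03", "2004-02-06", "2004-02-10"]),
)

# how='all' drops a row (axis=0) or a column (axis=1) only if every value
# in it is NaN, so both calls return the frame unchanged here.
rows_all = df.dropna(axis=0, how="all")
cols_all = df.dropna(axis=1, how="all")

# how='any' drops a row or column containing at least one NaN.
rows_any = df.dropna(axis=0, how="any")  # keeps only 2004-02-06
cols_any = df.dropna(axis=1, how="any")  # drops the 60DAY_IMPVOL column

print(rows_any)
print(cols_any)
```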

how='all' doesn't mean "drop all rows which contain a NaN", it means "drop rows which are all NaN". You want how='any' , which means "drop rows which contain any NaN".

>>> df.dropna(how='all')
            60DAY_IMPVOL        Close
Date                                 
2004-02-03           NaN  4057.510010
2004-02-04           NaN  4028.370117
2004-02-05           NaN  4014.790039
2004-02-06         18.54  4044.989990
2004-02-09         17.76  4098.970215
2004-02-10           NaN  4077.635363
>>> df.dropna(how='any')
            60DAY_IMPVOL        Close
Date                                 
2004-02-06         18.54  4044.989990
2004-02-09         17.76  4098.970215
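The question's data has no all-NaN row, so how='all' has nothing to drop. To see how='all' actually remove something, here is a toy frame with invented values in which one date is NaN in every column:

```python
import numpy as np
import pandas as pd

# Invented values: 2004-02-11 is NaN in every column.
df = pd.DataFrame(
    {"60DAY_IMPVOL": [18.54, np.nan, np.nan],
     "Close": [4044.99, 4077.64, np.nan]},
    index=pd.to_datetime(["2004-02-06", "2004-02-10", "2004-02-11"]),
)

print(df.dropna(how="all"))  # drops only 2004-02-11, the all-NaN row
print(df.dropna(how="any"))  # keeps only 2004-02-06, the one complete row
```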

how='any' is actually the default, so to be honest, df.dropna() would have worked too.
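A quick check that the default behaves like how='any', plus two related dropna parameters that the answer does not mention (subset and thresh), in case only some columns should decide which rows are dropped. The frame below uses invented values:

```python
import numpy as np
import pandas as pd

# Invented values; the default RangeIndex (0..3) makes the results easy to read.
df = pd.DataFrame(
    {"60DAY_IMPVOL": [np.nan, 18.54, 17.76, np.nan],
     "Close": [4057.51, 4044.99, np.nan, np.nan]},
)

# how='any' is the default, so these two calls give the same result.
default = df.dropna()
explicit = df.dropna(how="any")
assert default.equals(explicit)

# Only check the 60DAY_IMPVOL column: a NaN in Close alone no longer drops a row.
by_column = df.dropna(subset=["60DAY_IMPVOL"])

# Keep rows that have at least one non-NaN value.
at_least_one = df.dropna(thresh=1)

print(default)       # row 1 only
print(by_column)     # rows 1 and 2
print(at_least_one)  # rows 0, 1 and 2
```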

(Note that inplace=True is a little out of favour, and usually we'd just write df = df.dropna(how='any') these days.)
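One practical reason to prefer reassignment: with inplace=True, dropna modifies the frame and returns None, so writing df = df.dropna(inplace=True) silently replaces df with None. A minimal sketch with invented values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"60DAY_IMPVOL": [np.nan, 18.54],
                   "Close": [4057.51, 4044.99]})

# inplace=True mutates the (copied) frame and returns None.
result = df.copy().dropna(inplace=True)
print(result)  # None

# Preferred style: dropna returns a new DataFrame; rebind the name to it.
df = df.dropna(how="any")
print(df)
```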
