Filter non-NaN values by column in Pandas

Question

I would like to filter out NaN values and keep remaining rows in Label column.

df :

        Timestamp               Label
157505  2010-09-21 23:13:21.090 1
321498  2010-09-22 00:44:14.890 1
332687  2010-09-22 00:44:15.890 1
330028  2010-09-22 00:44:17.890 NaN
293410  2010-09-22 00:44:18.440 2
23093   2010-09-22 00:44:19.890 2
282054  2010-09-22 00:44:23.440 2
158381  2010-09-22 01:04:33.440 NaN
317397  2010-09-22 01:27:01.790 NaN
170770  2010-09-22 02:18:52.850 NaN

reproducible example:

from pandas import *
import numpy as np 
import pandas as pd 

df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
  321498: Timestamp('2010-09-22 00:44:14.890000'),
  332687: Timestamp('2010-09-22 00:44:15.890000'),
  330028: Timestamp('2010-09-22 00:44:17.890000'),
  293410: Timestamp('2010-09-22 00:44:18.440000'),
  23093: Timestamp('2010-09-22 00:44:19.890000'),
  282054: Timestamp('2010-09-22 00:44:23.440000'),
  158381: Timestamp('2010-09-22 01:04:33.440000'),
  317397: Timestamp('2010-09-22 01:27:01.790000'),
  170770: Timestamp('2010-09-22 02:18:52.850000')},
 'Label': {157505: 1,
  321498: 1,
  332687: 1,
  330028: 'NaN',
  293410: 2,
  23093: 2,
  282054: 2,
  158381: 'NaN',
  317397: 'NaN',
  170770: 'NaN'}})
df

I tried:

df[df.Label.notnull()]

and got exactly the same table:


        Timestamp               Label
157505  2010-09-21 23:13:21.090 1
321498  2010-09-22 00:44:14.890 1
332687  2010-09-22 00:44:15.890 1
330028  2010-09-22 00:44:17.890 NaN
293410  2010-09-22 00:44:18.440 2
23093   2010-09-22 00:44:19.890 2
282054  2010-09-22 00:44:23.440 2
158381  2010-09-22 01:04:33.440 NaN
317397  2010-09-22 01:27:01.790 NaN
170770  2010-09-22 02:18:52.850 NaN

What's wrong and what's the best way to do it?

Answer 1

Please convert Label to float from dtype object and use notna() or isna()

df=df[df.Label.astype(float).notna()]
print(df)




                   Timestamp  Label
157505 2010-09-21 23:13:21.090    1.0
321498 2010-09-22 00:44:14.890    1.0
332687 2010-09-22 00:44:15.890    1.0
293410 2010-09-22 00:44:18.440    2.0
23093  2010-09-22 00:44:19.890    2.0
282054 2010-09-22 00:44:23.440    2.0

Answer 2

You can do this:

df['Label'] = df['Label'].replace('NaN', np.nan)
df.dropna(inplace=True)
print(df)

or

df = df[df['Label'].notna()]
print(df)

                     Timestamp  Label
157505 2010-09-21 23:13:21.090    1.0
321498 2010-09-22 00:44:14.890    1.0
332687 2010-09-22 00:44:15.890    1.0
293410 2010-09-22 00:44:18.440    2.0
23093  2010-09-22 00:44:19.890    2.0
282054 2010-09-22 00:44:23.440    2.0

Answer 3

I understand you're trying to filter the Nan values. However notnull() filters doesn't filter string 'NaN'. Replacing it with np.nan will give the results you're expecting. Additionally you may choose to drop it.

from pandas import *
import numpy as np 
import pandas as pd 

df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
  321498: Timestamp('2010-09-22 00:44:14.890000'),
  332687: Timestamp('2010-09-22 00:44:15.890000'),
  330028: Timestamp('2010-09-22 00:44:17.890000'),
  293410: Timestamp('2010-09-22 00:44:18.440000'),
  23093: Timestamp('2010-09-22 00:44:19.890000'),
  282054: Timestamp('2010-09-22 00:44:23.440000'),
  158381: Timestamp('2010-09-22 01:04:33.440000'),
  317397: Timestamp('2010-09-22 01:27:01.790000'),
  170770: Timestamp('2010-09-22 02:18:52.850000')},
 'Label': {157505: 1,
  321498: 1,
  332687: 1,
  330028: np.nan,
  293410: 2,
  23093: 2,
  282054: 2,
  158381: np.nan,
  317397: np.nan,
  170770: np.nan}})

df[df.Label.notnull()]

will get :


Timestamp   Label
157505  2010-09-21 23:13:21.090 1.0
321498  2010-09-22 00:44:14.890 1.0
332687  2010-09-22 00:44:15.890 1.0
293410  2010-09-22 00:44:18.440 2.0
23093   2010-09-22 00:44:19.890 2.0
282054  2010-09-22 00:44:23.440 2.0

Or

df.dropna()

It will give same results:

    Timestamp   Label
157505  2010-09-21 23:13:21.090 1.0
321498  2010-09-22 00:44:14.890 1.0
332687  2010-09-22 00:44:15.890 1.0
293410  2010-09-22 00:44:18.440 2.0
23093   2010-09-22 00:44:19.890 2.0
282054  2010-09-22 00:44:23.440 2.0

Filter non-NaN values by column in Pandas

Question

3 answers

solution1
1 2020-11-02 22:06:28

solution2
1 ACCPTED 2020-11-02 22:08:13

solution3
1 2020-11-02 22:16:54

Filter non-NaN values by column in Pandas

Question

3 answers

solution1 1 2020-11-02 22:06:28

solution2 1 ACCPTED 2020-11-02 22:08:13

solution3 1 2020-11-02 22:16:54

solution1
1 2020-11-02 22:06:28

solution2
1 ACCPTED 2020-11-02 22:08:13

solution3
1 2020-11-02 22:16:54