I would like to filter out the NaN values in the Label column and keep the remaining rows.
df:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
Reproducible example:
from pandas import Timestamp
import numpy as np
import pandas as pd
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: 'NaN',
293410: 2,
23093: 2,
282054: 2,
158381: 'NaN',
317397: 'NaN',
170770: 'NaN'}})
df
I tried:
df[df.Label.notnull()]
and got exactly the same table:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
What's wrong and what's the best way to do it?
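One way to see what is going on is to inspect the column's dtype and the Python type of each value. The sketch below uses a shortened stand-in for the Label column (only five of the rows, same index values). It is a diagnostic, not a fix.

```python
import pandas as pd

# Shortened stand-in for the question's Label column: the "missing" entries
# were typed as the *string* 'NaN', not as a real missing value.
label = pd.Series([1, 1, 'NaN', 2, 'NaN'],
                  index=[157505, 321498, 330028, 293410, 158381])

# Mixing ints and strings forces pandas to fall back to the generic object dtype.
print(label.dtype)  # object

# Name of the Python type behind each element: the 'NaN' entries are str.
type_names = label.apply(lambda v: type(v).__name__)
print(type_names.value_counts())

# notnull() only recognises real missing markers (np.nan, None, NaT),
# so every entry, including the text 'NaN', counts as "not null".
print(label.notnull().all())  # True
```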
Convert Label from dtype object to float first, then filter with notna() (or use isna() to select the missing rows instead):
df['Label'] = df['Label'].astype(float)  # object -> float64; the string 'NaN' becomes a real NaN
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
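astype(float) works here because every non-numeric entry is exactly the string 'NaN', which Python's float() parses as not-a-number. If the placeholder text could be something else, astype(float) raises ValueError. A more forgiving variant, swapped in here and not part of the answer above, is pd.to_numeric(..., errors='coerce'), which turns anything it cannot parse into NaN. The 'n/a' value below is an invented example, not taken from the question.

```python
import pandas as pd

# Hypothetical column: 'n/a' is an invented placeholder, not from the question.
label = pd.Series([1, 'NaN', 2, 'n/a'])

# astype(float) calls float() on every element: 'NaN' parses to nan,
# but 'n/a' is not a valid float literal, so the whole conversion fails.
astype_failed = False
try:
    label.astype(float)
except ValueError as exc:
    astype_failed = True
    print("astype(float) failed:", exc)

# errors='coerce' maps anything unparseable to NaN instead of raising.
numeric = pd.to_numeric(label, errors='coerce')
print(numeric.tolist())  # [1.0, nan, 2.0, nan]

# Use the numeric column only as a mask to keep the original rows.
kept = label[numeric.notna()]
print(kept.tolist())  # [1, 2]
```

The trade-off is that errors='coerce' also silently hides genuine typos in numeric data, so astype(float) is the stricter choice when 'NaN' is the only placeholder you expect.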
You can do this:
df['Label'] = df['Label'].replace('NaN', np.nan)
df.dropna(subset=['Label'], inplace=True)  # only look at Label when dropping rows
print(df)
or, after the same replace, filter with a boolean mask instead of dropna:
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
I understand you're trying to filter out the NaN values. However, notnull() does not treat the string 'NaN' as missing; it only detects real missing values such as np.nan or None. Using np.nan instead of the string 'NaN' gives the result you expect. Alternatively, you can drop those rows with dropna().
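The distinction between the text 'NaN' and a real missing value can be checked directly. A minimal sketch, using four of the question's index values, that also converts the strings to np.nan on an already-built Series:

```python
import numpy as np
import pandas as pd

# To pandas, the string 'NaN' is ordinary text; np.nan is a real missing value.
print(pd.isna('NaN'), pd.isna(np.nan))  # False True

label = pd.Series([1, 'NaN', 2, 'NaN'], index=[157505, 330028, 293410, 158381])

# Swap the placeholder string for np.nan; after that the usual mask works.
cleaned = label.replace('NaN', np.nan)
kept = cleaned[cleaned.notnull()]
print(kept.index.tolist())  # [157505, 293410]
```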
from pandas import Timestamp
import numpy as np
import pandas as pd
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: np.nan,
293410: 2,
23093: 2,
282054: 2,
158381: np.nan,
317397: np.nan,
170770: np.nan}})
df[df.Label.notnull()]
gives:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
Or use
df.dropna()
which gives the same result:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
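df.dropna() and the notnull() mask agree here only because Label is the sole column with missing values. With no arguments, dropna() drops a row if any column is missing. A small sketch with a hypothetical extra Comment column (not in the question) shows the difference and how subset=['Label'] restricts the check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Label': [1.0, np.nan, 2.0],
        'Comment': ['ok', 'ok', np.nan],  # hypothetical extra column
    },
    index=[157505, 330028, 293410],
)

# No arguments: a row is dropped if *any* column is missing,
# so 293410 is lost because of its Comment.
print(df.dropna().index.tolist())                  # [157505]

# subset=['Label'] only looks at Label, so 293410 is kept.
print(df.dropna(subset=['Label']).index.tolist())  # [157505, 293410]

# Equivalent boolean-mask form.
print(df[df['Label'].notna()].index.tolist())      # [157505, 293410]
```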