Filter non-NaN values by column in Pandas
I want to filter out the NaN values in the Label column and keep the remaining rows.
df:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
Reproducible example:
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: 'NaN',
293410: 2,
23093: 2,
282054: 2,
158381: 'NaN',
317397: 'NaN',
170770: 'NaN'}})
df
I tried:
df[df.Label.notnull()]
and got exactly the same table back:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
What went wrong, and what is the best way to do this?
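A quick way to see what the column really contains is to check its dtype and the Python type of each cell. The sketch below uses a small hypothetical Series standing in for Label, not the full frame from the question:

```python
import pandas as pd

# Hypothetical stand-in for the question's Label column: ints mixed with the string 'NaN'
label = pd.Series([1, 1, 'NaN', 2], name='Label')

# Mixing ints and strings gives the generic object dtype instead of float64
print(label.dtype)  # object

# Tally the Python type name of every cell; any 'str' entries are suspicious
counts = label.map(lambda v: type(v).__name__).value_counts()
print(counts)

# No cell counts as missing, because 'NaN' here is just text
print(label.isna().any())  # False
```

If the tally shows str entries in a column that should be numeric, a null check such as notnull() will not catch them.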
Convert Label from object to float, then use notna() (or isna()):
df['Label'] = df['Label'].astype(float)  # the string 'NaN' becomes a real NaN
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
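A related option, not used in the answer above, is pd.to_numeric with errors='coerce'. astype(float) raises ValueError on any string that float() cannot parse, whereas coercion turns such values into NaN. A minimal sketch; the 'n/a' entry is a hypothetical value added only to show the difference:

```python
import pandas as pd

# Hypothetical data: 'n/a' is an assumed extra value that float() cannot parse
df = pd.DataFrame({'Label': [1, 'NaN', 2, 'n/a']})

# astype(float) would raise ValueError on 'n/a'; errors='coerce'
# turns every unparseable value into NaN instead
df['Label'] = pd.to_numeric(df['Label'], errors='coerce')

# Keep only the rows whose Label is now a real number
df = df[df['Label'].notna()]
print(df)
```

Coercion is convenient but silent: a typo such as '2x' also becomes NaN and gets dropped, so astype(float) is the stricter choice when every value is expected to be valid.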
You can do it like this:
df['Label'] = df['Label'].replace('NaN', np.nan)  # turn the string 'NaN' into a real missing value
df.dropna(subset=['Label'], inplace=True)
print(df)
Or, after the same replace step, keep the non-missing rows with a boolean mask:
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
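I see you are trying to filter out NaN values. However, notnull() does not treat the string "NaN" as missing, so it filters nothing out. Replacing the strings with np.nan gives the result you expect. Alternatively, you can drop those rows with dropna().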
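The difference between the string and a real missing value can be seen in isolation. A minimal sketch with a toy Series (the values are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# The string 'NaN' is ordinary text; only a real float NaN counts as missing
print(pd.isna('NaN'))   # False
print(pd.isna(np.nan))  # True

s = pd.Series([1, 'NaN', 2])
print(s.notna().tolist())  # [True, True, True]: nothing gets filtered

# Swap the string for a real missing value and the mask behaves as expected
fixed = s.replace('NaN', np.nan)
print(fixed.notna().tolist())  # [True, False, True]
```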
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: np.nan,
293410: 2,
23093: 2,
282054: 2,
158381: np.nan,
317397: np.nan,
170770: np.nan}})
df[df.Label.notnull()]
gives:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
Or
df.dropna()
which gives the same result:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
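One caveat with calling dropna() without arguments: it drops a row if any column is missing, not just Label. Here that makes no difference because only Label has gaps, but with more columns, subset=['Label'] keeps the check targeted. A sketch with a hypothetical extra column named Note:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'Note' is an extra column assumed for illustration
df = pd.DataFrame({
    'Label': [1, np.nan, 2, 2],
    'Note': ['a', 'b', np.nan, 'd'],
})

# Without arguments, a NaN in any column drops the row (rows 1 and 2 go)
print(df.dropna())

# subset= restricts the check to Label, so row 2 survives despite its missing Note
print(df.dropna(subset=['Label']))
```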