Filter non-NaN values by column in Pandas
I want to filter out the NaN values in the Label column and keep the remaining rows.
df:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
Reproducible example:
from pandas import Timestamp
import numpy as np
import pandas as pd
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: 'NaN',
293410: 2,
23093: 2,
282054: 2,
158381: 'NaN',
317397: 'NaN',
170770: 'NaN'}})
df
I tried:
df[df.Label.notnull()]
and got exactly the same table back:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
What went wrong, and what is the best way to do this?
Convert Label from object to float, then use notna() (or isna() if you want the missing rows instead):
# Casting to float turns the 'NaN' strings into real NaN, so notna() can detect them
df = df[df.Label.astype(float).notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
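If Label could also contain other non-numeric strings besides "NaN" (an assumption; the sample data has none), astype(float) would raise a ValueError. A swapped-in alternative is pd.to_numeric with errors='coerce', which maps anything unparseable to NaN. A minimal sketch on a shortened frame where the 'n/a' value is hypothetical:

```python
import pandas as pd

# Shortened frame: Label is object dtype, mixing ints, the string 'NaN',
# and a hypothetical extra placeholder 'n/a' (not in the question's data).
df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2010-09-21 23:13:21.090',
        '2010-09-22 00:44:17.890',
        '2010-09-22 00:44:18.440',
        '2010-09-22 01:04:33.440',
    ]),
    'Label': [1, 'NaN', 2, 'n/a'],
})

# errors='coerce' turns every value that cannot be parsed as a number into NaN,
# so both 'NaN' and 'n/a' become real missing values.
labels = pd.to_numeric(df['Label'], errors='coerce')

# Keep only the rows whose Label parsed to a real number.
result = df[labels.notna()]
print(result)
```

Unlike astype(float), this does not fail on unexpected strings, which can be convenient, but it also silently hides typos in the data.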
You can do this:
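# Turn the placeholder string 'NaN' into a real missing value
df['Label'] = df['Label'].replace('NaN', np.nan)
# Drop every row that still contains a missing value
df.dropna(inplace=True)
print(df)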
Or, after the replace() step, filter with notna() instead of calling dropna():
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
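Note that dropna() with no arguments drops a row if any column is missing. In the question only Label has gaps, so it makes no difference there, but if other columns also had missing values, subset=['Label'] restricts the check to Label. A minimal sketch, assuming the 'NaN' strings have already been replaced with np.nan and using a hypothetical extra column Note:

```python
import numpy as np
import pandas as pd

# Label already uses real np.nan; 'Note' is a hypothetical extra column
# (not in the question) that is missing in row 2.
df = pd.DataFrame({
    'Label': [1, np.nan, 2, np.nan],
    'Note': ['a', 'b', np.nan, 'd'],
})

# Plain dropna() also drops row 2, because 'Note' is missing there.
all_cols = df.dropna()

# subset=['Label'] only looks at Label when deciding which rows to drop.
label_only = df.dropna(subset=['Label'])

print(all_cols)
print(label_only)
```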
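I see you are trying to filter out NaN values. However, notnull() does not filter out the string "NaN". Replacing it with np.nan gives the result you expect. Alternatively, you can drop those rows with dropna().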
from pandas import Timestamp
import numpy as np
import pandas as pd
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: np.nan,
293410: 2,
23093: 2,
282054: 2,
158381: np.nan,
317397: np.nan,
170770: np.nan}})
df[df.Label.notnull()]
gives:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
Or:
df.dropna()
which gives the same result:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
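The root cause, that notnull() treats the string "NaN" as an ordinary value, can be checked directly on a tiny Series. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Object Series holding an int, the *string* 'NaN', and a real np.nan.
s = pd.Series([1, 'NaN', np.nan], dtype=object)

# notna()/notnull() only recognise real missing values (np.nan, None, pd.NA, NaT);
# the string 'NaN' is just text, so it counts as present.
mask = s.notna()
print(mask.tolist())  # [True, True, False]

# After replacing the placeholder string, both entries count as missing.
fixed = s.replace('NaN', np.nan)
print(fixed.notna().tolist())  # [True, False, False]
```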
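Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.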