
Filter non-NaN values by column in Pandas

I want to filter out the NaN values in the Label column and keep the remaining rows.

df

        Timestamp               Label
157505  2010-09-21 23:13:21.090 1
321498  2010-09-22 00:44:14.890 1
332687  2010-09-22 00:44:15.890 1
330028  2010-09-22 00:44:17.890 NaN
293410  2010-09-22 00:44:18.440 2
23093   2010-09-22 00:44:19.890 2
282054  2010-09-22 00:44:23.440 2
158381  2010-09-22 01:04:33.440 NaN
317397  2010-09-22 01:27:01.790 NaN
170770  2010-09-22 02:18:52.850 NaN

Reproducible example:

import numpy as np
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
  321498: Timestamp('2010-09-22 00:44:14.890000'),
  332687: Timestamp('2010-09-22 00:44:15.890000'),
  330028: Timestamp('2010-09-22 00:44:17.890000'),
  293410: Timestamp('2010-09-22 00:44:18.440000'),
  23093: Timestamp('2010-09-22 00:44:19.890000'),
  282054: Timestamp('2010-09-22 00:44:23.440000'),
  158381: Timestamp('2010-09-22 01:04:33.440000'),
  317397: Timestamp('2010-09-22 01:27:01.790000'),
  170770: Timestamp('2010-09-22 02:18:52.850000')},
 'Label': {157505: 1,
  321498: 1,
  332687: 1,
  330028: 'NaN',
  293410: 2,
  23093: 2,
  282054: 2,
  158381: 'NaN',
  317397: 'NaN',
  170770: 'NaN'}})
df

I tried:

df[df.Label.notnull()]

and got exactly the same table back:


        Timestamp               Label
157505  2010-09-21 23:13:21.090 1
321498  2010-09-22 00:44:14.890 1
332687  2010-09-22 00:44:15.890 1
330028  2010-09-22 00:44:17.890 NaN
293410  2010-09-22 00:44:18.440 2
23093   2010-09-22 00:44:19.890 2
282054  2010-09-22 00:44:23.440 2
158381  2010-09-22 01:04:33.440 NaN
317397  2010-09-22 01:27:01.790 NaN
170770  2010-09-22 02:18:52.850 NaN

What is going wrong, and what is the best way to do this?

Convert Label from object to float, then use notna() (the inverse of isna()):

df['Label'] = df['Label'].astype(float)  # the string 'NaN' parses to a real NaN
df = df[df['Label'].notna()]
print(df)

                   Timestamp  Label
157505 2010-09-21 23:13:21.090    1.0
321498 2010-09-22 00:44:14.890    1.0
332687 2010-09-22 00:44:15.890    1.0
293410 2010-09-22 00:44:18.440    2.0
23093  2010-09-22 00:44:19.890    2.0
282054 2010-09-22 00:44:23.440    2.0
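
If the column could also hold strings that float() cannot parse, astype(float) would raise a ValueError. A minimal sketch of a more tolerant variant, assuming pd.to_numeric with errors='coerce' suits your data; the small frame and the 'n/a' value are made up for illustration and are not part of the question:

```python
import pandas as pd

# Hypothetical sample: Label mixes ints with the strings 'NaN' and 'n/a'.
sample = pd.DataFrame({'Label': [1, 'NaN', 2, 'n/a']}, index=[10, 11, 12, 13])

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
labels = pd.to_numeric(sample['Label'], errors='coerce')

# Keep only the rows whose Label parsed to a real number.
filtered = sample[labels.notna()]
print(filtered)
```

The trade-off is that coercion silently hides unexpected values, so it may be worth checking what was coerced before discarding those rows.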

You can do it like this:

df['Label'] = df['Label'].replace('NaN', np.nan)
df.dropna(inplace=True)
print(df)

Or, once the 'NaN' strings have been replaced:

df = df[df['Label'].notna()]
print(df)

                     Timestamp  Label
157505 2010-09-21 23:13:21.090    1.0
321498 2010-09-22 00:44:14.890    1.0
332687 2010-09-22 00:44:15.890    1.0
293410 2010-09-22 00:44:18.440    2.0
23093  2010-09-22 00:44:19.890    2.0
282054 2010-09-22 00:44:23.440    2.0

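I understand you are trying to filter out NaN values. However, notnull() does not filter out the string 'NaN', because a string is not a missing value. Replacing it with np.nan gives the result you expect. Alternatively, you can drop those rows with dropna().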
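
A quick way to confirm this diagnosis is to look at the column's dtype and the Python type of each value. The sketch below uses a small made-up Series rather than the full frame, and uses mask() instead of replace() purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: 'NaN' is a string, not a missing value.
labels = pd.Series([1, 'NaN', 2, 'NaN'])

print(labels.dtype)               # object, because ints and strings are mixed
print(labels.map(type).unique())  # shows both int and str
print(labels.notnull().all())     # True: no value is actually missing

# Swap the string for a real NaN; notnull() then flags the gaps.
fixed = labels.mask(labels == 'NaN', np.nan)
print(fixed.notnull().tolist())
```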

import numpy as np
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
  321498: Timestamp('2010-09-22 00:44:14.890000'),
  332687: Timestamp('2010-09-22 00:44:15.890000'),
  330028: Timestamp('2010-09-22 00:44:17.890000'),
  293410: Timestamp('2010-09-22 00:44:18.440000'),
  23093: Timestamp('2010-09-22 00:44:19.890000'),
  282054: Timestamp('2010-09-22 00:44:23.440000'),
  158381: Timestamp('2010-09-22 01:04:33.440000'),
  317397: Timestamp('2010-09-22 01:27:01.790000'),
  170770: Timestamp('2010-09-22 02:18:52.850000')},
 'Label': {157505: 1,
  321498: 1,
  332687: 1,
  330028: np.nan,
  293410: 2,
  23093: 2,
  282054: 2,
  158381: np.nan,
  317397: np.nan,
  170770: np.nan}})

df[df.Label.notnull()]

gives:


        Timestamp               Label
157505  2010-09-21 23:13:21.090 1.0
321498  2010-09-22 00:44:14.890 1.0
332687  2010-09-22 00:44:15.890 1.0
293410  2010-09-22 00:44:18.440 2.0
23093   2010-09-22 00:44:19.890 2.0
282054  2010-09-22 00:44:23.440 2.0

Or

df.dropna()

which gives the same result:

        Timestamp               Label
157505  2010-09-21 23:13:21.090 1.0
321498  2010-09-22 00:44:14.890 1.0
332687  2010-09-22 00:44:15.890 1.0
293410  2010-09-22 00:44:18.440 2.0
23093   2010-09-22 00:44:19.890 2.0
282054  2010-09-22 00:44:23.440 2.0
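
Note that dropna() with no arguments drops a row when any column is missing. When only Label should decide, the subset parameter limits the check to that column. A small sketch with a made-up frame; the Other column is hypothetical and only there to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is missing Other, row 1 is missing Label.
sample = pd.DataFrame({
    'Label': [1.0, np.nan, 2.0],
    'Other': [np.nan, 'b', 'c'],
})

all_cols = sample.dropna()                    # checks every column
label_only = sample.dropna(subset=['Label'])  # checks Label only
print(label_only)
```

In the question's frame Timestamp has no gaps, so both calls return the same rows there.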

