Deleting a row in pandas dataframe based on condition

Scenario: I have a dataframe with some NaNs scattered around. It has multiple columns; the ones of interest are "bid" and "ask".

What I want to do: I want to remove all rows where the bid column value is NaN AND the ask column value is NaN.

Question: What is the best way to do it?

What I already tried:

ab_df = ab_df[ab_df.bid != 'nan' and ab_df.ask != 'nan']

ab_df = ab_df[ab_df.bid.empty and ab_df.ask.empty] 

ab_df = ab_df[ab_df.bid.notnull and ab_df.ask.notnull]

But none of them work.

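You need the vectorized logical operators & or | (Python's and/or keywords work on scalars, not on pandas Series). To check for NaN values, use isnull and notnull. Note also that comparing against the string 'nan' never matches a real NaN, and notnull has to be called as notnull(); without the parentheses you get the method object, not a boolean mask.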
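A small sketch of why the attempts in the question fail, assuming a hypothetical two-row frame with the same bid/ask column names: Python's and tries to turn each Series into a single bool, which pandas refuses, and the string comparison is True everywhere because NaN is a float, not the text 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ab_df
ab_df = pd.DataFrame({"bid": [np.nan, 1.0], "ask": [np.nan, 2.0]})

# Python's `and` needs one truth value per operand; a Series holds many
try:
    ab_df.bid.notnull() and ab_df.ask.notnull()
except ValueError as exc:
    print("and fails:", exc)

# NaN is a float, not the string 'nan', so this mask is True on every row
print((ab_df.bid != "nan").tolist())  # [True, True]

# Element-wise & combines the two boolean Series row by row
print((ab_df.bid.notnull() & ab_df.ask.notnull()).tolist())  # [False, True]
```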

To remove all rows where the bid value is NaN AND the ask value is NaN, keep the opposite, i.e. the rows where at least one of the two is not null:

ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
        "bid": [np.nan, 1, 2, np.nan],
        "ask": [np.nan, np.nan, 2, 1]
    })

df[df.bid.notnull() | df.ask.notnull()]

#   bid ask
#1  1.0 NaN
#2  2.0 2.0
#3  NaN 1.0
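The "keep the opposite" step is De Morgan's law: "not (bid is null and ask is null)" is the same mask as "bid is not null or ask is not null". A quick check on a hypothetical frame named like the question's ab_df:

```python
import numpy as np
import pandas as pd

ab_df = pd.DataFrame({"bid": [np.nan, 1, 2, np.nan],
                      "ask": [np.nan, np.nan, 2, 1]})

# Rows we want to drop: both bid and ask are missing
both_missing = ab_df.bid.isnull() & ab_df.ask.isnull()

# Negating that mask keeps everything else...
kept_by_negation = ab_df[~both_missing]

# ...which matches the notnull() | notnull() form
kept_by_or = ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]

print(kept_by_negation.equals(kept_by_or))  # True
print(kept_by_or.index.tolist())            # [1, 2, 3]
```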

If you need both columns to be non-missing:

df[df.bid.notnull() & df.ask.notnull()]

#   bid ask
#2  2.0 2.0
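For the "both non-missing" case, dropna with a subset and its default how='any' should select the same rows as the & mask. A minimal comparison on the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bid": [np.nan, 1, 2, np.nan],
                   "ask": [np.nan, np.nan, 2, 1]})

# Boolean-mask version: both columns must be present
mask_version = df[df.bid.notnull() & df.ask.notnull()]

# dropna version: drop a row if ANY listed column is NaN (how='any' is the default)
dropna_version = df.dropna(subset=["bid", "ask"])

print(mask_version.equals(dropna_version))  # True
print(dropna_version.index.tolist())        # [2]
```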

Another option is dropna with the thresh parameter, which keeps only the rows that have at least thresh non-NaN values among the subset columns:

df.dropna(subset=['ask', 'bid'], thresh=1)

#   bid ask
#1  1.0 NaN
#2  2.0 2.0
#3  NaN 1.0

df.dropna(subset=['ask', 'bid'], thresh=2)

#   bid ask
#2  2.0 2.0
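With only two columns in subset, thresh=1 means "at least one of the two is present", which is what how='all' (drop a row only if all listed columns are NaN) expresses; likewise thresh=2 matches how='any'. A quick check, using each parameter on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bid": [np.nan, 1, 2, np.nan],
                   "ask": [np.nan, np.nan, 2, 1]})
cols = ["ask", "bid"]

# thresh=1: keep rows with at least one non-NaN value among `cols`
by_thresh = df.dropna(subset=cols, thresh=1)

# how='all': drop a row only when every column in `cols` is NaN
by_how_all = df.dropna(subset=cols, how="all")
print(by_thresh.equals(by_how_all))  # True
print(by_how_all.index.tolist())     # [1, 2, 3]

# thresh=len(cols) requires every listed column, like how='any'
strict_thresh = df.dropna(subset=cols, thresh=len(cols))
strict_any = df.dropna(subset=cols, how="any")
print(strict_thresh.equals(strict_any))  # True
```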
ab_df = ab_df.loc[~ab_df.bid.isnull() | ~ab_df.ask.isnull()]

All this time I've been using that because I convinced myself that .notnull() didn't exist. TIL.

ab_df = ab_df.loc[ab_df.bid.notnull() | ab_df.ask.notnull()]
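For what it's worth, ~s.isnull() and s.notnull() produce the same mask, and pandas also spells these methods isna() and notna() (isnull/notnull are aliases of them). A quick check on a hypothetical frame:

```python
import numpy as np
import pandas as pd

ab_df = pd.DataFrame({"bid": [np.nan, 1, 2, np.nan],
                      "ask": [np.nan, np.nan, 2, 1]})

# Three spellings of "bid present OR ask present"
negated = ~ab_df.bid.isnull() | ~ab_df.ask.isnull()
direct = ab_df.bid.notnull() | ab_df.ask.notnull()
alias = ab_df.bid.notna() | ab_df.ask.notna()

print(negated.equals(direct) and direct.equals(alias))  # True
print(ab_df.loc[direct].index.tolist())                 # [1, 2, 3]
```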

The key is & rather than and, and | rather than or.
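To see what the element-wise operators do, here is a small truth table on two hypothetical boolean Series. The last lines show a related pitfall: & and | bind more tightly than comparisons such as > and <, so each comparison needs its own parentheses.

```python
import pandas as pd

left = pd.Series([True, True, False, False])
right = pd.Series([True, False, True, False])

# & is element-wise AND: True only where both sides are True
print((left & right).tolist())  # [True, False, False, False]

# | is element-wise OR: True where at least one side is True
print((left | right).tolist())  # [True, True, True, False]

# & and | bind tighter than > and <, so wrap each comparison
s = pd.Series([0, 2, -1])
print(((s > 1) | (s < 0)).tolist())  # [False, True, True]
```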

I made a mistake earlier by using &. That is wrong because you want the rows where bid isn't null OR ask isn't null; using & would give you only the rows where both are not null.

I think you can use ab_df.dropna() as well, but I'll have to look it up.

EDIT

Oddly, df.dropna() doesn't seem to support dropping based on NAs in a specific column. I would have thought it did.

Based on the other answer, I now see it does (via the subset parameter). It's Friday afternoon, OK?
