Deleting a row in pandas dataframe based on condition
Scenario: I have a dataframe with some NaN values scattered around. It has multiple columns; the ones of interest are "bid" and "ask".
What I want to do: I want to remove all rows where the bid column value is NaN AND the ask column value is NaN.
Question: What is the best way to do it?
What I already tried:
ab_df = ab_df[ab_df.bid != 'nan' and ab_df.ask != 'nan']
ab_df = ab_df[ab_df.bid.empty and ab_df.ask.empty]
ab_df = ab_df[ab_df.bid.notnull and ab_df.ask.notnull]
But none of them work.
You need the vectorized logical operators & or | (Python's and and or compare scalars and do not work element-wise on pandas Series). To check for NaN values, you can use isnull and notnull.
To remove all rows where the bid column value is NaN AND the ask column value is NaN, keep the opposite:
ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]
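For context on why the three attempts in the question fail, here is a minimal sketch. The sample values are made up for illustration and only stand in for the real ab_df; the variable names never_equal, and_raised, is_method and kept are likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for ab_df from the question.
ab_df = pd.DataFrame({
    "ask": [np.nan, np.nan, 2.0, 1.0],
    "bid": [np.nan, 1.0, 2.0, np.nan],
})

# NaN is a float, not the string 'nan', so this comparison is True for
# every row and cannot be used to detect missing values.
never_equal = ab_df.bid != "nan"

# Python's `and` needs one truth value for the whole Series, which pandas
# refuses to provide, so a ValueError is raised.
try:
    ab_df.bid.notnull() and ab_df.ask.notnull()
    and_raised = False
except ValueError:
    and_raised = True

# Without parentheses, notnull is just a bound method, not a boolean mask.
is_method = callable(ab_df.bid.notnull)

# Element-wise OR keeps rows where at least one of the two columns is present.
kept = ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]
```

Note also that .empty only reports whether a Series has zero elements, so it says nothing about NaN values in individual rows.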
Example:
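import numpy as np
import pandas as pd

# pd.np is not available in recent pandas releases, so use NumPy's nan directly.
# Keys are listed as ask, bid so the column order matches the output shown.
df = pd.DataFrame({
    "ask": [np.nan, np.nan, 2, 1],
    "bid": [np.nan, 1, 2, np.nan]
})
df[df.bid.notnull() | df.ask.notnull()]
#    ask  bid
# 1  NaN  1.0
# 2  2.0  2.0
# 3  1.0  NaN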
If you need both columns to be non-missing:
df[df.bid.notnull() & df.ask.notnull()]
#    ask  bid
# 2  2.0  2.0
Another option is to use dropna with the thresh parameter:
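# thresh=1 keeps rows with at least one non-NaN value among ask and bid
df.dropna(subset=['ask', 'bid'], thresh=1)
#    ask  bid
# 1  NaN  1.0
# 2  2.0  2.0
# 3  1.0  NaN
# thresh=2 keeps rows where both ask and bid are non-NaN
df.dropna(subset=['ask', 'bid'], thresh=2)
#    ask  bid
# 2  2.0  2.0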
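As a possible alternative to thresh, dropna also accepts a how parameter. The sketch below, using the same sample values as the example above, suggests that how="all" and how="any" restricted to the two columns give the same rows as the two boolean masks; the result variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ask": [np.nan, np.nan, 2.0, 1.0],
    "bid": [np.nan, 1.0, 2.0, np.nan],
})

# how="all" drops a row only when every column listed in subset is NaN.
either_present = df.dropna(subset=["ask", "bid"], how="all")

# how="any" (the default) drops a row when at least one listed column is NaN.
both_present = df.dropna(subset=["ask", "bid"], how="any")

# Boolean-mask versions of the same two filters, for comparison.
mask_either = df[df.bid.notnull() | df.ask.notnull()]
mask_both = df[df.bid.notnull() & df.ask.notnull()]
```

With only two columns in subset, how="all" corresponds to thresh=1 and how="any" to thresh=2; how may read more clearly when the intent is "drop only if everything is missing".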
ab_df = ab_df.loc[~ab_df.bid.isnull() | ~ab_df.ask.isnull()]
All this time I've been using that because I had convinced myself that .notnull() didn't exist. TIL.
ab_df = ab_df.loc[ab_df.bid.notnull() | ab_df.ask.notnull()]
The key is & rather than and, and | rather than or.
I made a mistake earlier by using &. That is wrong because you want the rows where either bid isn't null OR ask isn't null; using & (a logical AND) would give you only the rows where both are not null.
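The choice between | and & can be checked with a small sketch. The data is hypothetical and the variable names are only for illustration; the point is that negating "both missing" yields the same mask as "at least one present", while & is stricter.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for ab_df.
ab_df = pd.DataFrame({
    "ask": [np.nan, np.nan, 2.0, 1.0],
    "bid": [np.nan, 1.0, 2.0, np.nan],
})

# Rows to remove: bid is missing AND ask is missing.
both_missing = ab_df.bid.isnull() & ab_df.ask.isnull()

# Negating that condition gives "bid present OR ask present" (De Morgan's law).
keep_negated = ~both_missing
keep_or = ab_df.bid.notnull() | ab_df.ask.notnull()

# Using & on the notnull masks is a stricter filter: both must be present.
keep_and = ab_df.bid.notnull() & ab_df.ask.notnull()

print(ab_df[keep_or])
print(ab_df[keep_and])
```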
I think you can use ab_df.dropna() as well, but I'll have to look it up.
EDIT
Oddly, df.dropna() doesn't seem to support dropping based on NAs in a specific column. I would have thought it did.
Based on the other answer, I now see it does. It's Friday afternoon, ok?
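To illustrate the point about specific columns, here is a minimal sketch of dropna with and without subset. The volume column and all values are hypothetical, added only to show that a plain dropna() looks at every column.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "volume" is an extra column invented for this illustration.
ab_df = pd.DataFrame({
    "ask": [np.nan, np.nan, 2.0, 1.0],
    "bid": [np.nan, 1.0, 2.0, np.nan],
    "volume": [10.0, 20.0, np.nan, 40.0],
})

# Without subset, a NaN in any column is enough to drop the row.
all_columns = ab_df.dropna()

# With subset, only NaN values in the listed column are considered.
bid_present = ab_df.dropna(subset=["bid"])

print(len(all_columns))
print(bid_present)
```

In this made-up data every row has a NaN somewhere, so the plain call returns an empty frame, while the subset call keeps the two rows that have a bid.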