Python Pandas: How can I drop rows using df.drop and df.loc?
Suppose I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0]
    }
)
So far, I've been removing rows based on conditions using the following:
df2 = df[df.income != 0]
And using multiple conditions like so:
df3 = df[(df['income'] != 0) & (df['net worth'] > 100000)]
Question: Is this the preferred way to drop rows? If not, what is? Is it possible to do this via df.drop and df.loc? What would the syntax be?
.loc creates a subset of the rows you want to keep, whereas .drop removes the rows you specify. drop needs the row labels (index values).
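To make the distinction concrete, here is a minimal sketch using the dataframe from the question. It assumes the default integer index, so the labels passed to drop are simply 1 and 4; the variable names kept and dropped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0],
    }
)

# .loc with a boolean mask: select the rows to keep
kept = df.loc[df['income'] != 0]

# .drop with explicit row labels: name the rows to remove
# (labels 1 and 4 are Barry and Ella, whose income is 0)
dropped = df.drop([1, 4])

# Both approaches leave the same rows behind
print(kept.equals(dropped))
```

Neither call modifies df; each returns a new dataframe.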
The equivalent of your last filter with drop is:
>>> df.drop(df[~((df['income'] != 0) & (df['net worth'] > 100000))].index)
    user  income  net worth
0   Adam   50000     250000
2  Cindy  100000    2000000
# Or, a bit smarter (invert the condition directly instead of negating it with ~):
>>> df.drop(df[(df['income'] == 0) | (df['net worth'] <= 100000)].index)
    user  income  net worth
0   Adam   50000     250000
2  Cindy  100000    2000000
Which syntax do you prefer?
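As a quick sanity check, the sketch below compares the boolean-mask filter from the question with the .loc form and both drop forms. The names mask, via_filter, via_loc, via_drop_not and via_drop_inverted are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0],
    }
)

# Rows to keep: non-zero income AND net worth above 100000
mask = (df['income'] != 0) & (df['net worth'] > 100000)

via_filter = df[mask]                     # plain boolean indexing
via_loc = df.loc[mask]                    # same selection through .loc
via_drop_not = df.drop(df[~mask].index)   # drop the labels of the rows to remove
via_drop_inverted = df.drop(
    df[(df['income'] == 0) | (df['net worth'] <= 100000)].index
)

# All four should describe the same two rows (Adam and Cindy)
print(
    via_filter.equals(via_loc),
    via_filter.equals(via_drop_not),
    via_filter.equals(via_drop_inverted),
)
```

One caveat: because drop works on labels, an index with duplicate labels would cause every row sharing a dropped label to be removed, whereas the boolean-mask forms select row by row.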