How to drop rows from pandas based on several numerical value conditions within subsets of each dataframe?
I have a df that looks like this:
Account 1 Pre 9
Account 1 Pre 9
Account 1 During 5
Account 1 Post 5
Account 1 Post 5
Account 2 Pre 11
Account 2 During 9
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
Account 3 Pre 1
Account 3 During 2
Account 3 During 2
Account 3 Post 3
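I am trying to drop all rows for an account if its Pre, During, and Post values are all less than 10. So in the example above we would lose all Account 1 rows and all Account 3 rows, but keep all Account 2 rows because one of its rows has an 11.
I am fairly new to pandas and Python, but I think an approach following the logic below might work: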
for each Account in Account:
    if 'Pre' > 10 AND 'During' > 10 AND 'Post' > 10
        return (df_updated)
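This df_updated should, I think, consist only of Account 2. I don't think I can just take the result of this for loop and return a new df directly, so I'm not quite sure how to do this.
Thanks for any help!
Data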
print(df)
Account Status Count
0 Account1 Pre 9
1 Account1 Pre 9
2 Account1 During 5
3 Account1 Post 5
4 Account1 Post 5
5 Account2 Pre 11
6 Account2 During 9
7 Account2 Post 7
8 Account2 Post 7
9 Account2 Post 7
10 Account2 Post 7
11 Account3 Pre 1
12 Account3 During 2
13 Account3 During 2
14 Account3 Post 3
df[df.groupby('Account')['Count'].transform(lambda x: x.gt(10).any())]
Account Status Count
5 Account2 Pre 11
6 Account2 During 9
7 Account2 Post 7
8 Account2 Post 7
9 Account2 Post 7
10 Account2 Post 7
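As a rough cross-check of the one-liner above, here is a minimal sketch that rebuilds the sample frame by hand and uses a slightly different but equivalent test: an account has some value above 10 exactly when its group maximum is above 10. The names `group_max` and `result` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (column names as printed above).
df = pd.DataFrame({
    "Account": ["Account1"] * 5 + ["Account2"] * 6 + ["Account3"] * 4,
    "Status": ["Pre", "Pre", "During", "Post", "Post",
               "Pre", "During", "Post", "Post", "Post", "Post",
               "Pre", "During", "During", "Post"],
    "Count": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Broadcast each account's maximum Count back onto every one of its rows.
group_max = df.groupby("Account")["Count"].transform("max")

# Keep only the rows whose account has a maximum above 10.
result = df[group_max > 10]
print(result)
```

Using the built-in `"max"` aggregation avoids a Python-level lambda, which may matter on larger frames.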
Assuming your df has 3 columns:
Accountname type value
Account 1 Pre 9
Account 1 Pre 9
Account 1 During 5
Account 1 Post 5
Account 1 Post 5
Account 2 Pre 11
Account 2 During 9
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
Account 3 Pre 1
Account 3 During 2
Account 3 During 2
Account 3 Post 3
You don't need such a complicated script; you can filter it easily with the following command:
df = df[df['Accountname'].isin(df.loc[df['value'] > 10, 'Accountname'])]
output:
Account 2 Pre 11
Account 2 During 9
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
Account 2 Post 7
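The `isin` filter can be easier to follow when split into two steps. The sketch below assumes the three column names from the header shown in this answer (`Accountname`, `type`, `value`); `accounts_to_keep` and `filtered` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Sample data from the question, with the column names assumed in this answer.
df = pd.DataFrame({
    "Accountname": ["Account 1"] * 5 + ["Account 2"] * 6 + ["Account 3"] * 4,
    "type": ["Pre", "Pre", "During", "Post", "Post",
             "Pre", "During", "Post", "Post", "Post", "Post",
             "Pre", "During", "During", "Post"],
    "value": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Step 1: collect the accounts that have at least one value above 10.
accounts_to_keep = df.loc[df["value"] > 10, "Accountname"].unique()

# Step 2: keep every row that belongs to one of those accounts.
filtered = df[df["Accountname"].isin(accounts_to_keep)]
print(filtered)
```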
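It looks like the second column contains only Pre, During, and Post. That means you only need to check whether each account has at least one row whose third-column value is > 10. The second column plays no role in your question: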
df.loc[df['col3'].gt(10).groupby(df['col1']).transform('any')]
Output:
account step value
4 Account 2 Pre 11
5 Account 2 During 9
6 Account 2 Post 7
7 Account 2 Post 7
8 Account 2 Post 7
9 Account 2 Post 7
You can use groupby.filter to keep the accounts that have at least one value greater than 10:
df.groupby('col1').filter(lambda x: x.col3.gt(10).any())
Out:
col1 col2 col3
5 Account 2 Pre 11
6 Account 2 During 9
7 Account 2 Post 7
8 Account 2 Post 7
9 Account 2 Post 7
10 Account 2 Post 7
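One detail worth noting: the question asks to drop an account when all of its values are less than 10, which read literally keeps an account with a value of exactly 10, while the answers here test for > 10. On the sample data the two give the same result because no value equals 10. A minimal sketch of the literal reading with `groupby.filter`, assuming the `col1`/`col3` names used in this answer (`kept` is a name made up for the example):

```python
import pandas as pd

# Sample data from the question with the col1/col2/col3 names used in this answer.
df = pd.DataFrame({
    "col1": ["Account 1"] * 5 + ["Account 2"] * 6 + ["Account 3"] * 4,
    "col2": ["Pre", "Pre", "During", "Post", "Post",
             "Pre", "During", "Post", "Post", "Post", "Post",
             "Pre", "During", "During", "Post"],
    "col3": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Drop a group only when every one of its values is below 10;
# equivalently, keep it when at least one value is 10 or more.
kept = df.groupby("col1").filter(lambda g: not g["col3"].lt(10).all())
print(kept)
```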
Setting up the dataframe
import pandas as pd
import io
# Columns are separated by two or more spaces, because the account names contain a single space.
t = '''
Account 1  Pre  9
Account 1  Pre  9
Account 1  During  5
Account 1  Post  5
Account 1  Post  5
Account 2  Pre  11
Account 2  During  9
Account 2  Post  7
Account 2  Post  7
Account 2  Post  7
Account 2  Post  7
Account 3  Pre  1
Account 3  During  2
Account 3  During  2
Account 3  Post  3'''
# Raw string for the regex separator avoids an invalid-escape warning.
df = pd.read_csv(io.StringIO(t), sep=r'\s\s+', engine='python', header=None, names=list('123')).add_prefix('col')
df
Out:
col1 col2 col3
0 Account 1 Pre 9
1 Account 1 Pre 9
2 Account 1 During 5
3 Account 1 Post 5
4 Account 1 Post 5
5 Account 2 Pre 11
6 Account 2 During 9
7 Account 2 Post 7
8 Account 2 Post 7
9 Account 2 Post 7
10 Account 2 Post 7
11 Account 3 Pre 1
12 Account 3 During 2
13 Account 3 During 2
14 Account 3 Post 3
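For readers who think in terms of the loop sketched in the question, the following is one possible explicit version, compared against the vectorized mask. It is only an illustration under the col1/col2/col3 naming above: `kept_groups` and `df_updated` are names chosen to mirror the question, and the frame is built directly rather than parsed from text.

```python
import pandas as pd

# Same data as the frame printed above, built directly.
df = pd.DataFrame({
    "col1": ["Account 1"] * 5 + ["Account 2"] * 6 + ["Account 3"] * 4,
    "col2": ["Pre", "Pre", "During", "Post", "Post",
             "Pre", "During", "Post", "Post", "Post", "Post",
             "Pre", "During", "During", "Post"],
    "col3": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Loop over the accounts and collect the groups that contain a value above 10.
kept_groups = []
for account, group in df.groupby("col1"):
    if (group["col3"] > 10).any():
        kept_groups.append(group)

# Stitch the kept groups back together (empty frame if nothing qualified).
df_updated = pd.concat(kept_groups) if kept_groups else df.iloc[0:0]

# The vectorized mask selects the same rows without an explicit loop.
mask = df["col3"].gt(10).groupby(df["col1"]).transform("any")
print(df_updated.equals(df[mask]))
```

The loop is easier to step through while debugging, whereas the mask-based form is usually the more idiomatic choice in pandas.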