
How to drop rows from pandas based on several numerical value conditions within subsets of each dataframe?

I have a df that looks like this:

Account 1   Pre       9
Account 1   Pre       9
Account 1   During    5
Account 1   Post      5
Account 1   Post      5
Account 2   Pre       11
Account 2   During    9
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 3   Pre       1
Account 3   During    2
Account 3   During    2
Account 3   Post      3

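I am trying to drop all rows for an account if its Pre, During, and Post values are all less than 10. So in the example above, we would lose all Account 1 rows and all Account 3 rows, but keep all Account 2 rows because one of them has an 11.

I am relatively new to Pandas and Python, but I think an approach following this logic might work: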

for each Account in Account:
    if 'Pre' > 10 AND 'During' > 10 AND 'Post' > 10
    return (df_updated)

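I think this df_updated should consist only of Account 2. I don't think I can just take the result of this for loop and directly return a new df, so I'm not quite sure how to do this.

Thanks for any help!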

Data

print(df)

     Account  Status  Count
0   Account1     Pre      9
1   Account1     Pre      9
2   Account1  During      5
3   Account1    Post      5
4   Account1    Post      5
5   Account2     Pre     11
6   Account2  During      9
7   Account2    Post      7
8   Account2    Post      7
9   Account2    Post      7
10  Account2    Post      7
11  Account3     Pre      1
12  Account3  During      2
13  Account3  During      2
14  Account3    Post      3



df[df.groupby('Account')['Count'].transform(lambda x: x.gt(10).any())]



     Account  Status  Count
5   Account2     Pre     11
6   Account2  During      9
7   Account2    Post      7
8   Account2    Post      7
9   Account2    Post      7
10  Account2    Post      7
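
As a variation on the one-liner above, here is a minimal self-contained sketch that rebuilds the sample frame and swaps the lambda for the built-in max aggregation. The names group_max and result are my own, and the threshold follows this answer (strictly greater than 10).

```python
import pandas as pd

# Rebuild the frame from the question (column names as printed above).
df = pd.DataFrame({
    "Account": ["Account1"] * 5 + ["Account2"] * 6 + ["Account3"] * 4,
    "Status": ["Pre", "Pre", "During", "Post", "Post",
               "Pre", "During", "Post", "Post", "Post", "Post",
               "Pre", "During", "During", "Post"],
    "Count": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# transform("max") returns one value per row: the largest Count in that row's account.
group_max = df.groupby("Account")["Count"].transform("max")

# Keep every row whose account has a maximum above 10.
result = df[group_max.gt(10)]
print(result)
```

Because transform returns a Series aligned with the original index, it can be used directly as a boolean mask without merging anything back.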

Assuming your df has 3 columns:

Accountname type      value
Account 1   Pre       9
Account 1   Pre       9
Account 1   During    5
Account 1   Post      5
Account 1   Post      5
Account 2   Pre       11
Account 2   During    9
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 3   Pre       1
Account 3   During    2
Account 3   During    2
Account 3   Post      3   

You don't need such a complicated script; you can filter it easily with the following command:

df = df[df['Accountname'].isin(df[df['value'] > 10]['Accountname'])]

output:

Account 2   Pre       11
Account 2   During    9
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
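
The same isin idea may be easier to follow when split into two steps. This is only a sketch under the column names assumed in this answer (Accountname, type, value); big_accounts and filtered are names I made up for illustration.

```python
import pandas as pd

# Sample data from the question, using the column names assumed in this answer.
df = pd.DataFrame({
    "Accountname": ["Account 1"] * 5 + ["Account 2"] * 6 + ["Account 3"] * 4,
    "type": ["Pre", "Pre", "During", "Post", "Post",
             "Pre", "During", "Post", "Post", "Post", "Post",
             "Pre", "During", "During", "Post"],
    "value": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Step 1: names of the accounts that have at least one value above 10.
big_accounts = df.loc[df["value"] > 10, "Accountname"].unique()

# Step 2: keep all rows that belong to one of those accounts.
filtered = df[df["Accountname"].isin(big_accounts)]
print(filtered)
```

Note that the column labels are case-sensitive, so the label passed to df[...] has to match the header exactly.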

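It looks like the second column only contains Pre, During, and Post, which means you only need to check whether each account has at least one row whose third-column value is > 10. The second column plays no role in your problem: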

df.loc[df['col3'].gt(10).groupby(df['col1']).transform('any')]

Output:

     account    step  value
4  Account 2     Pre     11
5  Account 2  During      9
6  Account 2    Post      7
7  Account 2    Post      7
8  Account 2    Post      7
9  Account 2    Post      7
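
One detail worth checking is the boundary. The question says an account should be dropped when all of its values are less than 10, while the answers test for values greater than 10. The sample data contains no value of exactly 10, so both readings give the same result there. The sketch below uses hypothetical data (accounts A, B, C are invented) to show where gt and ge differ.

```python
import pandas as pd

# Hypothetical data, not from the question: account "B" peaks at exactly 10.
df = pd.DataFrame({
    "account": ["A", "A", "B", "B", "C", "C"],
    "value": [9, 5, 10, 7, 11, 3],
})

# Strictly greater than 10: an account whose maximum is exactly 10 is dropped.
strict = df[df["value"].gt(10).groupby(df["account"]).transform("any")]

# Greater than or equal to 10: matches "drop only if every value is less than 10".
inclusive = df[df["value"].ge(10).groupby(df["account"]).transform("any")]

print(strict["account"].unique())
print(inclusive["account"].unique())
```

If a value of exactly 10 should keep the account, ge(10) is probably the comparison you want.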

You can use groupby.filter to keep the accounts that have at least one value greater than 10:

df.groupby('col1').filter(lambda x: x.col3.gt(10).any())

Out:

         col1    col2  col3
5   Account 2     Pre    11
6   Account 2  During     9
7   Account 2    Post     7
8   Account 2    Post     7
9   Account 2    Post     7
10  Account 2    Post     7
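
For comparison, groupby.filter does roughly what the loop in the question was aiming for: it evaluates the condition once per account and keeps the whole group when the condition is true. Below is a sketch of that loop written out by hand, assuming the col1/col2/col3 names used in this answer; kept and df_updated are illustrative names.

```python
import pandas as pd

# Sample data from the question with the column names used in this answer.
df = pd.DataFrame({
    "col1": ["Account 1"] * 5 + ["Account 2"] * 6 + ["Account 3"] * 4,
    "col2": ["Pre", "Pre", "During", "Post", "Post",
             "Pre", "During", "Post", "Post", "Post", "Post",
             "Pre", "During", "During", "Post"],
    "col3": [9, 9, 5, 5, 5, 11, 9, 7, 7, 7, 7, 1, 2, 2, 3],
})

# Collect the sub-frames of the accounts that pass the test.
kept = []
for name, group in df.groupby("col1"):
    if group["col3"].gt(10).any():
        kept.append(group)

# Stitch the surviving groups back together.
# Note: pd.concat raises ValueError if no account passes and the list is empty.
df_updated = pd.concat(kept)

# The built-in version of the same loop.
via_filter = df.groupby("col1").filter(lambda g: g["col3"].gt(10).any())
print(df_updated.equals(via_filter))
```

The loop is fine for learning purposes, but the filter and transform versions are shorter and also handle the case where no group qualifies.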

Setting up the dataframe

import pandas as pd
import io

t = '''
Account 1   Pre       9
Account 1   Pre       9
Account 1   During    5
Account 1   Post      5
Account 1   Post      5
Account 2   Pre       11
Account 2   During    9
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 2   Post      7
Account 3   Pre       1
Account 3   During    2
Account 3   During    2
Account 3   Post      3'''

# Split on runs of two or more whitespace characters so "Account 1" stays in one field
df = pd.read_csv(io.StringIO(t), sep=r'\s\s+', engine='python', header=None, names=list('123')).add_prefix('col')
df

Out:

         col1    col2  col3
0   Account 1     Pre     9
1   Account 1     Pre     9
2   Account 1  During     5
3   Account 1    Post     5
4   Account 1    Post     5
5   Account 2     Pre    11
6   Account 2  During     9
7   Account 2    Post     7
8   Account 2    Post     7
9   Account 2    Post     7
10  Account 2    Post     7
11  Account 3     Pre     1
12  Account 3  During     2
13  Account 3  During     2
14  Account 3    Post     3

