How to select all rows if no conditions are met within a group and select a subset of rows if certain conditions within a group are met in pandas
Given the DataFrame below:
import pandas as pd

df = pd.DataFrame({'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust4', 'cust4'],
                   'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
                   'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023]})
  customer  year  score
0    cust1  2017  0.100
1    cust1  2018  0.590
2    cust1  2019  0.300
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023
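I want to filter the data within each customer group. The conditions are:
if any score in a group is >= 0.5: return only the rows with score >= 0.5 in that group
if no score in a group is >= 0.5: return all the rows of that group
The result should look like this: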
  customer  year   cond
0    cust1  2018  0.590
1    cust2  2018  0.440
2    cust2  2019  0.200
3    cust3  2017  0.780
4    cust3  2018  0.600
5    cust4  2018  0.370
6    cust4  2019  0.023
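Chain two conditions: the first mask, m, uses Series.ge to test whether each score is greater than or equal to 0.5, and the second mask selects every row of the customers that have no row matching m: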
m = df['score'].ge(0.5)
# keep matching rows, plus every row of customers that have no match at all
out = df[m | ~df['customer'].isin(df.loc[m, 'customer'])]
print(out)
  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023
Details:
print(df.loc[m, 'customer'])
1    cust1
5    cust3
6    cust3
Name: customer, dtype: object
print(~df['customer'].isin(df.loc[m, 'customer']))
0    False
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8     True
Name: customer, dtype: bool
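As a sanity check, the vectorized mask can be compared against a literal, group-by-group reading of the two rules from the question. This is only a sketch for verification: the loop and the names `vectorized`, `parts` and `looped` are illustrative and not part of the original answer, and the loop is far slower than the mask on real data.

```python
import pandas as pd

df = pd.DataFrame({
    'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust4', 'cust4'],
    'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
    'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023],
})

# Vectorized selection with the two chained masks.
m = df['score'].ge(0.5)
vectorized = df[m | ~df['customer'].isin(df.loc[m, 'customer'])]

# Literal reading of the rules, one group at a time (for checking only).
parts = []
for _, group in df.groupby('customer', sort=False):
    hits = group[group['score'] >= 0.5]
    # Keep only the hits if there are any, otherwise keep the whole group.
    parts.append(hits if not hits.empty else group)
looped = pd.concat(parts)

# Both approaches should select the same rows in the same order.
print(vectorized.equals(looped))
```

If the two frames ever disagree, the per-group loop is the easier one to step through to see which rule a given customer falls under.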
Alternatively, if performance is not important, build the second mask with GroupBy.transform and GroupBy.any; this should be slower on large DataFrames:
m = df['score'].ge(0.5)
# transform('any') broadcasts "does this customer have any match?" back to every row
out = df[m | ~m.groupby(df['customer']).transform('any')]
print(out)
  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023
Details:
print(~m.groupby(df['customer']).transform('any'))
0    False
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8     True
Name: score, dtype: bool
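The speed difference between the two ways of building the second mask depends on the data, so it is worth measuring rather than assuming. Below is a rough timing sketch on synthetic data; the frame `big`, its size, the number of customers and the score range are arbitrary assumptions chosen for illustration, not figures from the answer.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes and value ranges are arbitrary assumptions.
rng = np.random.default_rng(0)
n_rows, n_customers = 100_000, 10_000
big = pd.DataFrame({
    'customer': rng.integers(0, n_customers, n_rows),
    'score': rng.random(n_rows) * 0.6,
})
m = big['score'].ge(0.5)

def mask_isin():
    # Second mask built from the customers that have at least one match.
    return m | ~big['customer'].isin(big.loc[m, 'customer'])

def mask_transform():
    # Second mask built with a per-group "any" broadcast back to the rows.
    return m | ~m.groupby(big['customer']).transform('any')

# Both masks should select exactly the same rows.
same = np.array_equal(mask_isin().to_numpy(), mask_transform().to_numpy())
print('masks agree:', same)

print('isin     :', timeit.timeit(mask_isin, number=5))
print('transform:', timeit.timeit(mask_transform, number=5))
```

The timings vary with the pandas version, the number of groups and the dtype of the grouping column, so treat the printed numbers as a local measurement only.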
You can use two masks for boolean indexing:
# is the score ≥ 0.5?
m1 = df['score'].ge(0.5)
# are none of the values in the group ≥ 0.5?
m2 = ~m1.groupby(df['customer']).transform('any')
# select the row if either condition matches
out = df[m1|m2]
Output:
  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023
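If the same filter is needed for other columns or thresholds, the two masks can be wrapped in a small function. The sketch below is one possible packaging; the function name `keep_hits_or_whole_group` and its parameters are hypothetical and do not come from pandas or from the answers above.

```python
import pandas as pd

def keep_hits_or_whole_group(frame, group_col, value_col, threshold):
    """Hypothetical helper: per group, keep the rows whose value is
    >= threshold, or the whole group if no row reaches the threshold."""
    hit = frame[value_col].ge(threshold)
    # True on every row of a group that contains at least one hit.
    group_has_hit = hit.groupby(frame[group_col]).transform('any')
    return frame[hit | ~group_has_hit]

df = pd.DataFrame({
    'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust4', 'cust4'],
    'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
    'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023],
})

out = keep_hits_or_whole_group(df, 'customer', 'score', 0.5)
print(out)
```

With a threshold that no score reaches, for example 0.9, every group falls under the second rule and the function returns the DataFrame unchanged.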
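Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this post, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.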