How to select all rows of a group if no row in it meets a condition, and only the matching rows if some do, in pandas

Question

Given the DataFrame below:

import pandas as pd

df = pd.DataFrame({'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust4', 'cust4'],
                   'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
                   'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023]})

    customer    year    score
0   cust1   2017    0.100
1   cust1   2018    0.590
2   cust1   2019    0.300
3   cust2   2018    0.440
4   cust2   2019    0.200
5   cust3   2017    0.780
6   cust3   2018    0.600
7   cust4   2018    0.370
8   cust4   2019    0.023

I want to filter the data within each customer group. The conditions are:

if any score in a group is >= 0.5: return only the rows with score >= 0.5 in that group
if no score in a group is >= 0.5: return all the rows of that group

The result should look like this:

    customer    year    score
0   cust1   2018    0.590
1   cust2   2018    0.440
2   cust2   2019    0.200
3   cust3   2017    0.780
4   cust3   2018    0.600
5   cust4   2018    0.370
6   cust4   2019    0.023

Answer 1

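Chain two conditions with | (element-wise OR): the first mask tests for greater than or equal to 0.5 with Series.ge, and the second mask selects every customer that has no row matching the first mask m: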

m = df['score'].ge(0.5)
# assign to a new name so df stays intact for the detail prints that follow
out = df[m | ~df['customer'].isin(df.loc[m, 'customer'])]
print(out)
  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023

Details:

print(df.loc[m, 'customer'])
1    cust1
5    cust3
6    cust3
Name: customer, dtype: object

print(~df['customer'].isin(df.loc[m, 'customer']))
0    False
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8     True
Name: customer, dtype: bool
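
As a self-contained sketch of the isin approach above, the two masks can be wrapped in a small function. The name filter_by_threshold and its threshold parameter are hypothetical, introduced here only for illustration; the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2',
                 'cust3', 'cust3', 'cust4', 'cust4'],
    'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
    'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023],
})


def filter_by_threshold(frame, threshold=0.5):
    """Hypothetical helper: keep rows >= threshold, plus whole groups with no such row."""
    # First mask: rows whose score reaches the threshold.
    m = frame['score'].ge(threshold)
    # Customers that have at least one matching row.
    customers_with_hit = frame.loc[m, 'customer']
    # Second mask: rows of customers with no matching row at all.
    no_hit = ~frame['customer'].isin(customers_with_hit)
    return frame[m | no_hit]


out = filter_by_threshold(df)
print(out)
```

Because the function returns a new object, df itself is left unchanged. With a threshold that no score reaches, for example filter_by_threshold(df, 0.9), the first mask is all False and every row is returned.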

Alternatively, if performance is not important, build the second mask with GroupBy.transform and GroupBy.any; this is likely to be slow on large DataFrames:

m = df['score'].ge(0.5)
# assign to a new name so df stays intact for the detail prints that follow
out = df[m | ~m.groupby(df['customer']).transform('any')]
print(out)
  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023

Details:

print(~m.groupby(df['customer']).transform('any'))
0    False
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8     True
Name: score, dtype: bool
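
The two ways of building the second mask should select the same rows. A minimal sketch to check this on a larger frame follows; the synthetic data, its size, and the random seed are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; sizes and seed are arbitrary.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'customer': rng.integers(0, 1000, size=10_000),
    'score': rng.random(10_000),
})

m = big['score'].ge(0.5)

# Second mask built with isin: customers that never appear among the matches.
no_hit_isin = ~big['customer'].isin(big.loc[m, 'customer'])
# Second mask built with groupby + transform('any'): groups with no match.
no_hit_transform = ~m.groupby(big['customer']).transform('any')

# Both masks share the index of big, so they can be compared element-wise.
same = bool((no_hit_isin == no_hit_transform).all())
print(same)
```

To compare speed on your own data, each mask expression can be timed separately (for example with the timeit module); no timings are claimed here.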

Answer 2

You can use two masks for boolean indexing:

# is the score ≥ 0.5?
m1 = df['score'].ge(0.5)
# are none of the values in the group ≥ 0.5?
m2 = ~m1.groupby(df['customer']).transform('any')

# select a row if either condition matches
out = df[m1 | m2]

Output:

  customer  year  score
1    cust1  2018  0.590
3    cust2  2018  0.440
4    cust2  2019  0.200
5    cust3  2017  0.780
6    cust3  2018  0.600
7    cust4  2018  0.370
8    cust4  2019  0.023
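
To see why each row is kept or dropped, the two masks can be placed next to the data. This is only an inspection sketch: the column names m1, m2 and keep are hypothetical helper columns, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'customer': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2',
                 'cust3', 'cust3', 'cust4', 'cust4'],
    'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2018, 2019],
    'score': [0.10, 0.59, 0.3, 0.44, 0.2, 0.78, 0.6, 0.37, 0.023],
})

# m1: the row itself reaches 0.5; m2: no row of its customer reaches 0.5.
m1 = df['score'].ge(0.5)
m2 = ~m1.groupby(df['customer']).transform('any')

# Attach the masks as helper columns; assign returns a copy, df is unchanged.
view = df.assign(m1=m1, m2=m2, keep=m1 | m2)
print(view)
```

For cust1 only the 0.59 row has keep set, because m2 is False for the whole group; for cust2 and cust4 every row has keep set through m2.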

2 solutions

Solution 1
3 Accepted 2022-10-04 08:48:45

Solution 2
2 2022-10-04 08:51:15
