How to filter rows by group in a pandas DataFrame

Question

Suppose I have some grouped data like this:

GroupID	ID	Rank	target
A	1	1	0
A	2	3	0
A	3	2	1
B	1	1	0
B	2	4	0
B	3	3	1
B	4	2	0
C	1	1	1
C	2	4	0
C	3	3	1
C	4	2	0
D	1	1	0
D	2	4	0
D	3	3	0
D	4	2	0
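To reproduce the table above, the data could be built roughly as follows. This is only a sketch: the variable name df and the integer dtypes are assumptions, since the post does not show how the frame was created.

```python
import pandas as pd

# Sample data transcribed from the table above (dtypes are assumed to be integers).
df = pd.DataFrame({
    "GroupID": list("AAABBBBCCCCDDDD"),
    "ID":      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "Rank":    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    "target":  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

print(df)
```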

For each group:

I want to drop any group that has no row with target == 1.
Then, within each remaining group, I want to keep the row with target == 1 together with the rows ranked above it (that is, rows with a smaller Rank value). A group may contain several rows with target == 1; in that case the one with the lowest rank (the largest Rank value) is used as the target. For example, in group C both ID=1 and ID=3 have target == 1, so we keep the rows with Rank <= 3. The expected result is:

GroupID	ID	Rank	target
A	1	1	0
A	3	2	1
B	1	1	0
B	3	3	1
B	4	2	0
C	1	1	1
C	3	3	1
C	4	2	0

Answer 1

IIUC, make a first pass to slice the rows with target == 1 (using eq), then get the maximum rank per group with GroupBy.max. Finally, use classical boolean indexing with le to select the rows whose Rank is less than or equal to their group's maximum:

thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()

out = df[df['Rank'].le(df['GroupID'].map(thresh))]

Output:

   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0

Thresholds:

>>> thresh
GroupID
A    2
B    3
C    3
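As a self-contained sketch of the approach above, the two steps can be run against the question's sample data, including group D. The helper name filter_up_to_target and the way the frame is built are illustrative assumptions, not part of the original answer. Group D never appears in thresh, so map yields NaN for its rows, and a comparison against NaN is False, which is what drops the whole group.

```python
import pandas as pd

# Sample data from the question (construction is assumed; the post shows only the table).
df = pd.DataFrame({
    "GroupID": list("AAABBBBCCCCDDDD"),
    "ID":      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "Rank":    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    "target":  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})


def filter_up_to_target(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper wrapping the two steps of this answer."""
    # Largest Rank among the target == 1 rows of each group.
    # Groups without any target == 1 row (here: D) are absent from the result.
    thresh = frame[frame["target"].eq(1)].groupby("GroupID")["Rank"].max()
    # Map each row's group to its threshold; missing groups become NaN,
    # and `Rank <= NaN` evaluates to False, so those rows are filtered out.
    return frame[frame["Rank"].le(frame["GroupID"].map(thresh))]


out = filter_up_to_target(df)
print(out)
```

Because the threshold is looked up per row through map, the original index and row order are preserved in out.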

Answer 2

Replace Rank with NaN via Series.where wherever target is not 1, then use GroupBy.transform to get the maximum Rank per group. The result is aligned with the original rows, so it can be compared against the Rank column with Series.le (less than or equal) in boolean indexing:

s = df['Rank'].where(df['target'].eq(1)).groupby(df['GroupID']).transform('max')
df = df[df['Rank'].le(s)]
print (df)
   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0

Details:

print (df['Rank'].where(df['target'].eq(1)))
0     NaN
1     NaN
2     2.0
3     NaN
4     NaN
5     3.0
6     NaN
7     1.0
8     NaN
9     3.0
10    NaN
Name: Rank, dtype: float64

print (s)
0     2.0
1     2.0
2     2.0
3     3.0
4     3.0
5     3.0
6     3.0
7     3.0
8     3.0
9     3.0
10    3.0
Name: Rank, dtype: float64
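The same idea can be checked end to end on the question's full sample data, including group D. This is a minimal sketch: the frame construction and the names masked, s and out are assumptions made for illustration. It keeps df intact instead of overwriting it, so the intermediate Series can still be inspected afterwards.

```python
import pandas as pd

# Sample data from the question (construction is assumed; the post shows only the table).
df = pd.DataFrame({
    "GroupID": list("AAABBBBCCCCDDDD"),
    "ID":      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "Rank":    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    "target":  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Keep Rank only where target == 1; every other row becomes NaN.
masked = df["Rank"].where(df["target"].eq(1))

# Broadcast each group's maximum back onto its rows.
# A group with no target == 1 row (here: D) is all NaN, so its maximum is NaN.
s = masked.groupby(df["GroupID"]).transform("max")

# `Rank <= NaN` is False, so rows of group D are dropped by the same comparison.
out = df[df["Rank"].le(s)]
print(out)
```

Compared with mapping a per-group threshold, transform returns a Series already aligned with df, so no separate lookup step is needed.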

How to filter rows by group in a pandas DataFrame

Problem description

2 solutions

Solution 1
3 2022-03-02 08:45:25

Solution 2
3 Accepted 2022-03-02 08:46:25
