
How to filter rows by group in a pandas dataframe

Suppose I have some grouped data like this:

GroupID  ID  Rank  target
A         1     1       0
A         2     3       0
A         3     2       1
B         1     1       0
B         2     4       0
B         3     3       1
B         4     2       0
C         1     1       1
C         2     4       0
C         3     3       1
C         4     2       0
D         1     1       0
D         2     4       0
D         3     3       0
D         4     2       0

For each group:

  1. I want to drop any group that has no row with target == 1.

  2. Then I want to keep the row with target == 1 and the rows ranked above it (i.e., with a smaller Rank value). Some groups may have several rows with target == 1; in that case, the one with the lowest rank (the largest Rank value) is used as the cutoff. For example, in group C both ID=1 and ID=3 have target == 1, so we keep the rows with Rank <= 3. So we would get:
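To make the two rules concrete, here is a minimal, deliberately literal sketch that builds the sample data and applies the rules one group at a time. The helper name `filter_group` is hypothetical, and a plain Python loop is used only for clarity; the vectorized answers below are the practical way to do this.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'GroupID': list('AAABBBBCCCCDDDD'),
    'ID':      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'Rank':    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    'target':  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

def filter_group(g):
    """Hypothetical helper: apply both rules to one group's rows."""
    hits = g[g['target'] == 1]
    if hits.empty:
        # Rule 1: a group with no target == 1 row is dropped entirely
        return None
    # Rule 2: the lowest-ranked hit (largest Rank value) sets the cutoff
    cutoff = hits['Rank'].max()
    return g[g['Rank'] <= cutoff]

kept = [filter_group(g) for _, g in df.groupby('GroupID', sort=False)]
result = pd.concat([k for k in kept if k is not None])
print(result)
```

The loop keeps the original index labels, so the result can be compared directly with the expected output.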

GroupID  ID  Rank  target
A         1     1       0
A         3     2       1
B         1     1       0
B         3     3       1
B         4     2       0
C         1     1       1
C         3     3       1
C         4     2       0

IIUC, make a first pass to slice the rows with target == 1 (using eq), then get the maximum Rank per group with GroupBy.max, and finally keep the rows whose Rank is less than or equal to their group's maximum, using classic boolean indexing with le:

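# per-group cutoff: the largest Rank among rows with target == 1
thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()

# look up each row's group cutoff and keep rows with Rank <= cutoff
out = df[df['Rank'].le(df['GroupID'].map(thresh))]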

Output:

   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0

Thresholds:

>>> thresh
GroupID
A    2
B    3
C    3
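A self-contained version of the approach above, using the sample data from the question, also shows why group D disappears even though nothing filters it explicitly: D has no target == 1 row, so it is absent from `thresh`, `map` returns NaN for its rows, and any comparison with NaN is False. The intermediate name `cutoff` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'GroupID': list('AAABBBBCCCCDDDD'),
    'ID':      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'Rank':    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    'target':  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Cutoff per group: the largest Rank among rows with target == 1 (A, B, C only)
thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()

# Broadcast the cutoff to every row; group D is missing from thresh, so it gets NaN
cutoff = df['GroupID'].map(thresh)

# Rank <= NaN is False, so D's rows are dropped together with the rows below the cutoff
out = df[df['Rank'].le(cutoff)]
print(out)
```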

Alternatively, use Series.where to replace Rank with NaN wherever target is not 1, then use GroupBy.transform to get the maximal Rank per group. The result is aligned with the original rows, so you can compare it with the Rank column in boolean indexing using Series.le (less than or equal):

s = df['Rank'].where(df['target'].eq(1)).groupby(df['GroupID']).transform('max')
out = df[df['Rank'].le(s)]
print(out)
   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0
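For reference, a self-contained sketch of the `where` + `transform` variant on the sample data. The design difference from the `map` version is that `transform('max')` already returns one value per original row, so no separate lookup step is needed; because `max` skips NaN, an all-NaN group such as D simply stays NaN and is filtered out by the comparison. The name `masked_rank` is an illustrative addition.

```python
import pandas as pd

df = pd.DataFrame({
    'GroupID': list('AAABBBBCCCCDDDD'),
    'ID':      [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'Rank':    [1, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2],
    'target':  [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Keep Rank only where target == 1; other rows become NaN (dtype becomes float64)
masked_rank = df['Rank'].where(df['target'].eq(1))

# transform('max') writes each group's max back onto every row of that group;
# NaN is skipped, and an all-NaN group (D) stays NaN
s = masked_rank.groupby(df['GroupID']).transform('max')

# Rows with Rank above their group's cutoff, and all of D's rows, fail this test
out = df[df['Rank'].le(s)]
print(out)
```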

Details:

print(df['Rank'].where(df['target'].eq(1)))
0     NaN
1     NaN
2     2.0
3     NaN
4     NaN
5     3.0
6     NaN
7     1.0
8     NaN
9     3.0
10    NaN
Name: Rank, dtype: float64

print(s)
0     2.0
1     2.0
2     2.0
3     3.0
4     3.0
5     3.0
6     3.0
7     3.0
8     3.0
9     3.0
10    3.0
Name: Rank, dtype: float64
