
Pandas - How can I group by one numeric column and filter rows from each group by the median of each group?

I have a dataset consisting of one ID, one categorical variable "A", and one numerical variable "B".
I want to group by "A" and filter the rows of each group, keeping only the rows whose "B" is above or equal to the median of "B" (the median should be calculated for each group).
Example:

ID  A           B
1   Category 1  0.5
2   Category 2  0.2
3   Category 1  0.2
4   Category 1  0.6
5   Category 2  0.4

My expected result would be:

ID  A           B
1   Category 1  0.5
4   Category 1  0.6
5   Category 2  0.4

Here the median of "B" is 0.5 for Category 1 and 0.3 for Category 2.
Thank you!

out = df[df.groupby("A")["B"].transform(lambda x: x >= x.median())]
print(out)

Prints:

   ID           A    B
0   1  Category 1  0.5
3   4  Category 1  0.6
4   5  Category 2  0.4
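
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example data from the question (the DataFrame construction is my assumption about how the data is loaded) and uses a slightly different but equivalent approach: it broadcasts each group's median back onto the rows with transform("median"), then compares against "B" outside the groupby. The name group_median is only illustrative.

```python
import pandas as pd

# Example data from the question (how it is constructed here is an assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "A": ["Category 1", "Category 2", "Category 1", "Category 1", "Category 2"],
    "B": [0.5, 0.2, 0.2, 0.6, 0.4],
})

# Median of B within each group of A, repeated for every row of that group,
# so the result is aligned with df's index.
group_median = df.groupby("A")["B"].transform("median")

# Keep only the rows whose B is at or above the median of their own group.
out = df[df["B"] >= group_median]
print(out)
```

This should keep the same rows as the lambda version above (IDs 1, 4 and 5). Passing the string "median" uses the built-in aggregation instead of calling a Python lambda once per group, which may matter on larger data.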
