Pandas group by multiple columns to compare values

Question

My df looks like this (there are dozens of other columns in the df, but these are the three I am focused on):

Param    Value      Limit  
A        1.50       1
B        2.50       1
C        2.00       2
D        2.00       2.5
E        1.50       2

I am trying to use pandas to count how many [Value] entries are less than [Limit] per [Param], hoping to get a list like this:

Param    Count       
A        1
B        1       
C        1       
D        0       
E        0

I've tried a few methods, the first being

value_count = df.loc[df['Value'] < df['Limit']].count()

but this just gives the full count per column in the df.

I've also tried the groupby function, which I think could be the right idea, by first creating a filtered subset of the df:

df_below_limit = df[df['Value'] < df['Limit']]
df_below_limit.groupby('Param')['Value'].count()

This is nearly what I want, but it excludes the rows that fail the condition, which I also need in the result. I'm not sure how to get the list in the form I need.

Answer 1

Assuming you want the count per Param of rows where Value is greater than or equal to Limit (which is what your expected output shows), you can use:

out = df['Value'].ge(df['Limit']).groupby(df['Param']).sum()

Output:

Param
A    1
B    2
C    1
D    0
E    0
dtype: int64

Input used (with a duplicated row "B" for the example):

  Param  Value  Limit
0     A    1.5    1.0
1     B    2.5    1.0
2     B    2.5    1.0
3     C    2.0    2.0
4     D    2.0    2.5
5     E    1.5    2.0
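
To reproduce the result, the input above can be rebuilt by hand. This is only a minimal sketch: the DataFrame is typed in from the table shown here, and the real df would carry its other columns as well.

```python
import pandas as pd

# Rebuild the example input, including the duplicated "B" row.
df = pd.DataFrame({
    "Param": ["A", "B", "B", "C", "D", "E"],
    "Value": [1.5, 2.5, 2.5, 2.0, 2.0, 1.5],
    "Limit": [1.0, 1.0, 1.0, 2.0, 2.5, 2.0],
})

# ge() gives a boolean Series (True where Value >= Limit).
# Grouping that Series by Param and summing counts the True rows,
# so groups with no True rows still appear with a count of 0.
out = df["Value"].ge(df["Limit"]).groupby(df["Param"]).sum()
print(out)
```

Because the comparison happens before the grouping, no rows are dropped, which is why D and E are kept with a count of 0.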

As a DataFrame:

df['Value'].ge(df['Limit']).groupby(df['Param']).sum().reset_index(name='Count')

# or

df['Value'].ge(df['Limit']).groupby(df['Param']).agg(Count='sum').reset_index()

Output:

  Param  Count
0     A      1
1     B      2
2     C      1
3     D      0
4     E      0
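
For comparison, here is a rough sketch of why the filter-then-groupby attempt from the question loses groups, and one possible way to keep that approach. It assumes the same example input as above and uses `>=` to match the `ge` comparison in this answer (the question's code used `<`); the variable names `filtered` and `restored` are only illustrative.

```python
import pandas as pd

# Same example input as above (typed in by hand for illustration).
df = pd.DataFrame({
    "Param": ["A", "B", "B", "C", "D", "E"],
    "Value": [1.5, 2.5, 2.5, 2.0, 2.0, 1.5],
    "Limit": [1.0, 1.0, 1.0, 2.0, 2.5, 2.0],
})

# Filtering first removes every row of D and E, so those groups vanish.
filtered = df[df["Value"] >= df["Limit"]].groupby("Param")["Value"].count()
print(filtered)

# Reindexing against every Param in the original frame restores them as 0.
restored = filtered.reindex(df["Param"].unique(), fill_value=0)
print(restored)
```

Comparing first and grouping the boolean result, as in the answer above, avoids the extra reindex step.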
