Pandas groupby multiple columns to compare values
My df looks like this (there are dozens of other columns in the df, but these are the three I am focused on):
Param Value Limit
A 1.50 1
B 2.50 1
C 2.00 2
D 2.00 2.5
E 1.50 2
I am trying to use pandas to calculate how many [Value] are less than [Limit] per [Param], hoping to get a list like this:
Param Count
A 1
B 1
C 1
D 0
E 0
I've tried a few methods, the first being
value_count = df.loc[df['Value'] < df['Limit']].count()
but this just gives the full count per column in the df.
I've also tried the groupby function, which I think could be the right idea, by creating a subset of the df with the chosen columns
df_below_limit = df[df['Value'] < df['Limit']]
df_below_limit.groupby('Param')['Value'].count()
This is nearly what I want, but it excludes the Params whose count would be 0, which I also need. I'm not sure how to get the list the way I need it.
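Assuming you want the count per Param of rows where Value is greater than or equal to Limit (which is what your expected output shows), you can compare the two columns and sum the resulting booleans per group: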
out = df['Value'].ge(df['Limit']).groupby(df['Param']).sum()
output:
Param
A 1
B 2
C 1
D 0
E 0
dtype: int64
used input (with a duplicated row "B" for the example):
Param Value Limit
0 A 1.5 1.0
1 B 2.5 1.0
2 B 2.5 1.0
3 C 2.0 2.0
4 D 2.0 2.5
5 E 1.5 2.0
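For a reproducible version, here is a minimal sketch that rebuilds the input above and runs the same expression. The variable name mask is only for illustration. The comparison gives a boolean Series, and summing booleans per group counts the True values, so a Param with no matching row stays in the result with 0 instead of disappearing.

```python
import pandas as pd

# Same rows as the "used input" above (B is duplicated on purpose)
df = pd.DataFrame({
    "Param": ["A", "B", "B", "C", "D", "E"],
    "Value": [1.5, 2.5, 2.5, 2.0, 2.0, 1.5],
    "Limit": [1.0, 1.0, 1.0, 2.0, 2.5, 2.0],
})

# True where Value >= Limit, one boolean per row
mask = df["Value"].ge(df["Limit"])

# Group the booleans by Param and sum them (True counts as 1)
out = mask.groupby(df["Param"]).sum()
print(out)
```

If you need strictly "less than" instead, swapping ge for lt should be the only change.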
as a DataFrame with a Count column:
df['Value'].ge(df['Limit']).groupby(df['Param']).sum().reset_index(name='Count')
# or
df['Value'].ge(df['Limit']).groupby(df['Param']).agg(Count='sum').reset_index()
output:
Param Count
0 A 1
1 B 2
2 C 1
3 D 0
4 E 0
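As for why the filter-first attempt in the question loses rows: once the df is filtered, a Param with no surviving row has nothing left to group, so it never shows up in the count. If you prefer to keep that approach, one possible sketch is to reindex the result against all Params and fill the gaps with 0. This uses the five rows from the question and assumes the >= comparison from above; the names kept, counts and result are only illustrative.

```python
import pandas as pd

# The five rows from the question
df = pd.DataFrame({
    "Param": ["A", "B", "C", "D", "E"],
    "Value": [1.5, 2.5, 2.0, 2.0, 1.5],
    "Limit": [1.0, 1.0, 2.0, 2.5, 2.0],
})

# Filter first: every row of D and E is dropped here
kept = df[df["Value"] >= df["Limit"]]
counts = kept.groupby("Param")["Value"].count()

# Put the missing Params back with a count of 0
counts = counts.reindex(df["Param"].unique(), fill_value=0)

# Turn the Series into a two-column DataFrame
result = counts.rename_axis("Param").reset_index(name="Count")
print(result)
```

The boolean-sum version is shorter and avoids the extra reindex step, so this is mainly useful if the filtered df is needed for something else anyway.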