Group Pandas dataframe by two columns and output the maximum column value indication to new column
I have a pandas DataFrame that I need to group by two different columns to check which value in a column is the highest. However, if the first-choice subgroup exists in a group, its maximum is used and there is no need to check the second subgroup.
I have looked into "Get the row(s) which have the max value in groups using groupby", but I need to make some additional checks, which I have not been able to do.
Example:
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})
df
First Second Value
0 KAT1 E 20
1 KAT1 M 28
2 KAT2 M 25
3 KAT3 E 26
4 KAT3 E 24
5 KAT4 E 19
6 KAT4 M 23
7 KAT4 M 24
8 KAT4 E 25
First, the data needs to be grouped by column 'First' and then by 'Second', with preference for the value 'E'. Then find the maximum value in that subgroup. If 'E' does not exist in a group, check 'M' instead and find the maximum value in that subgroup. Values can be tied, in which case both rows are written to the new column as True.
Expected output:
First Second Value Ismax
0 KAT1 E 20 True
1 KAT1 M 28 False
2 KAT2 M 25 True
3 KAT3 E 26 True
4 KAT3 E 24 False
5 KAT4 E 19 False
6 KAT4 M 23 False
7 KAT4 M 24 False
8 KAT4 E 25 True
If the Second column contains only E and M values, you can use:
# rows where Second is E
m1 = df['Second'].eq('E')
# rows whose First group contains at least one E
m2 = df['First'].isin(df.loc[m1, 'First'])
# keep E rows where the group has an E, else M rows (like KAT2), then compare with the group maximum
df['Ismax'] = (df[(m1 & m2) | (~m1 & ~m2)]
               .groupby('First')['Value']
               .transform('max')
               .eq(df['Value']))
print(df)
First Second Value Ismax
0 KAT1 E 20 True
1 KAT1 M 28 False
2 KAT2 M 25 True
3 KAT3 E 26 True
4 KAT3 E 24 False
5 KAT4 E 19 False
6 KAT4 M 23 False
7 KAT4 M 24 False
8 KAT4 E 25 True
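A note on why the filtered-out rows (for example the M row of KAT1) end up as False rather than missing: the transform result only covers the rows that passed the filter, and Series.eq aligns both sides on their index labels before comparing. The toy Series below are made up for illustration and are not part of the question's data; this is only a small sketch of the alignment behaviour.

```python
import pandas as pd

# Hypothetical values for three rows labelled 0, 1, 2.
value = pd.Series([20, 28, 25], index=[0, 1, 2])

# Pretend only rows 0 and 2 passed the filter; these are their group maxima.
group_max = pd.Series([20, 25], index=[0, 2])

# Series.eq aligns on index labels. Label 1 is absent from group_max,
# so it is treated as NaN, and NaN never compares equal.
is_max = group_max.eq(value)
print(is_max.tolist())  # [True, False, True]
```

Because the result already carries every label of the original frame, assigning it to a new column fills all rows.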
Details:
print (df[(m1 & m2)])
First Second Value
0 KAT1 E 20
3 KAT3 E 26
4 KAT3 E 24
5 KAT4 E 19
8 KAT4 E 25
print (df[(~m1 & ~m2)])
First Second Value
2 KAT2 M 25
print (df[(m1 & m2) | (~m1 & ~m2)])
First Second Value
0 KAT1 E 20
2 KAT2 M 25
3 KAT3 E 26
4 KAT3 E 24
5 KAT4 E 19
8 KAT4 E 25
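If Second could hold more than two categories, one possible generalization is to rank the categories and keep, per First group, only the rows of the best-ranked category that is present. This is a sketch rather than part of the original answer: the priority mapping and the names rank, preferred and group_max are assumptions introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25],
})

# Assumed preference order (hypothetical): a lower number is preferred.
priority = {'E': 0, 'M': 1}
rank = df['Second'].map(priority)

# A row is "preferred" if its rank equals the best rank present in its First group.
preferred = rank.eq(rank.groupby(df['First']).transform('min'))

# Maximum Value per group, computed over the preferred rows only.
group_max = df[preferred].groupby('First')['Value'].transform('max')

# Non-preferred rows are missing from group_max, align to NaN, and become False.
df['Ismax'] = group_max.eq(df['Value'])
print(df)
```

With the sample data this gives the same Ismax column as the mask-based version above, and adding another category only requires extending the priority mapping.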