Group a Pandas dataframe by two columns and write a maximum-value indicator to a new column

Question

I have a Pandas dataframe that I need to group by two different columns to find which value in a column is the highest. However, if the first-choice value exists in a group, its subgroup decides the maximum and there is no need to check the second subgroup. I have looked into "Get the row(s) which have the max value in groups using groupby", but I need some additional checks that I have not been able to implement.

Example:

import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})


df
  First Second  Value
0  KAT1      E     20
1  KAT1      M     28
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
6  KAT4      M     23
7  KAT4      M     24
8  KAT4      E     25

First, the rows need to be grouped by column 'First' and then by 'Second', with preference for the value 'E'. Then find the maximum value in that subgroup. If 'E' does not exist in a group, check 'M' instead and find the maximum value in that subgroup. Values can be tied; in that case both rows are written to the new column as True.

Expected output:

  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True

Answer 1

If the Second column contains only E and M values, you can use:

# rows where Second is 'E'
m1 = df['Second'].eq('E')
# rows whose First group contains at least one 'E'
m2 = df['First'].isin(df.loc[m1, 'First'])
# keep the E rows of groups that have an E, otherwise the M rows (like KAT2),
# then compare each value with the maximum of its filtered group
df['Ismax'] = (df[(m1 & m2) | (~m1 & ~m2)]
                 .groupby('First')['Value'].transform('max').eq(df['Value']))

print (df)
  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True
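
It may not be obvious why the rows that were filtered out (such as the M row of KAT1) end up as False instead of being dropped. The short sketch below, on a three-row subset of the example data, suggests the reason: `eq` aligns the two Series on their index, and a row with no partner is compared against a missing value. The names `keep`, `group_max` and `is_max` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2'],
    'Second': ['E', 'M', 'M'],
    'Value': [20, 28, 25],
})

m1 = df['Second'].eq('E')
m2 = df['First'].isin(df.loc[m1, 'First'])
keep = (m1 & m2) | (~m1 & ~m2)

# Group maxima exist only for the kept rows (index 0 and 2 here).
group_max = df[keep].groupby('First')['Value'].transform('max')

# eq() aligns on the index; row 1 has no entry in group_max,
# so it is compared against a missing value and comes out False.
is_max = group_max.eq(df['Value'])
print(is_max)
```

Using `==` instead of `eq` would not work here, because `==` refuses to compare Series whose indexes differ.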

Details:

print (df[(m1 & m2)])
  First Second  Value
0  KAT1      E     20
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25

print (df[(~m1 & ~m2)])
  First Second  Value
2  KAT2      M     25

print (df[(m1 & m2) | (~m1 & ~m2)])
  First Second  Value
0  KAT1      E     20
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25
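
If the Second column could hold more than two categories, the same idea can probably be generalised by ranking the categories. The following is only a sketch under that assumption, not part of the original answer: `mark_group_max` and its `priority` parameter are hypothetical names, and rows whose category is not listed in `priority` are simply marked False.

```python
import pandas as pd

def mark_group_max(df, priority=('E', 'M')):
    """Hypothetical helper: flag the max Value inside the best-ranked
    Second category that is present in each First group."""
    # Lower rank means higher preference; unlisted categories become NaN.
    rank = df['Second'].map({cat: i for i, cat in enumerate(priority)})
    # Best (lowest) rank available in each First group.
    best = rank.groupby(df['First']).transform('min')
    chosen = rank.eq(best)
    # Maximum Value among the chosen rows of each group.
    group_max = df.loc[chosen].groupby('First')['Value'].transform('max')
    # Rows that were not chosen get NaN after reindexing and compare as False.
    return df['Value'].eq(group_max.reindex(df.index))

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25],
})
df['Ismax'] = mark_group_max(df)
print(df)
```

With the default priority this should reproduce the Ismax column from the question; a different order, for example `priority=('M', 'E')`, would prefer M rows instead.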
