
Group a Pandas DataFrame by two columns and flag the maximum value in a new column

I have a Pandas DataFrame that I need to group by two different columns to find which value in a third column is the highest. However, if the first-choice subgroup exists, its maximum is used and there is no need to check the second subgroup. I have looked into "Get the row(s) which have the max value in groups using groupby", but I need some additional checks that I have not been able to implement.

Example:

import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})


df
  First Second  Value
0  KAT1      E     20
1  KAT1      M     28
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
6  KAT4      M     23
7  KAT4      M     24
8  KAT4      E     25

First, it needs to group by column 'First' and then by 'Second', preferring the value 'E', and find the maximum value in that subgroup. If 'E' does not exist within a 'First' group, it needs to check 'M' instead and find the maximum value in that subgroup. Values can be tied; in that case every tied row should be written to the new column as True.
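As a reading aid, the selection rule above can be written as an explicit per-group function. This is a minimal sketch, not the accepted answer: the helper name `mark_preferred_max` is hypothetical, and looping over groups in Python is likely to be slower than a vectorized solution on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})

def mark_preferred_max(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: flag the maximum of the preferred subgroup of one 'First' group."""
    is_e = group['Second'].eq('E')
    # Prefer the 'E' rows; fall back to the 'M' rows only if the group has no 'E'
    candidates = group[is_e] if is_e.any() else group[group['Second'].eq('M')]
    # Rows outside the chosen subgroup stay False
    result = pd.Series(False, index=group.index)
    # Every candidate equal to the subgroup maximum becomes True, so ties are kept
    result.loc[candidates.index] = candidates['Value'].eq(candidates['Value'].max())
    return result

# Apply the rule to each 'First' group and align the pieces back by index
df['Ismax'] = pd.concat([mark_preferred_max(g) for _, g in df.groupby('First')])
print(df)
```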

Expected output:

  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True
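For comparison, a plain groupby on 'First' with `transform('max')`, the usual technique from the linked question, ignores the E/M preference. A short check on the same sample data shows where it differs from the expected output: for KAT1 it flags the M row (28) instead of the E row (20).

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})

# Naive approach: compare each row with the maximum of its whole 'First' group,
# without looking at 'Second' at all
naive = df.groupby('First')['Value'].transform('max').eq(df['Value'])
print(naive.tolist())
# [False, True, True, True, False, False, False, False, True]
```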

If there are only E and M values in the Second column, you can use:

#get E rows
m1 = df['Second'].eq('E')
#get groups with at least one E per First
m2 = df['First'].isin(df.loc[m1, 'First'])
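#keep the E rows of groups that have an E, otherwise the M rows (like KAT2),
#then compare each kept row with its group maximum; rows that were filtered out become False
df['Ismax'] = (df[(m1 & m2) | (~m1 & ~m2)]
                 .groupby('First')['Value'].transform('max').eq(df['Value']))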

print (df)
  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True

Details:

print (df[(m1 & m2)])
  First Second  Value
0  KAT1      E     20
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25

print (df[(~m1 & ~m2)])
  First Second  Value
2  KAT2      M     25

print (df[(m1 & m2) | (~m1 & ~m2)])
  First Second  Value
0  KAT1      E     20
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25
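If Second can hold more than two values, one possible generalization (a sketch, not part of the original answer) is to rank the values by preference and keep, per 'First' group, only the rows with the best available rank. The `preference` mapping below is a hypothetical example; values missing from it get a NaN rank and are never selected.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})

# Hypothetical preference order: a lower number means a higher priority
preference = {'E': 0, 'M': 1}
rank = df['Second'].map(preference)

# Best (lowest) rank that actually occurs in each 'First' group
best_rank = rank.groupby(df['First']).transform('min')
in_preferred = rank.eq(best_rank)

# Maximum Value over the preferred rows only; other rows are masked to NaN first
group_max = df['Value'].where(in_preferred).groupby(df['First']).transform('max')
df['Ismax'] = in_preferred & df['Value'].eq(group_max)
print(df)
```

Adding a third category only requires extending `preference`; the rest of the logic stays the same.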
