Get column index of max value in pandas row

I want to find not just the max value in a dataframe row, but also the specific column that holds that value. If several columns share the max value, returning either the list of all such columns or just one of them is fine.

In this case, I'm specifically concerned with doing this for a single given row, but if there is a solution that can apply to a whole dataframe, that would be great as well.

Below is a rough idea of what I mean. row.max() returns the max value, but my desired function row.max_col() would return the name of the column that has the max value.

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
>>> row = df.iloc[0]
>>> row.max()
4
>>> row.max_col()
Index(['B'], dtype='object')

My current approach is this:

>>> row.index[row.eq(row.max())]
Index(['B'], dtype='object')

I'm not familiar with how pandas optimizes everything, so I apologize if I'm wrong here, but I assume that row.index[row.eq(...)] runs in time linear in the number of columns. I'm working with a small number of columns, so it shouldn't be a huge issue, but I'm curious whether there is a way to get the column name as directly as I can get the value with .max(), without the extra step of looking for equal values afterwards.

Use idxmax:

>>> df
   A  B
0  1  4
1  2  5
2  3  6

>>> df.iloc[0].idxmax()
'B'
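
The question also mentions ties. As a minimal sketch (the row below is a made-up example, not from the question), idxmax returns only the label of the first maximum, while the boolean-mask approach from the question returns every tied label. Either way, each value in the row has to be examined once.

```python
import pandas as pd

# Hypothetical row with a tie: columns "B" and "C" both hold the maximum.
row = pd.Series({"A": 1, "B": 4, "C": 4})

# idxmax gives the label of the first occurrence of the maximum.
first_max_col = row.idxmax()

# The mask from the question keeps every label whose value equals the maximum.
all_max_cols = row.index[row.eq(row.max())].tolist()

print(first_max_col)  # B
print(all_max_cols)   # ['B', 'C']
```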

Assume that the source DataFrame contains:

   A  B
0  1  4
1  7  5
2  3  6
3  9  8

Then, to find the name of the column holding the max value in each row (not only row 0), run:

result = df.apply('idxmax', axis=1)

The result is:

0    B
1    A
2    B
3    A
dtype: object
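
As a side note, DataFrame.idxmax accepts an axis argument itself, so the apply call is not strictly needed. A small sketch on the same data, comparing the two spellings:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 7, 3, 9], "B": [4, 5, 6, 8]})

# The spelling used above: dispatch to idxmax through apply.
via_apply = df.apply("idxmax", axis=1)

# Calling idxmax directly with axis=1 gives the column label of each row's maximum.
direct = df.idxmax(axis=1)

print(direct.tolist())  # ['B', 'A', 'B', 'A']
```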

But if you want the integer position of the column holding the max value, change the above code to:

result = df.columns.get_indexer(df.apply('idxmax', axis=1))

This time the result is:

array([1, 0, 1, 0], dtype=int64)
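
Another way to get integer positions, sketched here as an alternative rather than as part of the answer above, is to take argmax on the underlying NumPy array. This assumes the frame is all numeric and has no missing values; with NaN present, NumPy's argmax and pandas' idxmax treat them differently.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 7, 3, 9], "B": [4, 5, 6, 8]})

# Label-based route: find the label per row, then map labels to positions.
by_label = df.columns.get_indexer(df.idxmax(axis=1))

# Position-based route: argmax along each row of the raw array.
by_position = df.to_numpy().argmax(axis=1)

print(by_position.tolist())  # [1, 0, 1, 0]
```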
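
A further example: to get the maximum and minimum Population value within each Id group, use groupby with argmax and argmin:

import pandas as pd

df1 = pd.DataFrame({
    'Id': ['00', '01', '02', '02', '01', '03'],
    'date': ['1990-12-31 ', '1990-12-27 ', '1990-12-28 ',
             '1990-12-28 ', '1992-12-27 ', '1990-12-30 '],
    'Population': ['700', '200', '300', '400', '500', '100'],
})
print(df1)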

"""
   Id         date Population
0  00  1990-12-31         700
1  01  1990-12-27         200
2  02  1990-12-28         300
3  02  1990-12-28         400
4  01  1992-12-27         500
5  03  1990-12-30         100
"""



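# For each Id group, take the Population value at the position of the group's maximum.
# Population holds strings here, so the comparison is lexicographic, not numeric.
Max1 = df1.groupby('Id').apply(lambda g: g['Population'].values[g['Population'].values.argmax()])

print(Max1)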

"""
Id
00    700
01    500
02    400
03    100
dtype: object
"""

# Same idea with argmin: the Population value at the position of each group's minimum.
Min1 = df1.groupby('Id').apply(lambda g: g['Population'].values[g['Population'].values.argmin()])

print(Min1)

"""
Id
00    700
01    200
02    300
03    100
dtype: object

"""
