
Get column index of max value in pandas row

I want to find not just the max value in a dataframe row, but also the specific column that holds that value. If multiple columns hold the max value, returning either the list of all such columns or just one of them is fine.

In this case, I'm specifically concerned with doing this for a single given row, but if there is a solution that can apply to a dataframe, that would be great as well.

Below is a rough idea of what I mean. row.max() returns the max value, but my desired function row.max_col() returns the name of the column that has the max value.

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
>>> row = df.iloc[0]
>>> row.max()
4
>>> row.max_col()
Index(['B'], dtype='object')

My current approach is this:

>>> row.index[row.eq(row.max())]
Index(['B'], dtype='object')

I'm not familiar with how pandas optimizes things, so I apologize if I'm wrong here, but I assume that row.index[row.eq(...)] runs in time linear in the number of columns. I'm working with a small number of columns, so it shouldn't be a huge issue, but I'm curious whether there is a way to get the column name as directly as .max() gives the value, without the extra pass to look for equal values.

Use idxmax:

>>> df
   A  B
0  1  4
1  2  5
2  3  6

>>> df.iloc[0].idxmax()
'B'
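
Since the question also asks about ties: idxmax returns only the first label when several columns share the maximum. Below is a minimal sketch of both behaviours, assuming a made-up row with a tie (the Series and the variable names are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical row in which columns "B" and "C" tie for the maximum.
row = pd.Series({"A": 1, "B": 4, "C": 4})

# idxmax returns the label of the first occurrence of the maximum.
first_label = row.idxmax()

# Boolean indexing on the index returns every label that holds the maximum.
all_labels = row.index[row == row.max()].tolist()

print(first_label)
print(all_labels)
```

If one column is enough, idxmax is the shorter option; the boolean-mask form from the question is still the way to go when every tied column is needed.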

Assume that the source DataFrame contains:

   A  B
0  1  4
1  7  5
2  3  6
3  9  8

Then, to find the column name holding the max value in each row (not only row 0), run:

result = df.apply('idxmax', axis=1)

The result is:

0    B
1    A
2    B
3    A
dtype: object

If you want the integer position of the column holding the max value instead, change the above code to:

result = df.columns.get_indexer(df.apply('idxmax', axis=1))

This time the result is:

array([1, 0, 1, 0], dtype=int64)
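
As a possible alternative to going through apply, DataFrame.idxmax accepts axis=1 directly, and the integer positions can be read off the underlying NumPy array. This is only a sketch on the same four-row frame; the names labels and positions are illustrative:

```python
import pandas as pd

# Same data as the four-row example above.
df = pd.DataFrame({"A": [1, 7, 3, 9], "B": [4, 5, 6, 8]})

# Column label of the row-wise maximum, without going through apply.
labels = df.idxmax(axis=1)

# Integer column position of the row-wise maximum, computed on the raw values.
positions = df.to_numpy().argmax(axis=1)

# The positions map back onto the same labels.
assert list(df.columns[positions]) == list(labels)

print(labels.tolist())
print(positions.tolist())
```

Both calls pick the first column on ties, so they should agree with each other for frames like this one.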
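
For a related group-wise case, the largest and smallest Population per Id can be found like this:

import pandas as pd

df1 = pd.DataFrame({
    'Id': ['00', '01', '02', '02', '01', '03'],
    'date': ['1990-12-31 ', '1990-12-27 ', '1990-12-28 ',
             '1990-12-28 ', '1992-12-27 ', '1990-12-30 '],
    # Population is stored as strings, so the comparisons below are lexicographic
    'Population': ['700', '200', '300', '400', '500', '100'],
})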
print(df1)

"""
   Id         date Population
0  00  1990-12-31         700
1  01  1990-12-27         200
2  02  1990-12-28         300
3  02  1990-12-28         400
4  01  1992-12-27         500
5  03  1990-12-30         100
"""



# Largest Population in each Id group; argmax gives the position within the group
Max1 = df1.groupby('Id').apply(
    lambda g: g['Population'].values[g['Population'].values.argmax()]
)


print(Max1)

"""
Id
00    700
01    500
02    400
03    100
dtype: object
"""

# Smallest Population in each Id group; argmin gives the position within the group
Min1 = df1.groupby('Id').apply(
    lambda g: g['Population'].values[g['Population'].values.argmin()]
)

print(Min1)

"""
Id
00    700
01    200
02    300
03    100
dtype: object

"""

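Because Population holds strings in the example above, the comparison there is lexicographic and only happens to match numeric order since every value has three digits. A hedged sketch of a numeric variant follows, assuming the values can be converted to integers; the date column is omitted for brevity, and names such as max_per_id and max_rows are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Id": ["00", "01", "02", "02", "01", "03"],
    "Population": ["700", "200", "300", "400", "500", "100"],
})

# Convert to integers so comparisons are numeric rather than lexicographic.
df1["Population"] = df1["Population"].astype(int)

grouped = df1.groupby("Id")["Population"]

# Largest and smallest Population per Id.
max_per_id = grouped.max()
min_per_id = grouped.min()

# idxmax gives the row label of each group's maximum, so the full rows
# can be recovered with .loc.
max_rows = df1.loc[grouped.idxmax()]

print(max_per_id)
print(min_per_id)
print(max_rows)
```

Using the built-in groupby reductions avoids the per-group Python lambda, and idxmax keeps the link back to the original row when more than the value itself is needed.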