
Select Pandas dataframe row where two or more columns have their maximum value together

Suppose you have a pandas.DataFrame like so:

Institution  Feat1   Feat2  Feat3  ...
ID1          14.5    0      0.32   ...
ID2          322.12  1      0.94   ...
ID3          27.08   0      1.47   ...

My question is simple: how would one select rows from this dataframe based on the maximum combined value of two or more columns? For example:

  • I want to select the rows where the columns Feat1 and Feat3 have their maximum combined value, returning:

Institution  Feat1   Feat2  Feat3  ...
ID2          322.12  1      0.94   ...

I am certain a good old for loop could take care of the problem given a little time, but I believe there must be a pandas function for this. I hope someone can point me in the right direction.

You can play around with:

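# Row-wise sum; numeric_only=True skips the text column 'Institution'
df.sum(axis=1, numeric_only=True)

df['row_sum'] = df.sum(axis=1, numeric_only=True)

or

# Sum only the two columns of interest
df['sum'] = df['Feat1'] + df['Feat3']

And then:

# DataFrame.sort was removed from pandas; sort_values is its replacement.
# ascending=False puts the largest sum first.
df.sort_values('sum', ascending=False)

# Restores the original row order afterwards, if needed
df.sort_index()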
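
Putting the pieces above together, a minimal runnable sketch could look like the following. The sample frame is rebuilt from the three rows shown in the question (the elided "..." columns are left out), and the helper column name combined is only an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "Institution": ["ID1", "ID2", "ID3"],
    "Feat1": [14.5, 322.12, 27.08],
    "Feat2": [0, 1, 0],
    "Feat3": [0.32, 0.94, 1.47],
})

# Sum only the columns of interest; 'combined' is a hypothetical helper name.
ranked = df.assign(combined=df[["Feat1", "Feat3"]].sum(axis=1))

# Sort descending so the row with the largest combined value comes first.
ranked = ranked.sort_values("combined", ascending=False)

# Keep the top row and drop the helper column again.
top = ranked.head(1).drop(columns="combined")
print(top)
```

Using assign instead of df['sum'] = ... leaves the original frame untouched, which may be preferable if the helper column is only needed for this one selection.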

You can do it with label-based indexing and idxmax:

output = df.loc[(df['Feat1'] + df['Feat3']).to_frame().idxmax(),:]

This outputs:

  Institution   Feat1  Feat2  Feat3
1         ID2  322.12      1   0.94

Alternatively, you can create a helper column and filter on it with a boolean mask, although this takes a little extra effort.

df['filter'] = df['Feat1'] + df['Feat3']
output = df[df['filter'] == df['filter'].max()]
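
The two approaches differ when several rows share the maximum: idxmax returns only the first matching label, while the comparison against max() keeps every tied row. The sketch below illustrates this on a small made-up frame (df_ties and its values are hypothetical, chosen so that two rows tie).

```python
import pandas as pd

# Hypothetical data in which ID1 and ID2 tie on Feat1 + Feat3.
df_ties = pd.DataFrame({
    "Institution": ["ID1", "ID2", "ID3"],
    "Feat1": [10.0, 4.0, 1.0],
    "Feat3": [2.0, 8.0, 0.5],
})

combined = df_ties["Feat1"] + df_ties["Feat3"]

# idxmax gives the label of the first maximum only.
first_only = df_ties.loc[[combined.idxmax()]]

# A boolean mask keeps every row that reaches the maximum.
all_ties = df_ties[combined == combined.max()]

print(first_only)
print(all_ties)
```

Computing the sum as a standalone Series, as here, also avoids adding a temporary column to the frame.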

