
Get the column names of the two largest values in each row of a pandas DataFrame

I have the following pandas data frame:

    AA    BB    CC    DD    EE
----------------------------------
0   1     12    4      3     5
1   5     7     28     7     4
2   9     7     9      2     6

I would like to add a new column ("MM") containing a list of the column names of the two largest values in each row. For the above data frame, the output should be:

    AA    BB    CC    DD    EE    MM
-------------------------------------------------
0   1     12    4      3     5    ['BB','EE']
1   5     7     28     7     4    ['CC','DD','BB']
2   9     7     9      2     6    ['AA','CC']

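In the first row, the two largest values are 12 and 5 (columns 'BB' and 'EE'). In the second row, the second-largest value, 7, appears in both 'BB' and 'DD', so both columns are listed.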

How can I do that?

Thanks

You can use apply with axis=1 and call nlargest on each row. Passing keep='all' keeps every column that ties for the last selected value, so a row can return more than two names:

df['MM'] = df.apply(lambda r: r.nlargest(2, keep='all').index.values, axis=1)

Output:

   AA  BB  CC  DD  EE            MM
0   1  12   4   3   5      [BB, EE]
1   5   7  28   7   4  [CC, BB, DD]
2   9   7   9   2   6      [AA, CC]
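
For reference, here is a self-contained sketch of the same approach that rebuilds the sample frame from the question. The helper name top_two_columns and the value_cols list are illustrative and not part of the original answer. It uses .index.tolist() so that each cell holds a plain Python list (the one-liner above stores NumPy arrays), and it restricts apply to the numeric columns so that re-running the assignment after "MM" exists does not feed the list column back into nlargest.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "AA": [1, 5, 9],
        "BB": [12, 7, 7],
        "CC": [4, 28, 9],
        "DD": [3, 7, 2],
        "EE": [5, 4, 6],
    }
)

# Columns to rank; listed explicitly so the new "MM" column is never included.
value_cols = ["AA", "BB", "CC", "DD", "EE"]


def top_two_columns(row):
    """Return the column names of the two largest values in a row.

    keep="all" retains every column tied with the second-largest value,
    so the returned list can contain more than two names.
    """
    return row.nlargest(2, keep="all").index.tolist()


df["MM"] = df[value_cols].apply(top_two_columns, axis=1)
print(df)
```

If exactly two names per row are wanted even when values tie, keep="first" (the default for nlargest) would return only the first two columns encountered.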
