Python dataframe return column index of all max values as list

For each row, I'm looking for the dataframe column (or columns) holding the greatest value, and I want to assign the matching column name(s) to a new column. One similar example here does not answer that in a dataframe setting. See the example below:

import pandas as pd

data = {'A': [1, 2, 2, 0], 'B':[2, 0, 2, 1]}
df = pd.DataFrame(data)

I'm looking to create a column df['C'] = ['B', 'A', ['A', 'B'], 'B'].

You could split this across several lines, but a single expression does it:

df["C"] = df.apply(lambda x: "A, B" if x.A == x.B == max(x.A, x.B) else "A" if x.A == max(x.A, x.B) else "B", axis=1)

This will give you:

   A  B     C
0  1  2     B
1  2  0     A
2  2  2  A, B
3  0  1     B
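
The lambda above hard-codes the two column names. As a rough sketch, assuming every column is numeric and contains no NaN, the same idea can be written so it works for any number of columns by comparing each row against its own maximum:

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

# For each row, keep the entries equal to the row maximum and
# join their column labels into a single comma-separated string.
df['C'] = df.apply(lambda row: ", ".join(row[row == row.max()].index), axis=1)

print(df)
```

Like the original one-liner, this stores ties as a string such as "A, B" rather than as a list.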

Take the max along the second axis (axis=1, i.e. per row) and reshape the dataframe to keep only the columns matching each row's max:

# get max value per row and identify matching cells
m = df.eq(df.max(axis=1), axis=0)
# mask and reshape to 1D (removes the non matches)
s = m.where(m).stack()
# aggregate to produce the final result
df['C'] = (s.index.get_level_values(1)
            .to_series()
            .groupby(s.index.get_level_values(0))
            .apply(list)
           )

Output:

   A  B       C
0  1  2     [B]
1  2  0     [A]
2  2  2  [A, B]
3  0  1     [B]
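
If the stack/groupby step feels heavy, the same boolean mask can probably be turned into lists directly. This is only a sketch of an alternative, again assuming numeric columns with no NaN; the names mask and cols are chosen here for illustration:

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

# Boolean mask: True where a cell equals the maximum of its row
mask = df.eq(df.max(axis=1), axis=0)

# Column labels as a NumPy array so each boolean row can index it
cols = mask.columns.to_numpy()

# One list of matching column labels per row, aligned on the original index
df['C'] = pd.Series([list(cols[row]) for row in mask.to_numpy()], index=df.index)

print(df)
```

Every row gets a list, including rows with a single maximum, which matches the output shown above.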

