For each row of a dataframe, I want to find the column with the greatest value and assign that column's name to a new variable. One similar example here does not answer that in a dataframe setting. See the example below:
import pandas as pd
data = {'A': [1, 2, 2, 0], 'B':[2, 0, 2, 1]}
df = pd.DataFrame(data)
I'm looking to create a new column df['C'] = [B, A, [A, B], B], where ties list every column that holds the row maximum.
You can split it into several lines, but I guess this does it:
df["C"] = df.apply(lambda x: "A, B" if x.A == x.B == max(x.A, x.B) else "A" if x.A == max(x.A, x.B) else "B", axis=1)
This will give you:
   A  B     C
0  1  2     B
1  2  0     A
2  2  2  A, B
3  0  1     B
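The one-liner above hard-codes columns A and B. If there are more columns, the same idea can be written against a list of column names. This is only a sketch: `cols`, `row_max` and `is_max` are names I made up for illustration, and it assumes the column labels are strings so they can be joined.

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

# columns to compare (hypothetical name; extend the list as needed)
cols = ['A', 'B']

# maximum of each row across the selected columns
row_max = df[cols].max(axis=1)

# boolean frame: True where a cell equals its row maximum
is_max = df[cols].eq(row_max, axis=0)

# join the names of the True columns in each row, e.g. "A, B" on a tie
df['C'] = is_max.apply(lambda row: ", ".join(row[row].index), axis=1)
```

It still goes row by row through apply, so it is not faster than the lambda, just easier to extend.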
Use max along axis=1 (row-wise) and reshape the dataframe to select the columns that match the maximum in each row:
# get max value per row and identify matching cells
m = df.eq(df.max(axis=1), axis=0)
# mask and reshape to 1D (removes the non matches)
s = m.where(m).stack()
# aggregate to produce the final result
df['C'] = (s.index.get_level_values(1)
            .to_series()
            .groupby(s.index.get_level_values(0))
            .apply(list)
          )
Output:
   A  B       C
0  1  2     [B]
1  2  0     [A]
2  2  2  [A, B]
3  0  1     [B]
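The stack step relies on stack() dropping the masked (NaN) cells, and as far as I know that depends on the pandas version (the `future_stack` behaviour introduced around pandas 2.1 keeps them). A possible variant that builds the same boolean mask but skips the reshape is sketched below. It is an alternative, not the method above, and the variable names `cols` and `is_max` are my own.

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

# columns to compare (hypothetical name)
cols = ['A', 'B']

# True where a cell equals the maximum of its row
is_max = df[cols].eq(df[cols].max(axis=1), axis=0)

# for each row of the mask, keep the matching column names as a list
df['C'] = pd.Series(
    [list(is_max.columns[mask]) for mask in is_max.to_numpy()],
    index=df.index,
)
```

Wrapping the lists in a Series with `index=df.index` keeps the result aligned with the original rows even if the index is not 0..n-1.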