How to return the highest value from multiple columns to a new column in a pandas df
Apologies for the opaque question title (I wasn't sure how to word it). I have the following DataFrame:
import pandas as pd
import numpy as np
data = [['tom', 1,1,6,4],
['tom', 1,2,2,3],
['tom', 1,2,3,1],
['tom', 2,3,2,7],
['jim', 1,4,3,6],
['jim', 2,6,5,3]]
df = pd.DataFrame(data, columns = ['Name', 'Day','A','B','C'])
# Sum A, B and C for each (Name, Day) pair
df = df.groupby(by=['Name','Day']).agg('sum').reset_index()
print(df)
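For reference, groupby sorts its keys by default, so the aggregated frame has one row per (Name, Day) pair with jim before tom, and tom's three Day 1 rows collapse into A=5, B=11, C=8. A minimal sketch rebuilding the same frame to check this (the name grouped is just for illustration):

```python
import pandas as pd

data = [['tom', 1, 1, 6, 4],
        ['tom', 1, 2, 2, 3],
        ['tom', 1, 2, 3, 1],
        ['tom', 2, 3, 2, 7],
        ['jim', 1, 4, 3, 6],
        ['jim', 2, 6, 5, 3]]
df = pd.DataFrame(data, columns=['Name', 'Day', 'A', 'B', 'C'])

# groupby sorts the (Name, Day) keys by default, so 'jim' rows come first;
# A, B and C are summed within each group
grouped = df.groupby(by=['Name', 'Day']).sum().reset_index()
print(grouped)
```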
I would like to add another column that returns text according to which of the columns A, B, C is the highest.
For example, I would like Apple if A is highest, Banana if B is highest, and Carrot if C is highest. So in the example above, the values for the 4 rows should be:
New Col
Carrot
Apple
Banana
Carrot
Any help would be much appreciated! Thanks
Use DataFrame.idxmax along axis=1, which returns the label of the column holding each row's maximum, then Series.map to translate that label into the desired text:
dct = {'A': 'Apple', 'B': 'Banana', 'C': 'Carrot'}
df['New col'] = df[['A', 'B', 'C']].idxmax(axis=1).map(dct)
Result:
Name Day A B C New col
0 jim 1 4 3 6 Carrot
1 jim 2 6 5 3 Apple
2 tom 1 5 11 8 Banana
3 tom 2 3 2 7 Carrot
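One detail worth knowing: when two columns tie for the maximum, idxmax returns the first one in column order, so a row where A equals C maps to Apple. A small self-contained sketch; the last row is made-up data added only to show a tie, it is not from the question:

```python
import pandas as pd

# First three rows match the A/B/C values above; the last row is a
# hypothetical tie between A and C
df = pd.DataFrame({'A': [4, 6, 5, 7],
                   'B': [3, 5, 11, 2],
                   'C': [6, 3, 8, 7]})
dct = {'A': 'Apple', 'B': 'Banana', 'C': 'Carrot'}

# idxmax(axis=1) yields the column label of each row's maximum
# (first occurrence on ties); map turns that label into the text
df['New col'] = df[['A', 'B', 'C']].idxmax(axis=1).map(dct)
print(df['New col'].tolist())
# ['Carrot', 'Apple', 'Banana', 'Apple']
```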
@ShubhamSharma's answer is better than this, but here is another option:
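# Default to 'Carrot' (ties also land here); use 'Apple' only where A is strictly largest
df['New col'] = np.where((df['A'] > df['B']) & (df['A'] > df['C']), 'Apple', 'Carrot')
# Then overwrite with 'Banana' where B is strictly largest
df['New col'] = np.where((df['B'] > df['A']) & (df['B'] > df['C']), 'Banana', df['New col'])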
Output:
Name Day A B C New col
0 jim 1 4 3 6 Carrot
1 jim 2 6 5 3 Apple
2 tom 1 5 11 8 Banana
3 tom 2 3 2 7 Carrot
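A possible variant of the np.where idea, swapping in np.select so all cases sit in one call. Using >= here makes ties resolve in A, B, C order (matching idxmax); that tie rule is this sketch's own choice, not part of the answer above. The frame is rebuilt from the aggregated values shown in the output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['jim', 'jim', 'tom', 'tom'],
                   'Day': [1, 2, 1, 2],
                   'A': [4, 6, 5, 3],
                   'B': [3, 5, 11, 2],
                   'C': [6, 3, 8, 7]})

# np.select takes the choice of the first condition that is True in each row,
# so ties go to A first, then B; anything else falls back to the default
conditions = [
    (df['A'] >= df['B']) & (df['A'] >= df['C']),
    (df['B'] >= df['A']) & (df['B'] >= df['C']),
]
choices = ['Apple', 'Banana']
df['New col'] = np.select(conditions, choices, default='Carrot')
print(df)
```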