简体   繁体   中英

How to create a new columns with the top 3 maximum values in each row from specific columns in python df?

I have this df:

data = {
    'Name': ['Tom', 'nick', 'krish', 'jack'],
    'A': [20, 21, 19, 18],
    'B': [3, 6, 2, 1],
    'C': [6, 14, 5, 17],
    'D': [2, 10, 9, 98]
    
}
people = pd.DataFrame(data)

people["max_1"]=people[['A','B','C','D']].max(axis=1)
people

So I've added a new column - max_1 for the maximum value in each row from columns A, B, C, and D. My question is how can I create new columns (max_2 and max_3) for the 2nd highest value and for the third highest value?

Additional question - is it possible to add another condition on top of it? For example, find the maximum values but only when the names are 'Tom'/'nick'/'krish' -> otherwise, set 0 for those rows.

Thanks in advance.

A solution with apply and nlargest.

import pandas as pd

data = {
    'Name': ['Tom', 'nick', 'krish', 'jack'],
    'A': [20, 21, 19, 18],
    'B': [3, 6, 2, 1],
    'C': [6, 14, 5, 17],
    'D': [2, 10, 9, 98]
    
}
people = pd.DataFrame(data)

# Solution
# Set Name to index. So it does not interfere when we do things with numbers.
people = people.set_index("Name")

# To select specific columns
# columns = ["A", "C", "D"]
# people = people[columns]

# Apply nlargest to each row.
# Not efficient because we us apply. But the good part that there is not much code.
top3 = people.apply(lambda x: pd.Series(x.nlargest(3).values), axis=1)
people[["N1", "N2", "N3"]] = top3

Result

        A  B   C   D  N1  N2  N3
Name
Tom    20  3   6   2  20   6   3
nick   21  6  14  10  21  14  10
krish  19  2   5   9  19   9   5
jack   18  1  17  98  98  18  17
n = 3
idx = [f'max_{i}' for i in range(1, 1 + n)]
df = people.iloc[:, 1:].apply(lambda x: x.nlargest(n).set_axis(idx), axis=1)
people.join(df)

result:

    Name    A   B   C   D   max_1   max_2   max_3
0   Tom     20  3   6   2   20      6       3
1   nick    21  6   14  10  21      14      10
2   krish   19  2   5   9   19      9       5
3   jack    18  1   17  98  98      18      17

change n to what you want

Use:

#number of columns
N = 3

#columns names
cols = ['A','B','C','D']

#new columns names
new = [f'max_{i+1}' for i in range(N)]

#condition for test membership
mask = people['Name'].isin(['Tom','nick'])

#new columns filled 0
people[new] = 0
#for filtered rows get top N values
people.loc[mask, new] = np.sort(people.loc[mask, cols].to_numpy(), axis=1)[:, -N:][:, ::-1]

print (people)
    Name   A  B   C   D  max_1  max_2  max_3
0    Tom  20  3   6   2     20      6      3
1   nick  21  6  14  10     21     14     10
2  krish  19  2   5   9      0      0      0
3   jack  18  1  17  98      0      0      0

Soluton with numpy.where and broadcasting:

N = 3

cols = ['A','B','C','D']

new = [f'max_{i+1}' for i in range(N)]

mask = people['Name'].isin(['Tom','nick'])
people[new] = np.where(mask.to_numpy()[:, None], 
                       np.sort(people[cols].to_numpy(), axis=1)[:, -N:][:, ::-1], 
                       0)

print (people)
    Name   A  B   C   D  max_1  max_2  max_3
0    Tom  20  3   6   2     20      6      3
1   nick  21  6  14  10     21     14     10
2  krish  19  2   5   9      0      0      0
3   jack  18  1  17  98      0      0      0

You can do:

# to get max_2

people['max_2'] = [np.sort(people[['A','B','C','D']].iloc[:])[i][2] for i in range(len(people))] 

# to get max_3

people['max_3'] = [np.sort(people[['A','B','C','D']].iloc[:])[i][1] for i in range(len(people))] 

Use

import numpy as np

people[['max_1','max_2','max_3']] = \
people[['A','B','C','D']].apply(lambda x: -np.sort(-x), axis=1, raw=True).iloc[:, 0:3]
people
# Out: 
#     Name   A  B   C   D  max_1  max_2  max_3
# 0    Tom  20  3   3   2     20      3      3
# 1   nick  21  6  14  10     21     14     10
# 2  krish  19  2   5   9     19      9      5
# 3   jack  18  1  17  98     98     18     17

Note that I changed the data a bit to show what happens in case of duplicate values

# data = {
#     'Name': ['Tom', 'nick', 'krish', 'jack'],
#     'A': [20, 21, 19, 18],
#     'B': [3, 6, 2, 1],
#     'C': [3, 14, 5, 17],
#     'D': [2, 10, 9, 98]
#     }
# people = pd.DataFrame(data)
people
# Out: 
#     Name   A  B   C   D
# 0    Tom  20  3   3   2
# 1   nick  21  6  14  10
# 2  krish  19  2   5   9
# 3   jack  18  1  17  98

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM