在pandas数据框中对行进行排序并获取列ID

Question

With a given pandas dataframe, I'd like to create new columns for the highest, second highest, third highest, etc... values in a row. 使用给定的pandas数据帧，我想为连续的最高，第二高，第三高等值创建新列。 And then create another column for the corresponding column name of each of those. 然后为每个列的相应列名创建另一列。 The code below does this for the max value of the row, but not those that follow. 下面的代码对行的最大值执行此操作，但不执行以下操作。

Adapted from Find the column name which has the maximum value for each row 改编自查找具有每行最大值的列名称

import pandas as pd

df = pd.DataFrame({'A': (23, 24, 55, 77, 33, 66),
                   'B': (12, 33, 0.2, 44, 23.5, 66),
                   'C': (1, 33, 66, 44, 5, 62),
                   'D': (9, 343, 4, 64, 24, 63),
                   'E': (123, 33, 2.2, 42, 2, 99)})

# Determine the max value and column name and add as columns to df
df['Max1'] = df.max(axis=1)
df['Col_Max1'] = df.idxmax(axis=1)

# Determine the 2nd and 3rd max PR and threshold levels and add as columns
# ???????????

print(df)

This produces: 这会产生：

    A     B   C    D      E   Max1 Col_Max1
0  23  12.0   1    9  123.0  123.0        E
1  24  33.0  33  343   33.0  343.0        D
2  55   0.2  66    4    2.2   66.0        C
3  77  44.0  44   64   42.0   77.0        A
4  33  23.5   5   24    2.0   33.0        A
5  66  66.0  62   63   99.0   99.0        E

Process finished with exit code 0

Only caveat would be that it is possible to have a very large number of columns, if that matters for performance. 唯一需要注意的是，如果对性能有影响，可能会有非常多的列。 Thanks guys. 多谢你们。

Answer 1

One approach using the underlying array data with focus on performance would be - 使用关注性能的底层阵列数据的一种方法是 -

a = df.values
c = df.columns
idx = a.argsort(1)[:,::-1]
vals = a[np.arange(idx.shape[0])[:,None], idx]
IDs = c[idx]

names_vals = ['Max'+str(i+1) for i in range(a.shape[1])]
names_IDs = ['Col_Max'+str(i+1) for i in range(a.shape[1])]

df_vals = pd.DataFrame(vals, columns=names_vals)
df_IDs = pd.DataFrame(IDs, columns=names_IDs)
df_out = pd.concat([df, df_vals, df_IDs], axis=1)

Sample input, output - 样本输入，输出 -

In [40]: df
Out[40]: 
    A     B   C    D      E
0  23  12.0   1    9  123.0
1  24  33.0  33  343   33.0
2  55   0.2  66    4    2.2
3  77  44.0  44   64   42.0
4  33  23.5   5   24    2.0
5  66  66.0  62   63   99.0

In [41]: df_out
Out[41]: 
    A     B   C    D      E   Max1  Max2  Max3  Max4  Max5 Col_Max1 Col_Max2  \
0  23  12.0   1    9  123.0  123.0  23.0  12.0   9.0   1.0        E        A   
1  24  33.0  33  343   33.0  343.0  33.0  33.0  33.0  24.0        D        E   
2  55   0.2  66    4    2.2   66.0  55.0   4.0   2.2   0.2        C        A   
3  77  44.0  44   64   42.0   77.0  64.0  44.0  44.0  42.0        A        D   
4  33  23.5   5   24    2.0   33.0  24.0  23.5   5.0   2.0        A        D   
5  66  66.0  62   63   99.0   99.0  66.0  66.0  63.0  62.0        E        B   

  Col_Max3 Col_Max4 Col_Max5  
0        B        D        C  
1        C        B        A  
2        D        E        B  
3        C        B        E  
4        B        C        E  
5        A        D        C

If you need the values and IDs in sequence, we need to modify the last few steps there - 如果您需要按顺序排列值和ID，我们需要修改其中的最后几个步骤 -

df0 = pd.DataFrame(np.dstack((vals, IDs)).reshape(a.shape[0],-1))
df0.columns = np.vstack((names_vals, names_IDs)).T.ravel()
df_out = pd.concat([df, df0], axis=1)

Sample output - 样品输出 -

In [62]: df_out
Out[62]: 
    A     B   C    D      E Max1 Col_Max1 Max2 Col_Max2  Max3 Col_Max3 Max4  \
0  23  12.0   1    9  123.0  123        E   23        A    12        B    9   
1  24  33.0  33  343   33.0  343        D   33        E    33        C   33   
2  55   0.2  66    4    2.2   66        C   55        A     4        D  2.2   
3  77  44.0  44   64   42.0   77        A   64        D    44        C   44   
4  33  23.5   5   24    2.0   33        A   24        D  23.5        B    5   
5  66  66.0  62   63   99.0   99        E   66        B    66        A   63   

  Col_Max4 Max5 Col_Max5  
0        D    1        C  
1        B   24        A  
2        E  0.2        B  
3        B   42        E  
4        C    2        E  
5        D   62        C

在pandas数据框中对行进行排序并获取列ID

问题描述

1 个解决方案

解决方案1
4 已采纳 2017-08-25 21:11:31

在pandas数据框中对行进行排序并获取列ID

问题描述

1 个解决方案

解决方案1 4 已采纳 2017-08-25 21:11:31

解决方案1
4 已采纳 2017-08-25 21:11:31