
Sort rows and get column IDs in a pandas dataframe

Given a pandas dataframe, I'd like to create new columns holding the highest, second highest, third highest, etc. values in each row, and then another column for each of those giving the name of the column the value came from. The code below does this for the max value of the row, but not for the values that follow.

Adapted from "Find the column name which has the maximum value for each row".

import pandas as pd

df = pd.DataFrame({'A': (23, 24, 55, 77, 33, 66),
                   'B': (12, 33, 0.2, 44, 23.5, 66),
                   'C': (1, 33, 66, 44, 5, 62),
                   'D': (9, 343, 4, 64, 24, 63),
                   'E': (123, 33, 2.2, 42, 2, 99)})

# Determine the max value and column name and add as columns to df
df['Max1'] = df.max(axis=1)
df['Col_Max1'] = df.idxmax(axis=1)

# Determine the 2nd and 3rd max PR and threshold levels and add as columns
# ???????????

print(df)

This produces:

    A     B   C    D      E   Max1 Col_Max1
0  23  12.0   1    9  123.0  123.0        E
1  24  33.0  33  343   33.0  343.0        D
2  55   0.2  66    4    2.2   66.0        C
3  77  44.0  44   64   42.0   77.0        A
4  33  23.5   5   24    2.0   33.0        A
5  66  66.0  62   63   99.0   99.0        E


The only caveat is that the dataframe may have a very large number of columns, in case that matters for performance. Thanks guys.

One approach that works on the underlying array data, with a focus on performance, would be:

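import numpy as np

a = df.values                # 2D array holding the row values
c = df.columns.to_numpy()    # labels as a NumPy array; recent pandas rejects 2D indexing on an Index
idx = a.argsort(1)[:,::-1]   # per-row column positions, largest value first
vals = a[np.arange(idx.shape[0])[:,None], idx]   # values in descending order
IDs = c[idx]                 # column label for each of those values

names_vals = ['Max'+str(i+1) for i in range(a.shape[1])]
names_IDs = ['Col_Max'+str(i+1) for i in range(a.shape[1])]

# Reuse df.index so that concat aligns rows even when the index is not 0..n-1
df_vals = pd.DataFrame(vals, columns=names_vals, index=df.index)
df_IDs = pd.DataFrame(IDs, columns=names_IDs, index=df.index)
df_out = pd.concat([df, df_vals, df_IDs], axis=1)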

Sample input and output:

In [40]: df
Out[40]: 
    A     B   C    D      E
0  23  12.0   1    9  123.0
1  24  33.0  33  343   33.0
2  55   0.2  66    4    2.2
3  77  44.0  44   64   42.0
4  33  23.5   5   24    2.0
5  66  66.0  62   63   99.0

In [41]: df_out
Out[41]: 
    A     B   C    D      E   Max1  Max2  Max3  Max4  Max5 Col_Max1 Col_Max2  \
0  23  12.0   1    9  123.0  123.0  23.0  12.0   9.0   1.0        E        A   
1  24  33.0  33  343   33.0  343.0  33.0  33.0  33.0  24.0        D        E   
2  55   0.2  66    4    2.2   66.0  55.0   4.0   2.2   0.2        C        A   
3  77  44.0  44   64   42.0   77.0  64.0  44.0  44.0  42.0        A        D   
4  33  23.5   5   24    2.0   33.0  24.0  23.5   5.0   2.0        A        D   
5  66  66.0  62   63   99.0   99.0  66.0  66.0  63.0  62.0        E        B   

  Col_Max3 Col_Max4 Col_Max5  
0        B        D        C  
1        C        B        A  
2        D        E        B  
3        C        B        E  
4        B        C        E  
5        A        D        C  
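
Since the question mentions a possibly very large number of columns, here is a minimal sketch for the case where only the top few values per row are wanted. It swaps the full argsort for np.argpartition, so each row is only partially sorted. The helper name top_k_per_row and the choice k=3 are assumptions made for illustration and are not part of the answer above. The order of column labels among tied values is not guaranteed.

```python
import numpy as np
import pandas as pd

def top_k_per_row(df, k):
    """Hypothetical helper: k largest values per row plus their column labels."""
    a = df.to_numpy()
    cols = df.columns.to_numpy()
    rows = np.arange(a.shape[0])[:, None]
    # argpartition moves the k largest entries of each row into the last k slots,
    # in no particular order, without sorting the rest of the row
    part = np.argpartition(a, -k, axis=1)[:, -k:]
    # sort only those k candidates, largest first
    order = np.argsort(a[rows, part], axis=1)[:, ::-1]
    idx = part[rows, order]
    vals = pd.DataFrame(a[rows, idx], index=df.index,
                        columns=[f"Max{i + 1}" for i in range(k)])
    ids = pd.DataFrame(cols[idx], index=df.index,
                       columns=[f"Col_Max{i + 1}" for i in range(k)])
    return pd.concat([vals, ids], axis=1)

df = pd.DataFrame({'A': (23, 24, 55, 77, 33, 66),
                   'B': (12, 33, 0.2, 44, 23.5, 66),
                   'C': (1, 33, 66, 44, 5, 62),
                   'D': (9, 343, 4, 64, 24, 63),
                   'E': (123, 33, 2.2, 42, 2, 99)})

top3 = top_k_per_row(df, 3)              # k=3 is an arbitrary choice
df_out = pd.concat([df, top3], axis=1)
```

Whether this is actually faster than the full sort depends on the number of columns and on k, so it is worth timing on real data before adopting it.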

If you need each value immediately followed by its column ID, modify the last few steps:

df0 = pd.DataFrame(np.dstack((vals, IDs)).reshape(a.shape[0],-1))
df0.columns = np.vstack((names_vals, names_IDs)).T.ravel()
df_out = pd.concat([df, df0], axis=1)

Sample output:

In [62]: df_out
Out[62]: 
    A     B   C    D      E Max1 Col_Max1 Max2 Col_Max2  Max3 Col_Max3 Max4  \
0  23  12.0   1    9  123.0  123        E   23        A    12        B    9   
1  24  33.0  33  343   33.0  343        D   33        E    33        C   33   
2  55   0.2  66    4    2.2   66        C   55        A     4        D  2.2   
3  77  44.0  44   64   42.0   77        A   64        D    44        C   44   
4  33  23.5   5   24    2.0   33        A   24        D  23.5        B    5   
5  66  66.0  62   63   99.0   99        E   66        B    66        A   63   

  Col_Max4 Max5 Col_Max5  
0        D    1        C  
1        B   24        A  
2        E  0.2        B  
3        B   42        E  
4        C    2        E  
5        D   62        C  
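
Note that stacking the numeric values and the string labels into one array gives the new columns object dtype, which is why the values above print as 123 rather than 123.0. If the Max columns should stay numeric, one possible variation is to build the interleaved columns one at a time. This is only a sketch; the names pieces and ranked are assumptions introduced here, and np.take_along_axis is used as an equivalent of the fancy-indexing step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': (23, 24, 55, 77, 33, 66),
                   'B': (12, 33, 0.2, 44, 23.5, 66),
                   'C': (1, 33, 66, 44, 5, 62),
                   'D': (9, 343, 4, 64, 24, 63),
                   'E': (123, 33, 2.2, 42, 2, 99)})

a = df.to_numpy()
cols = df.columns.to_numpy()
idx = np.argsort(a, axis=1)[:, ::-1]        # column positions, largest value first
vals = np.take_along_axis(a, idx, axis=1)   # values in descending order per row
ids = cols[idx]                             # matching column labels

# Add the columns pairwise so each one keeps its own dtype
# (float for MaxN, object for Col_MaxN); dict order sets the column order.
pieces = {}
for i in range(a.shape[1]):
    pieces[f"Max{i + 1}"] = vals[:, i]
    pieces[f"Col_Max{i + 1}"] = ids[:, i]

ranked = pd.DataFrame(pieces, index=df.index)
df_out = pd.concat([df, ranked], axis=1)
```

Keeping the value columns as floats means later arithmetic or comparisons on them work without converting back from object dtype.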
