Get the top N values and their respective column names from each row of a Pandas dataframe

Question

Index   Class 1        Class 2        Class 3        Class 4        Class 5
0       0.95693475     0.252198994    0.0            0.335894585    0.611441553
1       0.473615974    0.0            0.510585248    0.5007305      0.975620011
2       0.224682823    0.122315248    0.6407305      0.0            0.872211390

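This is a sample of the dataframe I am working on. The original has 200 classes, and for each row of my dataframe I want to find the top 3 classes, with their values sorted in descending order: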

    Expected output:
    Row 0: [{Class 1 : 0.95693475}, {Class 5: 0.611441553}, {Class 4: 0.335894585}]
    Row 1: [{Class 5 : 0.975620011}, {Class 3: 0.510585248}, {Class 4: 0.5007305}]
etc etc...

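Note: the list and dict in the expected output are only there for reference. I just need the top 3 scores and their class names for each row of the dataframe. Can anyone help me with this?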

Answer 1

References:

Apply a function to rows in pandas - Example 3

Return the first n key-value pairs from a dictionary

from itertools import islice
def take(n, iterable):
    return list(islice(iterable, n))
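
As a quick illustration of what take does on its own, here is a minimal sketch. The scores dictionary is made up for the example, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
from itertools import islice

def take(n, iterable):
    """Return the first n items of the iterable as a list."""
    return list(islice(iterable, n))

# Hypothetical, already sorted scores used only for illustration
scores = {"Class 1": 0.9, "Class 5": 0.6, "Class 4": 0.3}

first_two = take(2, scores.items())
print(first_two)  # [('Class 1', 0.9), ('Class 5', 0.6)]
```

If the iterable has fewer than n items, take simply returns all of them.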

Drop the Index column

df.drop('Index', axis=1,inplace=True)
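
To make this step reproducible, here is a sketch that rebuilds the sample data from the question and then drops the column. It assumes Index is a regular column; if it is actually the dataframe's index in your data, the drop is not needed.

```python
import pandas as pd

# Sample data copied from the question; 'Index' is assumed to be a normal column
df = pd.DataFrame({
    "Index": [0, 1, 2],
    "Class 1": [0.95693475, 0.473615974, 0.224682823],
    "Class 2": [0.252198994, 0.0, 0.122315248],
    "Class 3": [0.0, 0.510585248, 0.6407305],
    "Class 4": [0.335894585, 0.5007305, 0.0],
    "Class 5": [0.611441553, 0.975620011, 0.872211390],
})

# Remove the non-score column so only the class scores are ranked
df.drop('Index', axis=1, inplace=True)
print(df.columns.tolist())
```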

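The function below can be applied to every row to find the top 3 classes.

The topN function takes two input parameters: row, a single row of the dataframe, and n, the number of top elements to extract.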

def topN(row, n):
    x = row.to_dict() # convert the input row to a dictionary 
    x = {k: v for k, v in sorted(x.items(), key=lambda item: -item[1])} # sort the dictionary based on their values 
    n_items = take(n, x.items()) # extract the first n values from the dictionary 
    return n_items
n = 3 #number of elements needed
df['X'] = df.apply(lambda row : topN(row,n), axis = 1)

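Output:

A new column X is stored, containing the required result for each row as a list of (class name, value) tuples sorted in descending order. You can also convert the column to an array.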

Class 1 Class 2 Class 3 Class 4 Class 5 X
0   0.956935    0.252199    0.000000    0.335895    0.611442    [(Class 1, 0.95693475), (Class 5 , 0.61144155...
1   0.473616    0.000000    0.510585    0.500731    0.975620    [(Class 5 , 0.975620011), (Class 3, 0.5105852...
2   0.224683    0.122315    0.640730    0.000000    0.872211    [(Class 5 , 0.87221139), (Class 3, 0.6407305)...
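
For comparison, a similar result can probably be obtained with pandas' built-in Series.nlargest instead of sorting a dictionary by hand. This is only a sketch under the same sample data; top_n_pairs is a name chosen here for illustration, not part of the answer above.

```python
import pandas as pd

# Sample scores from the question, without the Index column
df = pd.DataFrame({
    "Class 1": [0.95693475, 0.473615974, 0.224682823],
    "Class 2": [0.252198994, 0.0, 0.122315248],
    "Class 3": [0.0, 0.510585248, 0.6407305],
    "Class 4": [0.335894585, 0.5007305, 0.0],
    "Class 5": [0.611441553, 0.975620011, 0.872211390],
})

def top_n_pairs(row, n):
    """Return the n largest values of a row as (column name, value) tuples."""
    largest = row.nlargest(n)  # already sorted in descending order
    return list(zip(largest.index, largest.tolist()))

# Extra keyword arguments to apply are passed through to the function
top3 = df.apply(top_n_pairs, axis=1, n=3)
print(top3.iloc[0])
```

The result is kept in a separate Series here so the score columns stay untouched; assigning it back as df['X'] works the same way as in the answer.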

Example of removing all entries whose value is 0.0:

d = {1:0.0, 2:0.0, 3:1.0}
x={k:v for k,v in d.items() if v}
x # prints {3: 1.0}
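
If you want to combine this filter with the top-n selection, one possible sketch is shown below. The helper name top_n_nonzero and the sample row are assumptions made for illustration. A row with fewer than n non-zero scores then yields a shorter list.

```python
def top_n_nonzero(scores, n):
    """Return up to n (key, value) pairs with the largest non-zero values."""
    nonzero = {k: v for k, v in scores.items() if v}  # drops entries equal to 0.0
    ranked = sorted(nonzero.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical row with only two non-zero scores
row = {"Class 1": 0.0, "Class 2": 0.0, "Class 3": 0.6407305,
       "Class 4": 0.0, "Class 5": 0.87221139}

result = top_n_nonzero(row, 3)
print(result)  # [('Class 5', 0.87221139), ('Class 3', 0.6407305)]
```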

Solution 1
1 Accepted 2020-08-03 16:16:02
