Get Pandas Dataframe Column names from Numpy Array

I have a DataFrame imported from Excel:

>>> df

    Name Emp ID  Total Salary     A      B     C     D      E
0   Mike   A001         25000  5000  15000  3000     0   2000
1   John   A002         23000  5000  10000  3000  3000   2000
2    Bob   A003         21000  5000  15000     0  1000      0
3   Rose   A004         20000  5000  10000  2000  1000  20000
4  James   A005         10000  5000      0  3000     0   2000

Now I have found a subset of the component values that sums to Total Salary, using the following code:

Code:

import pandas as pd
import numpy as np

df = pd.read_excel('tmp/test.xlsx')
# Pass axis by keyword; the positional form is rejected by pandas 2.0 and later.
val = df.drop(['Name', 'Emp ID', 'Total Salary'], axis=1)
test = np.array(val)

num = df['Total Salary'][0]
array = test[0]

def subsetsum(array, num):
    """Return a list of values from array that sum to num, or None."""
    if num < 1 or len(array) == 0:
        return None
    if np.isclose(array[0], num):
        return [array[0]]
    # Try including array[0]; if that fails, skip it.
    with_v = subsetsum(array[1:], num - array[0])
    if with_v:
        return [array[0]] + with_v
    return subsetsum(array[1:], num)

print('\nValues : ',array)
print('\nTotal Salary : ',num)
print('\nValues of Salary : ',subsetsum(array,num))

Output:

Values :  [ 5000 15000  3000     0  2000]

Total Salary :  25000

Values of Salary :  [5000, 15000, 3000, 0, 2000]

Now I need a way to link the salary values in the array to the column names in the DataFrame.

The output I would like is:

Output Required:

Values :  [ 5000 15000  3000     0  2000]

Total Salary :  25000

Values of Salary :  A - 5000 B - 15000 C - 3000 E - 2000

I would suggest rewriting your subsetsum function to return the indices of the chosen elements rather than the elements themselves (or it could return both, if that works better for you). For example,

subsetsum([5000, 15000, 3000, 0, 2000], 25000)

would return [0, 1, 2, 3, 4], or possibly [0, 1, 2, 4]. You can then use these indices to access the corresponding column labels as well as the elements.
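A minimal sketch of that suggestion, assuming the same recursion as in the question. The name subsetsum_indices and the start parameter are hypothetical, and plain lists stand in for list(val.columns) and the first row of val so the example runs on its own:

```python
import math


def subsetsum_indices(values, target, start=0):
    """Return indices of a subset of values[start:] that sums to target, or None.

    Follows the recursion from the question, but records positions
    instead of the values themselves.
    """
    if target < 1 or start >= len(values):
        return None
    if math.isclose(values[start], target):
        return [start]
    # Try including values[start] first; if that fails, skip it.
    with_v = subsetsum_indices(values, target - values[start], start + 1)
    if with_v:
        return [start] + with_v
    return subsetsum_indices(values, target, start + 1)


# Stand-ins (assumptions) for list(val.columns) and the first row of val.
columns = ["A", "B", "C", "D", "E"]
row = [5000, 15000, 3000, 0, 2000]

idx = subsetsum_indices(row, 25000)
pairs = [(columns[i], row[i]) for i in idx]
print(idx)
print(pairs)
```

Because the result is a list of positions, it stays aligned with the column labels even when some columns are skipped.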

Using the information you provided, I checked this on my own machine. The easiest way to convert a DataFrame to a NumPy array:

test = val.values
array = test[0]

The column names are always available:

col = val.columns.values

Finally, match the names with the values:

link = list(zip(col, subsetsum(array,num)))
print(link)

# Output
[('A', 5000), ('B', 15000), ('C', 3000), ('D', 0), ('E', 2000)]

zip() pairs the two sequences element by element (stopping at the shorter one) and returns a zip object, so convert it to a list() before printing. Note that the pairing is purely positional: it lines up with the column names only when subsetsum returns one value per column, as it does for this row. I hope this helps!
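To get from those pairs to the line shown under "Output Required", something like the following could work. It is only a sketch: plain lists stand in for val.columns.values and the result of subsetsum(array, num), and dropping zero-valued components is an assumption based on D being absent from the required output.

```python
# Stand-ins (assumptions) for val.columns.values and subsetsum(array, num).
col = ["A", "B", "C", "D", "E"]
chosen = [5000, 15000, 3000, 0, 2000]

# zip() yields (name, value) tuples lazily, so it can be iterated directly.
# Assumption: zero-valued components are left out, as in the required output.
parts = [f"{name} - {value}" for name, value in zip(col, chosen) if value != 0]
line = " ".join(parts)
print("\nValues of Salary : ", line)
```

This relies on chosen having exactly one value per column; if subsetsum skips a column, the names and values no longer line up.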
