Get Pandas Dataframe Column names from Numpy Array
I have a dataframe imported from Excel:
>>> df
    Name  Emp ID  Total Salary     A      B     C     D      E
0   Mike    A001         25000  5000  15000  3000     0   2000
1   John    A002         23000  5000  10000  3000  3000   2000
2    Bob    A003         21000  5000  15000     0  1000      0
3   Rose    A004         20000  5000  10000  2000  1000  20000
4  James    A005         10000  5000      0  3000     0   2000
I have written the following code to find a subset of the values in columns A to E that adds up to Total Salary:
Code:
import pandas as pd
import numpy as np

df = pd.read_excel('tmp/test.xlsx')
# Keep only the salary component columns (A to E)
val = df.drop(['Name', 'Emp ID', 'Total Salary'], axis=1)
test = np.array(val)
num = df['Total Salary'][0]  # total salary of the first row
array = test[0]              # salary components of the first row

def subsetsum(array, num):
    # Return a list of values from array that add up to num, or None
    if num < 1:
        return None
    elif len(array) == 0:
        return None
    else:
        if np.isclose(array[0], num):
            return [array[0]]
        else:
            # First try a subset that includes array[0] ...
            with_v = subsetsum(array[1:], (num - array[0]))
            if with_v:
                return [array[0]] + with_v
            else:
                # ... otherwise look for one that skips it
                return subsetsum(array[1:], num)

print('\nValues : ', array)
print('\nTotal Salary : ', num)
print('\nValues of Salary : ', subsetsum(array, num))
Output:
Values : [ 5000 15000 3000 0 2000]
Total Salary : 25000
Values of Salary : [5000, 15000, 3000, 0, 2000]
Now I need a way to link the salary values in the array to the corresponding column names in the dataframe.
Required output:
Values : [ 5000 15000 3000 0 2000]
Total Salary : 25000
Values of Salary : A - 5000 B - 15000 C - 3000 E - 2000
I would suggest rewriting your subsetsum function to return the indices of the chosen elements, rather than the elements themselves (or perhaps it could return both, if that works out better for you). For example, subsetsum([5000, 15000, 3000, 0, 2000], 25000) would return [0, 1, 2, 3, 4], or possibly [0, 1, 2, 4]. Then you can use these indices to access the corresponding column labels as well as the elements.
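A minimal sketch of that idea, not a drop-in replacement: the names subset_sum_indices and describe are made up for illustration, plain lists stand in for the NumPy row and for val.columns, and math.isclose is used in place of np.isclose so the snippet runs without NumPy. Skipping zero-valued components when printing is an assumption based on the required output, which leaves out D.

```python
import math

def subset_sum_indices(values, target, start=0):
    """Return positions (from `start` on) whose values add up to `target`, or None."""
    if target < 1 or start >= len(values):
        return None
    current = values[start]
    if math.isclose(current, target):
        return [start]
    # First try a subset that includes the current element ...
    with_current = subset_sum_indices(values, target - current, start + 1)
    if with_current is not None:
        return [start] + with_current
    # ... otherwise look for one that skips it.
    return subset_sum_indices(values, target, start + 1)

def describe(columns, row, total):
    """Format the chosen components as 'A - 5000 B - 15000 ...'."""
    picked = subset_sum_indices(row, total)
    if picked is None:
        return "no subset found"
    # Assumption: components equal to 0 are omitted from the output.
    return " ".join(f"{columns[i]} - {row[i]}" for i in picked if row[i] != 0)

columns = ["A", "B", "C", "D", "E"]    # stands in for list(val.columns)
row = [5000, 15000, 3000, 0, 2000]     # stands in for test[0]
print("Values of Salary :", describe(columns, row, 25000))
```

Because the function returns positions rather than values, the same list can index both the row and the column labels, so a skipped element can never shift the labels.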
Using the information you provided, I checked this on my own machine. The easiest way to convert a DataFrame to a NumPy array:
test = val.values
array = test[0]
You can always access the column names:
col = val.columns.values
Finally, match the names with the values:
link = list(zip(col, subsetsum(array,num)))
print(link)
# Output
[('A', 5000), ('B', 15000), ('C', 3000), ('D', 0), ('E', 2000)]
zip() pairs the items of two iterables by position and returns a zip object (if the lengths differ, it stops at the shorter one). To print all the pairs at once, convert it with list() first. I hope this helps!
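One thing to watch with this approach: because the pairing is purely positional, the labels are only right when subsetsum returns every element of the row, in order. The sketch below uses plain lists, and the shorter subset and the positions list are made-up values for illustration, not output from the code above.

```python
columns = ["A", "B", "C", "D", "E"]
row = [5000, 15000, 3000, 0, 2000]

# Hypothetical subset that skips the zero in column D.
subset = [5000, 15000, 3000, 2000]

# zip() pairs by position and stops at the shorter input, so 2000 ends up
# labelled "D" even though it came from column "E".
naive = list(zip(columns, subset))
print(naive)

# Pairing through positions in the original row keeps the labels correct.
positions = [0, 1, 2, 4]  # assumed: where the subset values sit in `row`
by_position = [(columns[i], row[i]) for i in positions]
print(by_position)
```

For the first row in the question the two lists happen to have the same length, which is why the zip() result shown above looks correct.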