How to use pandas to create a dictionary from column entries in a DataFrame or np.array
So I have a DataFrame with columns labeled a through i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is column "e". I know how to do this by iterating through each row, but I feel there is a more efficient way using DataFrame.to_dict(), and I can't figure out how. Maybe DataFrame.groupby could help, but that seems to be used for grouping column or index IDs.
How can I use pandas (or numpy) to create a dictionary of dictionaries efficiently without iterating through each row? Below is an example of my current method and the desired output.
#!/usr/bin/python
import numpy as np
import pandas as pd
tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])
# Pass a flat list of names; wrapping it in another list would create MultiIndex columns
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))
#print(DF)
     a         b         c         d  e  g  h  i
0  AAA  86880690  86914111     22RV1  2  2  H  -
1  ABA  86880690  86914111      A549  2  2  L  -
2  AAC  86880690  86914111  BFTC-905  3  3  H  -
3  AAB  86880690  86914111     BT-20  2  2  H  -
4  AAA  86880690  86914111       C32  2  2  H  -
from collections import defaultdict
D_a_d_e = defaultdict(dict)
# Python 3: the built-in zip replaces itertools.izip
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e
#print(D_a_d_e)
#ignore the defaultdict part
defaultdict(<class 'dict'>, {'AAA': {'22RV1': '2', 'C32': '2'}, 'ABA': {'A549': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})
I saw this question, https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column, but it is a little different and it doesn't have an answer.
There's a to_dict method:
In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}
In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
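For reference, `to_dict` accepts other `orient` values besides the default and "index". The sketch below shows two of them, "list" and "records", on a small illustrative frame; the frame and the variable names `by_column` and `by_row` are assumptions made for the example, not part of the original post.

import pandas as pd

# Small illustrative frame using three of the question's columns
df = pd.DataFrame({
    "a": ["AAA", "ABA"],
    "d": ["22RV1", "A549"],
    "e": ["2", "2"],
})

# orient="list": one key per column, values collected into a list
by_column = df.to_dict(orient="list")

# orient="records": one dict per row
by_row = df.to_dict(orient="records")

print(by_column)
print(by_row)

None of these orientations nests one column inside another on its own, which is why a groupby step is still needed for a dictionary of dictionaries.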
With that in mind you can do the groupby:
In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA {'C32': '2', '22RV1': '2'}
AAB {'BT-20': '2'}
AAC {'BFTC-905': '3'}
ABA {'A549': '2'}
dtype: object
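The groupby result above is a Series whose values are dicts. If a plain dict of dicts is wanted, one option is to iterate over the groups (not the rows) and build each inner dict with `to_dict`. This is a minimal sketch on an illustrative frame with only the columns a, d and e; the names `df` and `nested` are assumptions for the example.

import pandas as pd

# Illustrative frame with the three columns the question cares about
df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# One pass per distinct value of "a"; each group becomes a {d: e} dict
nested = {
    key: group.set_index("d")["e"].to_dict()
    for key, group in df.groupby("a")
}

print(nested["AAA"])

The loop runs once per group, so the Python-level work scales with the number of distinct "a" values instead of the number of rows.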
That said, you may be able to use a straight-up MultiIndex instead of a dictionary of dictionaries:
In [31]: res = DF.set_index(["a", "d"])["e"]
In [32]: res
Out[32]:
a d
AAA 22RV1 2
ABA A549 2
AAC BFTC-905 3
AAB BT-20 2
AAA C32 2
Name: e, dtype: object
It'll work much the same way:
In [33]: res["AAA"]
Out[33]:
d
22RV1 2
C32 2
Name: e, dtype: object
In [34]: res["AAA"]["22RV1"]
Out[34]: '2'
But it will be more space-efficient, and you stay in pandas.
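A short sketch of the MultiIndex approach, again on an illustrative frame. Sorting the index first is an assumption added here so that tuple lookups with `.loc` run against a sorted index; the names `value`, `inner` and `outer_keys` are hypothetical.

import pandas as pd

df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# Series keyed by (a, d); sort_index keeps lookups on a sorted MultiIndex
res = df.set_index(["a", "d"])["e"].sort_index()

# Full key: behaves like nested["AAA"]["C32"]
value = res.loc[("AAA", "C32")]

# Partial key: behaves like nested["AAA"], converted to a dict on demand
inner = res.loc["AAA"].to_dict()

# Distinct outer keys, in sorted order
outer_keys = list(res.index.get_level_values("a").unique())

print(value, inner, outer_keys)

Converting only the slice you need with `to_dict` keeps the bulk of the data in pandas while still handing a plain dict to code that expects one.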
Something along these lines:
def dictmaker(df):
    """
    Build a {d: e} dict from one group. Takes the group's DataFrame.
    """
    # Pair every d with its e, not just the first row of the group
    return dict(zip(df.d.values, df.e.values))
DF[['a','d','e']].groupby('a').apply(dictmaker)
a
AAA {'22RV1': '2', 'C32': '2'}
AAB {'BT-20': '2'}
AAC {'BFTC-905': '3'}
ABA {'A549': '2'}
dtype: object