How to create a dictionary from column entries in a DataFrame or np.array using pandas

Question

So I have a DataFrame whose columns I labeled a through i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is column "e". I know how to do this by iterating through each row, but I feel there is a more efficient way using DataFrame.to_dict(), and I can't figure out how. Maybe DataFrame.groupby could help, but that seems to be used for grouping by column or index IDs.

How can I use pandas (or numpy) to create a dictionary of dictionaries efficiently without iterating through each row? Below is an example of my current method and the desired output.

#!/usr/bin/python
import numpy as np
import pandas as pd

tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])

# np.array coerces the mixed rows to strings, so every value in DF is a str
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))

print(DF)
#      a         b         c         d  e  g  h  i
# 0  AAA  86880690  86914111     22RV1  2  2  H  -
# 1  ABA  86880690  86914111      A549  2  2  L  -
# 2  AAC  86880690  86914111  BFTC-905  3  3  H  -
# 3  AAB  86880690  86914111     BT-20  2  2  H  -
# 4  AAA  86880690  86914111       C32  2  2  H  -

from collections import defaultdict

# Current approach: walk the three columns row by row
D_a_d_e = defaultdict(dict)
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e

print(D_a_d_e)
# Desired output (ignore the defaultdict wrapper):
# defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})

I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it is a little different and has no answer.

Answer 1

DataFrame has a to_dict method:

In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
 'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
 'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
 'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
 'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
 'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}

In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
 2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
 3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}

With that in mind, you can group by "a" and call to_dict on each group:

In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA    {'C32': '2', '22RV1': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
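The groupby above returns a Series whose values are dicts. If a plain dict of dicts is wanted instead, a minimal sketch is to iterate over the groups directly. To keep it self-contained, it rebuilds only the a, d and e columns of the question's sample (as strings, matching what np.array produced). Note this is not vectorized: it replaces the explicit row loop with one dict(zip(...)) call per group.

```python
import pandas as pd

# Only the three relevant columns from the question's sample data;
# the values are strings, as in the original DataFrame.
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs; for each group,
# pair the inner-key column "d" with the value column "e".
D_a_d_e = {key: dict(zip(group["d"], group["e"])) for key, group in DF.groupby("a")}

print(D_a_d_e)
```

Because groupby sorts its keys by default, the outer dict comes out ordered AAA, AAB, AAC, ABA.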

That said, you may be able to use a plain MultiIndex instead of a dictionary of dictionaries:

In [31]: res = DF.set_index(["a", "d"])["e"]

In [32]: res
Out[32]:
a    d
AAA  22RV1       2
ABA  A549        2
AAC  BFTC-905    3
AAB  BT-20       2
AAA  C32         2
Name: e, dtype: object

It works much the same way:

In [33]: res["AAA"]
Out[33]:
d
22RV1    2
C32      2
Name: e, dtype: object

In [34]: res["AAA"]["22RV1"]
Out[34]: '2'

But it is more space-efficient, and you stay within pandas.
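If a nested dict is needed later, the MultiIndexed Series can be converted back with DataFrame.to_dict(), the method the question hoped to use. The sketch below assumes each (a, d) pair is unique, because unstack raises a ValueError on duplicate index entries. It unstacks the "a" level into columns and calls to_dict() with its default orient. That orient gives {column: {index: value}}, so the NaN cells unstack inserts for missing (a, d) combinations still have to be dropped.

```python
import pandas as pd

# The a, d and e columns from the question's sample (string values, as there)
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

res = DF.set_index(["a", "d"])["e"]

# Move the "a" level into columns: rows are d, columns are a, and
# (a, d) pairs that never occur are filled with a missing value.
wide = res.unstack(level="a")

# Default orient gives {column -> {index -> value}}, i.e. {a -> {d -> e}};
# filter out the missing cells introduced by unstack.
nested = {
    a: {d: e for d, e in inner.items() if pd.notna(e)}
    for a, inner in wide.to_dict().items()
}
print(nested)
```

The NaN filter is a Python-level pass over the full d-by-a grid. For a very sparse grid, the per-group approach may therefore be cheaper.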

Answer 2

Something along these lines:

def dictmaker(df):
    """
    Wrapper for storing key, values in a dict. Takes the sub-DataFrame of one group.
    """
    # Map every "d" in the group to its "e" (not only the group's first row)
    dct = dict(zip(df.d.values, df.e.values))  ## storage
    return dct

DF[['a','d','e']].groupby('a').apply(dictmaker)

a
AAA    {'22RV1': '2', 'C32': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
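Both groupby/apply answers return a pandas Series whose values are dicts, not a dict of dicts. A minimal sketch of getting the latter is to call Series.to_dict() on that result. The helper group_to_dict below is hypothetical (a variant of dictmaker that keeps every row of each group). The sketch also selects ["d", "e"] after groupby so the grouping column is not handed to the function, since recent pandas versions warn when apply receives it.

```python
import pandas as pd

# The a, d and e columns from the question's sample (string values, as there)
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

def group_to_dict(group):
    """Hypothetical helper: map every d in one group to its e."""
    return dict(zip(group["d"], group["e"]))

# Each group's frame contains only d and e; apply returns a Series of dicts indexed by a.
series_of_dicts = DF.groupby("a")[["d", "e"]].apply(group_to_dict)

# Series.to_dict() maps each index label (a) to its value (the inner dict).
D_a_d_e = series_of_dicts.to_dict()
print(D_a_d_e)
```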


Question description

2 answers

Answer 1
4 votes, accepted, 2015-11-12 23:27:52

Answer 2
0 votes, 2015-11-12 23:14:36
