How to use pandas to create a dictionary from column entries in a DataFrame or np.array

So I have a DataFrame, and I labeled the columns a - i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is column "e". I know how to do this by iterating through each row, but I feel like there is a more efficient way using DataFrame.to_dict(), and I can't figure out how. Maybe DataFrame.groupby could help, but that seems to be used for grouping column or index IDs.

How can I use pandas (or numpy) to create a dictionary of dictionaries efficiently without iterating through each row? My current method and the desired output are shown below.

#!/usr/bin/python
import numpy as np
import pandas as pd

tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])

# Pass a flat list of names; wrapping the list in another list can produce MultiIndex columns
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))

#print(DF)
#      a         b         c         d  e  g  h  i
# 0  AAA  86880690  86914111     22RV1  2  2  H  -
# 1  ABA  86880690  86914111      A549  2  2  L  -
# 2  AAC  86880690  86914111  BFTC-905  3  3  H  -
# 3  AAB  86880690  86914111     BT-20  2  2  H  -
# 4  AAA  86880690  86914111       C32  2  2  H  -

from collections import defaultdict

D_a_d_e = defaultdict(dict)
# The built-in zip replaces itertools.izip, which exists only on Python 2
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e

#print(D_a_d_e)
#ignore the defaultdict part

defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})

I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it was a little different, and it also doesn't have an answer.

There's a to_dict method:

In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
 'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
 'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
 'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
 'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
 'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}

In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
 2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
 3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
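
The piece of to_dict that matters for the question is that a Series turns into a dictionary keyed by its index. A minimal sketch of that building block, assuming a cut-down stand-in for the question's frame (the name df and the three-row data are illustrative only):

```python
import pandas as pd

# Cut-down stand-in for the question's data (only the columns needed here)
df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAA"],
    "d": ["22RV1", "A549", "C32"],
    "e": ["2", "2", "2"],
})

# Series.to_dict() uses the index labels as keys, so indexing by "d"
# first gives a flat {d: e} mapping
d_to_e = df.set_index("d")["e"].to_dict()
print(d_to_e)
```

Doing the same thing once per value of "a" gives the inner dictionaries the question asks for.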

With that in mind you can do the groupby:

In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA    {'C32': '2', '22RV1': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
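
The result above is still a Series whose values are dictionaries. If a plain dictionary of dictionaries is needed, one more to_dict call on that Series should be enough. A sketch under the assumption that the frame has the question's "a", "d" and "e" columns (the frame is rebuilt here with only those columns so the snippet runs on its own):

```python
import pandas as pd

# Only the three columns involved, with the question's values
df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# One {d: e} dict per value of "a", as in the groupby call above
per_group = (
    df.set_index("d")
      .groupby("a")[["e"]]
      .apply(lambda x: x["e"].to_dict())
)

# Series -> dict turns the group labels into the outer keys
nested = per_group.to_dict()
print(nested["AAA"])
```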

That said, you may be able to use a straight-up MultiIndex instead of a dictionary of dictionaries:

In [31]: res = DF.set_index(["a", "d"])["e"]

In [32]: res
Out[32]:
a    d
AAA  22RV1       2
ABA  A549        2
AAC  BFTC-905    3
AAB  BT-20       2
AAA  C32         2
Name: e, dtype: object

It'll work much the same way:

In [33]: res["AAA"]
Out[33]:
d
22RV1    2
C32      2
Name: e, dtype: object

In [34]: res["AAA"]["22RV1"]
Out[34]: '2'

But it will be more space-efficient, and you're still in pandas.
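
As a sketch of how the MultiIndex version might be used in practice: a tuple passed to .loc does the two-level lookup in one step, and the nested dictionary can still be rebuilt later if some other code requires it. The frame below is an assumed stand-in with only the three relevant columns, and the rebuild loop is an illustration rather than part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

res = df.set_index(["a", "d"])["e"]

# A tuple key looks up both index levels at once
value = res.loc[("AAA", "22RV1")]

# Rebuild the dictionary of dictionaries from the (a, d) index pairs
nested = {}
for (a, d), e in res.items():
    nested.setdefault(a, {})[d] = e

print(value)
print(nested["AAA"])
```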

Something along these lines:

def dictmaker(df):
    """
    Build a {d: e} dict from all rows of one group. Takes df.
    """
    # Pair every "d" value with its "e" value across the whole group,
    # not just the first row
    return dict(zip(df.d.values, df.e.values))

DF[['a','d','e']].groupby('a').apply(dictmaker)

a
AAA    {'22RV1': '2', 'C32': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
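
The same per-group idea can also be written as a dict comprehension over the groups, which yields a plain dictionary directly instead of a Series. This is a sketch, not the answerer's code, and it assumes a frame with the question's "a", "d" and "e" columns (rebuilt here so the snippet is self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# Iterating a groupby yields (group label, sub-frame) pairs;
# each sub-frame is zipped into one inner {d: e} dict
nested = {a: dict(zip(g["d"], g["e"])) for a, g in df.groupby("a")}
print(nested)
```

This still loops in Python, but once per group rather than once per row.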

