
Pandas dataframe to dict of dict

Given the following pandas DataFrame:

  ColA ColB  ColC
0   a1    t     1
1   a2    t     2
2   a3    d     3
3   a4    d     4
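To make the snippets on this page reproducible, the sample frame can be built directly. This is a minimal sketch; the name df matches the one the answers use, while the question's own snippet calls the frame duplicated.

```python
import pandas as pd

# Sample data from the question: ColB is the outer key, ColA the inner key, ColC the value
df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})
print(df)
```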

I want to get a dictionary of dictionaries.

But so far I have only managed to create the following:

d = {'t': [1, 2], 'd': [3, 4]}

using:

d = {k: list(v) for k,v in duplicated.groupby("ColB")["ColC"]}

How can I obtain a dict of dicts like this:

dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}

You can do this by first setting ColA as the index and then using a groupby + apply step:

dd = df.set_index('ColA').groupby('ColB').apply(
    lambda x: x.ColC.to_dict()
).to_dict()

Or, with a dict comprehension:

dd = {k : g.ColC.to_dict() for k, g in df.set_index('ColA').groupby('ColB')}

print(dd)
{'d': {'a3': 3, 'a4': 4}, 't': {'a1': 1, 'a2': 2}}
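A self-contained check of the dict-comprehension approach, rebuilding the sample frame. The zip variant is an alternative added here for comparison, not part of the original answer; it skips set_index by pairing ColA and ColC within each group.

```python
import pandas as pd

df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})

# Approach from the answer: with ColA as the index, each group's ColC.to_dict()
# maps ColA -> ColC
dd = {k: g.ColC.to_dict() for k, g in df.set_index("ColA").groupby("ColB")}

# Alternative (not from the original answer): zip inner keys and values per group
dd_zip = {k: dict(zip(g["ColA"], g["ColC"])) for k, g in df.groupby("ColB")}

print(dd)      # groupby sorts group keys by default, so 'd' comes before 't'
print(dd_zip)
```

This explains the key order in the printed output above: groupby(sort=True) is the default, so the outer keys come out alphabetically rather than in order of first appearance.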

The point of this answer is to show that there is a straightforward way to do this with simple iteration and tools from the standard library.

Oftentimes we perform many transformations on a Pandas DataFrame, where each transformation constructs a new Pandas object. At times this is an intuitive progression and makes perfect sense. However, there are times when we forget that we can use simpler tools. I believe this is one of those times. My answer still uses Pandas, but only for the itertuples method.

from collections import defaultdict

d = defaultdict(dict)

for a, b, c in df.itertuples(index=False):
    d[b][a] = c

d = dict(d)

d

{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
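If you would rather avoid defaultdict, dict.setdefault produces the same result with a plain dict; this is a swapped-in standard-library technique, not the answer's. The loop above unpacks columns by position, so it depends on the frame's column order; selecting the columns explicitly, as in this sketch, removes that dependency.

```python
import pandas as pd

df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})

d = {}
# Explicit column selection fixes the unpacking order: outer key, inner key, value
for outer, inner, value in df[["ColB", "ColA", "ColC"]].itertuples(index=False):
    # setdefault inserts an empty inner dict the first time an outer key is seen
    d.setdefault(outer, {})[inner] = value

print(d)
```

Unlike the groupby versions, this loop (like the defaultdict one) keeps the outer keys in order of first appearance, so 't' comes before 'd'.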

A slight alternative: since the tuples we iterate over are named tuples, we can access each element by the name of the column it represents.

from collections import defaultdict

d = defaultdict(dict)

for t in df.itertuples():
    d[t.ColB][t.ColA] = t.ColC

d = dict(d)

d

{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
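One caveat about attribute access, documented by pandas but not raised in the answer: itertuples renames columns that are not valid Python identifiers, are repeated, or start with an underscore to positional names such as _0. This sketch uses a hypothetical column named 'Col A' to show the effect; indexing the tuple by position sidesteps it.

```python
import pandas as pd

# Hypothetical variant of the sample data with a space in one column name
df = pd.DataFrame({
    "Col A": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})

first = next(iter(df.itertuples(index=False)))
print(first._fields)  # 'Col A' is not a valid identifier, so it becomes '_0'

d = {}
for t in df.itertuples(index=False):
    # t[0] is the renamed 'Col A' field; the other columns keep their names
    d.setdefault(t.ColB, {})[t[0]] = t.ColC

print(d)
```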
