pandas - create key-value pairs from a grouped-by data frame
I have a data frame with three columns, and I would like to create a dictionary after applying groupby on the first and second columns. I can do this with for loops, but is there a pandas way of doing it?
DataFrame:
Col X  Col Y  Sum
A      a      3
A      b      2
A      c      1
B      p      5
B      q      6
B      r      7
After grouping by Col X and Col Y with df.groupby(['Col X','Col Y']).sum():
             Sum
Col X Col Y
A     a        3
      b        2
      c        1
B     p        5
      q        6
      r        7
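To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the sample DataFrame from the table above and repeats the question's groupby. The column names come from the question; the variable names `df` and `grouped` are only illustrative.

```python
import pandas as pd

# Rebuild the question's three-column sample data.
df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B"],
    "Col Y": ["a", "b", "c", "p", "q", "r"],
    "Sum": [3, 2, 1, 5, 6, 7],
})

# Group on the first two columns; the result has a two-level MultiIndex.
grouped = df.groupby(["Col X", "Col Y"]).sum()
print(grouped)
```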
The dictionary I want to create:
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Use a dictionary comprehension while iterating over a groupby object:
{name: dict(zip(g['Col Y'], g['Sum'])) for name, g in df.groupby('Col X')}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
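One caveat worth checking: `dict(zip(...))` works on the raw rows, so if a (`Col X`, `Col Y`) pair appeared more than once, later rows would silently overwrite earlier ones instead of being summed. A sketch assuming such a duplicate exists (the extra `('A', 'a', 4)` row is made up for illustration), which aggregates first and then applies the same comprehension:

```python
import pandas as pd

# Sample data plus one hypothetical duplicate ("A", "a") row.
df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B", "A"],
    "Col Y": ["a", "b", "c", "p", "q", "r", "a"],
    "Sum": [3, 2, 1, 5, 6, 7, 4],
})

# Raw comprehension: the second ("A", "a") row overwrites the first.
raw = {name: dict(zip(g["Col Y"], g["Sum"])) for name, g in df.groupby("Col X")}

# Sum duplicate pairs first, then build the nested dict the same way.
totals = df.groupby(["Col X", "Col Y"], as_index=False)["Sum"].sum()
summed = {name: dict(zip(g["Col Y"], g["Sum"])) for name, g in totals.groupby("Col X")}

print(raw["A"]["a"], summed["A"]["a"])  # 4 7
```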
If you insist on using to_dict somewhere, you could do something like this:
s = df.set_index(['Col X', 'Col Y']).Sum
{k: s.xs(k).to_dict() for k in s.index.levels[0]}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
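A subtlety with `s.index.levels[0]`: a MultiIndex keeps its defined level values even after rows are filtered out, so iterating over `levels[0]` can reach a key with no rows left, and `s.xs(k)` then raises a `KeyError`. A sketch that iterates over the first-level values actually present instead (the `s > 4` filter is only an illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B"],
    "Col Y": ["a", "b", "c", "p", "q", "r"],
    "Sum": [3, 2, 1, 5, 6, 7],
})
s = df.set_index(["Col X", "Col Y"])["Sum"]

# Keep only the larger sums; "A" disappears from the rows but not from the levels.
filtered = s[s > 4]
print(list(filtered.index.levels[0]))  # ['A', 'B']

# Iterate over the first-level keys that still have rows.
keys = filtered.index.get_level_values(0).unique()
result = {k: filtered.xs(k).to_dict() for k in keys}
print(result)  # {'B': {'p': 5, 'q': 6, 'r': 7}}
```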
Keep in mind that the to_dict method is just using a comprehension under the hood. If you have a special use case that requires something beyond what the orient options provide, there is no shame in constructing your own comprehension.
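As an illustration of that point, suppose (hypothetically) you want to keep only inner entries whose `Sum` meets a threshold, with each inner dict ordered from largest to smallest value. None of the `orient` options does that, but a comprehension over the groups does. The threshold of `2` is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B"],
    "Col Y": ["a", "b", "c", "p", "q", "r"],
    "Sum": [3, 2, 1, 5, 6, 7],
})
threshold = 2  # hypothetical cut-off

custom = {
    name: dict(
        # Keep pairs at or above the threshold, largest Sum first.
        sorted(
            ((y, int(v)) for y, v in zip(g["Col Y"], g["Sum"]) if v >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
    )
    for name, g in df.groupby("Col X")
}
print(custom)  # {'A': {'a': 3, 'b': 2}, 'B': {'r': 7, 'q': 6, 'p': 5}}
```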
You can iterate over the MultiIndex Series:
>>> s = df.set_index(['Col X', 'Col Y'])['Sum']
>>> {k: v.reset_index(level=0, drop=True).to_dict() for k, v in s.groupby(level=0)}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
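Since pandas 0.24, `Series.droplevel` is a slightly shorter way to strip the outer level than `reset_index(level=0, drop=True)`. A self-contained sketch of the same approach, using the question's column names:

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B"],
    "Col Y": ["a", "b", "c", "p", "q", "r"],
    "Sum": [3, 2, 1, 5, 6, 7],
})
s = df.set_index(["Col X", "Col Y"])["Sum"]

# Each group is a Series indexed by (Col X, Col Y); dropping the outer
# level leaves an inner dict keyed by Col Y only.
nested = {k: v.droplevel(0).to_dict() for k, v in s.groupby(level=0)}
print(nested)
```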
# A to_dict() solution
d = df.groupby(['Col X','Col Y']).sum().reset_index().pivot(columns='Col X',values='Sum').to_dict()
Out[70]:
{'A': {0: 3.0, 1: 2.0, 2: 1.0, 3: nan, 4: nan, 5: nan},
'B': {0: nan, 1: nan, 2: nan, 3: 5.0, 4: 6.0, 5: 7.0}}
# If you need to get rid of the NaNs:
{k1:{k2:v2 for k2,v2 in v1.items() if pd.notnull(v2)} for k1,v1 in d.items()}
Out[73]: {'A': {0: 3.0, 1: 2.0, 2: 1.0}, 'B': {3: 5.0, 4: 6.0, 5: 7.0}}
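The integer inner keys above come from calling `pivot` without an `index`, so the row labels 0-5 of the reset frame become the keys. A sketch assuming you want `Col Y` as the inner keys instead: pass `index='Col Y'`, then drop each column's NaNs. The `astype(int)` step is an extra assumption that undoes the float upcast caused by the NaNs.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["A", "A", "A", "B", "B", "B"],
    "Col Y": ["a", "b", "c", "p", "q", "r"],
    "Sum": [3, 2, 1, 5, 6, 7],
})

# Aggregate first (as in the answer above), keeping the keys as columns.
summed = df.groupby(["Col X", "Col Y"], as_index=False)["Sum"].sum()

# Wide table: one row per Col Y value, one column per Col X value.
wide = summed.pivot(index="Col Y", columns="Col X", values="Sum")

# Drop the NaN cells column by column and restore integer values.
nested = {col: series.dropna().astype(int).to_dict() for col, series in wide.items()}
print(nested)
```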