How to group by a multikey dict in Python by one key?

I have a multikey dict. I am trying to group it by the first element of each key (A, B), convert the result to a transposed DataFrame, and write it to a CSV file.

>>> import pandas as pd
>>> dic = {('A', 1): 4, ('A', 1): 2, ('B', 1): 2, ('A', 2): 5, ('B', 2): 3}
>>> dic
{('A', 1): 2, ('B', 1): 2, ('A', 2): 5, ('B', 2): 3}
>>> df = pd.DataFrame(dic.items()).groupby(0).sum()
>>> df
        1
0
(A, 1)  2
(A, 2)  5
(B, 1)  2
(B, 2)  3

Here is what I have been doing so far:

>>> df = pd.DataFrame(dic.items()).groupby(0).sum()
>>> df
        1
0
(A, 1)  2
(A, 2)  5
(B, 1)  2
(B, 2)  3

>>> df_t = df.T
>>> df_t
0  (A, 1)  (A, 2)  (B, 1)  (B, 2)
1       2       5       2       3
>>> df_t.to_csv('./file.csv')

What I am looking to get is something like this:

    1     2
A   6     5
B   2     3  

First of all, a dictionary never contains duplicate keys: each key maps to exactly one value, so a repeated key does not add a second entry. In your case dic contains the key ('A', 1) twice, so only the most recent value (2) is kept. If your data has repeated keys, a possible solution is to put the values inside lists. Something like

dic = { ('A',1): 4, ('A',1):2, ('B', 1): 2, ('A', 2): 5, ('B', 2):3}

should be written as:

dic = {('A',1):[4,2], ('B', 1): [2], ('A', 2): [5], ('B', 2):[3]}
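
If the data starts out as (key, value) pairs rather than as a dict literal, a dict of lists like the one above could be built along these lines. This is only a sketch: the `pairs` list, and the names `plain` and `grouped`, are assumptions made for illustration and are not part of the question.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs; note that ('A', 1) appears twice
pairs = [(('A', 1), 4), (('A', 1), 2), (('B', 1), 2), (('A', 2), 5), (('B', 2), 3)]

# A plain dict keeps only the last value seen for a repeated key
plain = dict(pairs)

# Collecting the values into lists keeps every value for every key
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
dic = dict(grouped)

print(plain[('A', 1)])  # the earlier value 4 has been overwritten
print(dic[('A', 1)])    # both values are kept
```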

Now the solution:

import pandas as pd

#data
dic = {('A',1):[4,2], ('B', 1): [2], ('A', 2): [5], ('B', 2):[3]}

#Converting dic to dataframe object
df = pd.DataFrame(dic.items())

#Explode will convert list of values to row like structure 
exp = df[1].explode().to_frame().reset_index()

#Merging df and exp to combine results
df = df.reset_index().merge(exp, on = 'index', how = 'left')

#Converting tuple of keys into separate columns
df[['i1','i2']] = df[0].apply(pd.Series)

#Summing up the result and then pivoting them to get desired result
res = df.groupby(['i1','i2'])['1_y'].sum().reset_index().pivot(index=['i1'],columns=['i2'],values=['1_y'])

#Renaming columns and index
res.columns = ['1','2']
res.index.names = ['']
res

Output:

      1 2
    
A     6 5
B     2 3
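
As a possible shorter alternative (not the approach used above), the same table can be produced by indexing a Series with a MultiIndex built from the tuple keys, summing over both index levels, and unstacking the second level into columns. This is a sketch under the assumption that the data is available as a list of (key, value) pairs; the `pairs` list is hypothetical.

```python
import pandas as pd

# Hypothetical input: (key, value) pairs, with ('A', 1) repeated
pairs = [(('A', 1), 4), (('A', 1), 2), (('B', 1), 2), (('A', 2), 5), (('B', 2), 3)]
keys, values = zip(*pairs)

# Series indexed by a two-level MultiIndex built from the tuple keys
s = pd.Series(values, index=pd.MultiIndex.from_tuples(keys))

# Sum the repeated keys, then move the second key level into columns
res = s.groupby(level=[0, 1]).sum().unstack()
print(res)

# With no path, to_csv returns the CSV text; pass a path such as
# './file.csv' to write a file instead
csv_text = res.to_csv()
```

Because the repeated keys are summed before unstacking, no intermediate dict of lists is needed in this variant.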

