
Convert pandas df to a dictionary

I need to convert a df that is in the following format:

import pandas as pd

d = {
    'A': ['a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2'],
    'B': ['b1', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b3', 'b3', 'b3', 'b3', 'b3', 'b3', 'b4', 'b4'],
    'C': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15'],
}

df = pd.DataFrame(d)
df
    A   B   C
0   a1  b1  c1
1   a1  b1  c2
2   a1  b1  c3
3   a1  b1  c4
4   a1  b2  c5
5   a1  b2  c6
6   a1  b2  c7
7   a2  b3  c8
8   a2  b3  c9
9   a2  b3  c10
10  a2  b3  c11
11  a2  b3  c12
12  a2  b3  c13
13  a2  b4  c14
14  a2  b4  c15

to a dictionary in the following format:

outDict = {
    'a1': {
        'b1': ['c1', 'c2', 'c3', 'c4'],
        'b2': ['c5', 'c6', 'c7'],
    },
    'a2': {
        'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
        'b4': ['c14', 'c15'],
    }
}

i.e. values in column A become the first-level keys, values in column B become the second-level keys, and values in column C are collected into a list.

Any pointers?

Here is another way, using pivot_table:

out = {k:v.dropna().to_dict() for k,v in 
      df.pivot_table('C','B','A',aggfunc=list).items()}

{'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']},
 'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}
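
To see why the dropna() step is there, it can help to look at the intermediate table. The sketch below repeats the pivot_table call above with keyword arguments on the question's data; the names `table` and `out` are only illustrative.

```python
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'A': ['a1'] * 7 + ['a2'] * 8,
    'B': ['b1'] * 4 + ['b2'] * 3 + ['b3'] * 6 + ['b4'] * 2,
    'C': [f'c{i}' for i in range(1, 16)],
})

# Rows are B values, columns are A values, each cell holds a list of C values.
table = df.pivot_table(values='C', index='B', columns='A', aggfunc=list)

# (B, A) pairs that never occur together, such as ('b1', 'a2'), come out as NaN,
# so each column is cleaned with dropna() before it is turned into a dict.
out = {a: col.dropna().to_dict() for a, col in table.items()}
print(out)
```

Without dropna(), every first-level key would carry all four B keys, with NaN for the combinations that do not exist.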

This one is a little long, I dare say:


new_dict = {
    k: v['C']
    for k, v in df.groupby(['A', 'B'])
                  .agg(list)
                  .groupby(level=0)
                  .apply(lambda g: g.xs(g.name).to_dict())
                  .to_dict()
                  .items()
}

print(new_dict)

Output:

{
    'a1': {
        'b1': ['c1', 'c2', 'c3', 'c4'],
        'b2': ['c5', 'c6', 'c7'],
    },
    'a2': {
        'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
        'b4': ['c14', 'c15'],
    }
}

Unpacked:

>>> df.groupby(['A', 'B']).agg(list)
                                  C
A  B                               
a1 b1              [c1, c2, c3, c4]
   b2                  [c5, c6, c7]
a2 b3  [c8, c9, c10, c11, c12, c13]
   b4                    [c14, c15]
>>> df.groupby(['A', 'B']).agg(list).groupby(level=0).apply(lambda df: df.xs(df.name).to_dict())
# we groupby level 0 again, then call xs as aggregator function to access each key
# in level 0, and convert to dict
A
a1    {'C': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': [...
a2    {'C': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12',...
dtype: object

>>> df.groupby(['A', 'B']).agg(list).groupby(level=0).apply(lambda df: df.xs(df.name).to_dict()).to_dict()

{'a1': {'C': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']}},
 'a2': {'C': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
   'b4': ['c14', 'c15']}}}

# then just using dict comp to remove column name 'C'
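
As a possible shortcut, the extra 'C' level can be avoided by selecting column C before aggregating, so the result is a Series rather than a one-column frame. This is a minimal sketch of that variation, not the code above; `grouped` and `nested` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1'] * 7 + ['a2'] * 8,
    'B': ['b1'] * 4 + ['b2'] * 3 + ['b3'] * 6 + ['b4'] * 2,
    'C': [f'c{i}' for i in range(1, 16)],
})

# A Series of lists indexed by the (A, B) MultiIndex; no 'C' column level to strip.
grouped = df.groupby(['A', 'B'])['C'].agg(list)

# For each first-level key, xs() returns the sub-Series indexed by B only.
nested = {
    a: grouped.xs(a).to_dict()
    for a in grouped.index.get_level_values(0).unique()
}
print(nested)
```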

A recursive solution seems pretty natural and works for any number of columns. We group by the leftmost column and recursively convert the remaining columns to the desired format. If only one column is left, a list is returned.

def df2dict_rec(df):
  if df.shape[1] == 1:
    return df.values[:,0].tolist()
  else:
    return {k: df2dict_rec(df_k.iloc[:,1:]) for k, df_k in df.groupby(df.columns[0])}


res = df2dict_rec(df)
# {'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']},
#  'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}
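
To illustrate the "any number of columns" claim, here is a small sketch that runs the same recursion on a four-column frame. The frame `wide` and its column D are made up for the example and are not part of the question.

```python
import pandas as pd

def df2dict_rec(df):
    # Base case: a single remaining column becomes a plain list.
    if df.shape[1] == 1:
        return df.iloc[:, 0].tolist()
    # Recursive case: group on the leftmost column, recurse on the rest.
    return {k: df2dict_rec(g.iloc[:, 1:]) for k, g in df.groupby(df.columns[0])}

# Hypothetical four-column input: A, B and C become nested keys, D the leaf lists.
wide = pd.DataFrame({
    'A': ['a1', 'a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b2', 'b3'],
    'C': ['c1', 'c2', 'c3', 'c4'],
    'D': [1, 2, 3, 4],
})

result = df2dict_rec(wide)
print(result)
# {'a1': {'b1': {'c1': [1], 'c2': [2]}, 'b2': {'c3': [3]}}, 'a2': {'b3': {'c4': [4]}}}
```

Note that groupby sorts the group keys by default, so the dictionary keys follow sorted order rather than the order of first appearance in the frame.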

Any enhancements are welcome.

dictLevel1 = {}
dictLevel2 = {}
# Collect the C values for each B key.
# Note: this assumes each B value appears under only one A value.
for b, c in zip(df['B'], df['C']):
    try:
        dictLevel2[b].append(c)
    except KeyError:
        dictLevel2[b] = [c]
# Attach each B entry to its A key.
for a, b in zip(df['A'], df['B']):
    try:
        dictLevel1[a].update({b: dictLevel2[b]})
    except KeyError:
        dictLevel1[a] = {b: dictLevel2[b]}
print(dictLevel1)

Output:

{'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']}, 'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}
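
One possible enhancement, offered as a sketch: the two loops above key the C lists by B alone, so they rely on each B value belonging to a single A value. Walking the three columns together with dict.setdefault drops that assumption and needs only one pass. The small frame below is invented to show a B value repeated under two A values.

```python
import pandas as pd

# Hypothetical data where 'b1' appears under both 'a1' and 'a2'.
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b1'],
    'C': ['c1', 'c2', 'c3'],
})

out = {}
for a, b, c in zip(df['A'], df['B'], df['C']):
    # Create the inner dict and the list on first sight, then append.
    out.setdefault(a, {}).setdefault(b, []).append(c)

print(out)
# {'a1': {'b1': ['c1', 'c2']}, 'a2': {'b1': ['c3']}}
```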
