
Nested dictionary with key: list[key:value] pairs to dataframe

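I'm currently struggling to create a dataframe from a dictionary that is nested like {key1:[{key:value},{key:value}, ...],key2:[{key:value},{key:value},...]}. I want it to end up in a dataframe where key1 and key2 form the index, and the key:value pairs nested in each list become the columns and record values.

For each of key1, key2, etc., the list of key:value pairs can have a different length. Example data: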

some_dict = {'0000297386FB11E2A2730050568F1BAB': [{'FILE_ID': '0000297386FB11E2A2730050568F1BAB'},
  {'FileTime': '1362642335'},
  {'Size': '1016439'},
  {'DocType_Code': 'AF3BD580734A77068DD083389AD7FDAF'},
  {'Filenr': 'F682B798EC9481FF031C4C12865AEB9A'},
  {'DateRegistered': 'FAC4F7F9C3217645C518D5AE473DCB1E'},
  {'TITLE': '2096158F036B0F8ACF6F766A9B61A58B'}],
 '000031EA51DA11E397D30050568F1BAB': [{'FILE_ID': '000031EA51DA11E397D30050568F1BAB'},
  {'FileTime': '1384948248'},
  {'Size': '873514'},
  {'DatePosted': '7C6BCB90AC45DA1ED6D1C376FC300E7B'},
  {'DocType_Code': '28F404E9F3C394518AF2FD6A043D3A81'},
  {'Filenr': '13A6A062672A88DE75C4D35917F3C415'},
  {'DateRegistered': '8DD4262899F20DE45F09F22B3107B026'},
  {'Comment': 'AE207D73C9DDB76E1EEAA9241VJGN02'},
  {'TITLE': 'DF96336A6FE08E34C5A94F6A828B4B62'}]}

The final result should look like this:

Index | FILE_ID | ... | DatePosted | ... | Comment | TITLE
0000297386FB11E2A2730050568F1BAB|0000297386FB11E2A2730050568F1BAB|...|NaN|...|NaN|2096158F036B0F8ACF6F766A9B61A58B
000031EA51DA11E397D30050568F1BAB|000031EA51DA11E397D30050568F1BAB|...|7C6BCB90AC45DA1ED6D1C376FC300E7B|...|AE207D73C9DDB76E1EEAA9241VJGN02|DF96336A6FE08E34C5A94F6A828B4B62

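I've tried parsing the dict directly into pandas with a comprehension, as suggested in "Creating dataframe from a dictionary where entries have different lengths", and I've tried flattening the dict further before parsing it into pandas, as in "Flatten nested dictionaries, compressing keys". Neither worked.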

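Here you go.

You don't need the keys of the outer dict, because the same value is also available one level down as FILE_ID. What you do need is to merge the many single-pair dicts in each list into one dict; I did that with update. Each merged dict is then turned into a pd.Series, and the Series are concatenated into a dataframe.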

In [39]: seriess = []
    ...: for values in some_dict.values():
    ...:     d = {}
    ...:     for thing in values:
    ...:         d.update(thing)
    ...:     s = pd.Series(d)
    ...:     seriess.append(s)
    ...:

In [40]: pd.concat(seriess,axis=1).T
Out[40]:
                            FILE_ID    FileTime     Size  ...                             TITLE                        DatePosted                          Comment
0  0000297386FB11E2A2730050568F1BAB  1362642335  1016439  ...  2096158F036B0F8ACF6F766A9B61A58B                               NaN                              NaN
1  000031EA51DA11E397D30050568F1BAB  1384948248   873514  ...  DF96336A6FE08E34C5A94F6A828B4B62  7C6BCB90AC45DA1ED6D1C376FC300E7B  AE207D73C9DDB76E1EEAA9241VJGN02
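The result above is indexed 0, 1 rather than by the outer keys the question asked for. A minimal sketch of one way to keep those keys as the index, assuming the same merge-with-update idea: the `sample` dict and its values are a shortened, hypothetical stand-in for `some_dict`, and `pd.DataFrame.from_dict(..., orient='index')` is my substitution for the Series/concat step, not something the answer itself used.

```python
import pandas as pd

# Hypothetical, shortened stand-in with the same shape as some_dict.
sample = {
    'A1': [{'FILE_ID': 'A1'}, {'Size': '10'}, {'TITLE': 'T1'}],
    'B2': [{'FILE_ID': 'B2'}, {'Size': '20'}, {'Comment': 'C2'}, {'TITLE': 'T2'}],
}

# Merge each list of single-pair dicts into one flat dict per outer key.
flat = {}
for outer_key, pairs in sample.items():
    row = {}
    for pair in pairs:
        row.update(pair)
    flat[outer_key] = row

# orient='index' makes the outer keys the row labels;
# keys missing from a row are filled with NaN.
df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```

If the outer key always equals FILE_ID, as in the example data, calling `.set_index('FILE_ID', drop=False)` on the answer's result should give a similar frame.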

Let's try the code below:

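import pandas as pd

# One small frame per outer key: every single-pair dict becomes its own row.
dfs = []
for records in some_dict.values():
    dfs.append(pd.DataFrame.from_records(records))

# Stack all frames into one (pd.concat replaces DataFrame.append,
# which was removed in pandas 2.0).
new_df = pd.concat(dfs, ignore_index=True)

# FILE_ID is non-null only on the first row of each file, so its running
# count numbers the files; first() keeps the first non-null value per column.
final_result = (new_df
                .groupby(new_df['FILE_ID'].notna().cumsum())
                .first())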

Output

    FILE_ID FileTime    Size    DocType_Code    Filenr  DateRegistered  TITLE   DatePosted  Comment
FILE_ID                                 
1   0000297386FB11E2A2730050568F1BAB    1362642335  1016439 AF3BD580734A77068DD083389AD7FDAF    F682B798EC9481FF031C4C12865AEB9A    FAC4F7F9C3217645C518D5AE473DCB1E    2096158F036B0F8ACF6F766A9B61A58B    None    None
2   000031EA51DA11E397D30050568F1BAB    1384948248  873514  28F404E9F3C394518AF2FD6A043D3A81    13A6A062672A88DE75C4D35917F3C415    8DD4262899F20DE45F09F22B3107B026    DF96336A6FE08E34C5A94F6A828B4B62    7C6BCB90AC45DA1ED6D1C376FC300E7B    AE207D73C9DDB76E1EEAA9241VJGN02
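The grouping step is easier to follow on a small case. The sketch below is only an illustration under assumptions: `records` is a hypothetical, already-stacked list of single-pair dicts for two files, and the grouping Series is renamed to `file_no` (my own name) so it does not share a name with the FILE_ID column.

```python
import pandas as pd

# Hypothetical stacked rows for two files; each dict holds one key:value pair.
records = [
    {'FILE_ID': 'A1'}, {'Size': '10'},
    {'FILE_ID': 'B2'}, {'Size': '20'}, {'Comment': 'C2'},
]
long_df = pd.DataFrame.from_records(records)

# True only where a new file starts; the cumulative sum then gives
# every row of the same file the same running number.
group_id = long_df['FILE_ID'].notna().cumsum().rename('file_no')

# first() collapses each group to one row, taking the first
# non-null value in every column.
wide = long_df.groupby(group_id).first()
print(group_id.tolist())
print(wide)
```

This relies on FILE_ID being the first entry of every list, as it is in the example data; if a list started with another key, its leading rows would be assigned to the previous file.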
