Nested dictionary with key: list[key:value] pairs to dataframe
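I'm currently struggling to create a pandas dataframe from a dictionary that is nested like {key1:[{key:value},{key:value}, ...],key2:[{key:value},{key:value},...]}. I want the result to be a dataframe in which the values of key1 and key2 form the index, and the nested key:value pairs in each list become the columns and record values. For each of key1, key2, etc., the list of key:value pairs can differ in size. Example data: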
some_dict = {'0000297386FB11E2A2730050568F1BAB': [{'FILE_ID': '0000297386FB11E2A2730050568F1BAB'},
{'FileTime': '1362642335'},
{'Size': '1016439'},
{'DocType_Code': 'AF3BD580734A77068DD083389AD7FDAF'},
{'Filenr': 'F682B798EC9481FF031C4C12865AEB9A'},
{'DateRegistered': 'FAC4F7F9C3217645C518D5AE473DCB1E'},
{'TITLE': '2096158F036B0F8ACF6F766A9B61A58B'}],
'000031EA51DA11E397D30050568F1BAB': [{'FILE_ID': '000031EA51DA11E397D30050568F1BAB'},
{'FileTime': '1384948248'},
{'Size': '873514'},
{'DatePosted': '7C6BCB90AC45DA1ED6D1C376FC300E7B'},
{'DocType_Code': '28F404E9F3C394518AF2FD6A043D3A81'},
{'Filenr': '13A6A062672A88DE75C4D35917F3C415'},
{'DateRegistered': '8DD4262899F20DE45F09F22B3107B026'},
{'Comment': 'AE207D73C9DDB76E1EEAA9241VJGN02'},
{'TITLE': 'DF96336A6FE08E34C5A94F6A828B4B62'}]}
The final result should look like this:
Index | File_ID | ... | DatePosted | ... | Comment | Title
0000297386FB11E2A2730050568F1BAB|0000297386FB11E2A2730050568F1BAB|...|NaN|...|NaN|2096158F036B0F8ACF6F766A9B61A58B
000031EA51DA11E397D30050568F1BAB|000031EA51DA11E397D30050568F1BAB|...|7C6BCB90AC45DA1ED6D1C376FC300E7B|...|AE207D73C9DDB76E1EEAA9241VJGN02|DF96336A6FE08E34C5A94F6A828B4B62
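So far I have tried to parse the dict directly into pandas using a comprehension, as suggested in "Creating dataframe from a dictionary where entries have different lengths", and I have tried to flatten the dict further before parsing it into pandas, as in "Flatten nested dictionaries, compressing keys". Neither approach worked.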
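Here you go.
You don't need the keys of the outer dictionary, because the same value is also available one level down as FILE_ID. What you do need is to merge the many single-pair dicts in each list into one dict; I did that with update. Each merged dict is then turned into a pd.Series, and the Series are concatenated into a dataframe.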
In [39]: seriess = []
    ...: for values in some_dict.values():
    ...:     d = {}
    ...:     for thing in values:
    ...:         d.update(thing)
    ...:     s = pd.Series(d)
    ...:     seriess.append(s)
    ...:
In [40]: pd.concat(seriess,axis=1).T
Out[40]:
FILE_ID FileTime Size ... TITLE DatePosted Comment
0 0000297386FB11E2A2730050568F1BAB 1362642335 1016439 ... 2096158F036B0F8ACF6F766A9B61A58B NaN NaN
1 000031EA51DA11E397D30050568F1BAB 1384948248 873514 ... DF96336A6FE08E34C5A94F6A828B4B62 7C6BCB90AC45DA1ED6D1C376FC300E7B AE207D73C9DDB76E1EEAA9241VJGN02
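The result above has a default 0, 1 index, whereas the question asks for the outer keys as the index. The same merge step can be sketched as a standalone script that keeps those keys. This is only an illustration: `sample` is a shortened, made-up stand-in for `some_dict`, and `merge_records` is a hypothetical helper name, not something from the original answer.

```python
import pandas as pd

# Shortened, hypothetical stand-in for some_dict from the question.
sample = {
    "A1": [{"FILE_ID": "A1"}, {"Size": "10"}, {"TITLE": "t1"}],
    "B2": [{"FILE_ID": "B2"}, {"Size": "20"}, {"Comment": "c2"}, {"TITLE": "t2"}],
}


def merge_records(records):
    """Merge a list of single-pair dicts into one flat dict (hypothetical helper)."""
    merged = {}
    for record in records:
        merged.update(record)
    return merged


# One flat dict per outer key.
flat = {key: merge_records(records) for key, records in sample.items()}

# orient="index" turns the outer keys into row labels; keys missing
# from a record (here "Comment" for "A1") are filled with NaN.
df = pd.DataFrame.from_dict(flat, orient="index")
print(df)
```

If two dicts in the same list shared a key, `update` would silently keep the later value, so this sketch assumes the keys within each list are unique, as they are in the example data.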
Let's try the following code:
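import pandas as pd

# Build one small frame per outer key; each row holds a single non-null value.
dfs = []
for k in some_dict.keys():
    dfs.append(pd.DataFrame.from_records(some_dict[k]))
# DataFrame.append was removed in pandas 2.0; pd.concat stacks every frame in the list.
new_df = pd.concat(dfs, ignore_index=True)
# A non-null FILE_ID marks the start of a record, so its running count labels each record.
final_result = (new_df
                .groupby(new_df['FILE_ID'].notna().cumsum())
                .first())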
Output:
FILE_ID FileTime Size DocType_Code Filenr DateRegistered TITLE DatePosted Comment
FILE_ID
1 0000297386FB11E2A2730050568F1BAB 1362642335 1016439 AF3BD580734A77068DD083389AD7FDAF F682B798EC9481FF031C4C12865AEB9A FAC4F7F9C3217645C518D5AE473DCB1E 2096158F036B0F8ACF6F766A9B61A58B None None
2 000031EA51DA11E397D30050568F1BAB 1384948248 873514 28F404E9F3C394518AF2FD6A043D3A81 13A6A062672A88DE75C4D35917F3C415 8DD4262899F20DE45F09F22B3107B026 DF96336A6FE08E34C5A94F6A828B4B62 7C6BCB90AC45DA1ED6D1C376FC300E7B AE207D73C9DDB76E1EEAA9241VJGN02
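To see why grouping on the running count of non-null FILE_ID values collapses the stacked rows into one row per record, here is a minimal sketch on made-up data. The `rows` list is hypothetical and much smaller than the real input, and the approach assumes FILE_ID is always the first entry in each list, as it is in the question's example.

```python
import pandas as pd

# Hypothetical miniature of the stacked frame: one non-null value per row.
rows = [
    {"FILE_ID": "A1"}, {"Size": "10"},
    {"FILE_ID": "B2"}, {"Size": "20"}, {"Comment": "c2"},
]
stacked = pd.DataFrame.from_records(rows)

# notna() is True only on rows that start a new record;
# cumsum() turns those flags into a running record number.
group_key = stacked["FILE_ID"].notna().cumsum()
print(group_key.tolist())  # [1, 1, 2, 2, 2]

# first() keeps the first non-null value of every column within each group.
collapsed = stacked.groupby(group_key).first()
print(collapsed)
```

If FILE_ID appeared later in a list, the rows before it would be assigned to the previous record, so merging the dicts per outer key is the safer route when the ordering is not guaranteed.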