How do I normalize a dictionary-type column in a Pandas DataFrame?

Question

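I have a pandas DataFrame with two columns: 1) keywords and 2) TopicID.

The keywords column holds dictionaries. I want to normalize this DataFrame so that each TopicID is repeated once for every keyword and its value.

My DataFrame

Expected DataFrame (for the example I have posted only a few keywords)

I tried this code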

df_final = pd.json_normalize(df.keywords.apply(json.loads))

Output of print(df[['TopicID','keywords']].head(2).to_dict()):

{'TopicID': {0: 797, 1: 798}, 'keywords': {0: {'licence': 0.529, 'chapter': 0.462, 'explains': 0.263, 'visitor': 0.244, 'resident': 0.22, 'applying': 0.205, 'privileges': 0.199, 'graduated': 0.188, 'tests': 0.184, 'licensing': 0.18}, 1: {'emotional': 0.352, 'mental': 0.327, 'state': 0.309, 'operate': 0.295, 'drive': 0.242, 'motor': 0.227, 'ability': 0.227, 'next': 0.176, 'illness': 0.176, 'diminish': 0.176}}}

Answer 1

First create a list of tuples by flattening the dictionaries in a list comprehension, then pass it to the DataFrame constructor:

# One (TopicID, keyword, value) tuple per entry of each row's dictionary
L = [(a, k, v) for a, b in zip(df['TopicID'], df['keywords']) for k, v in b.items()]
df_final = pd.DataFrame(L, columns=['TopicID','Keyword','Value'])
print(df_final)
    TopicID     Keyword  Value
0       797     licence  0.529
1       797     chapter  0.462
2       797    explains  0.263
3       797     visitor  0.244
4       797    resident  0.220
5       797    applying  0.205
6       797  privileges  0.199
7       797   graduated  0.188
8       797       tests  0.184
9       797   licensing  0.180
10      798   emotional  0.352
11      798      mental  0.327
12      798       state  0.309
13      798     operate  0.295
14      798       drive  0.242
15      798       motor  0.227
16      798     ability  0.227
17      798        next  0.176
18      798     illness  0.176
19      798    diminish  0.176
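
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt from the to_dict() output in the question, shortened to three keywords per topic for brevity. The variable names (rows, topic_id, kw_dict) are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question's to_dict() output,
# shortened to three keywords per topic (an assumption for brevity).
df = pd.DataFrame({
    "TopicID": [797, 798],
    "keywords": [
        {"licence": 0.529, "chapter": 0.462, "explains": 0.263},
        {"emotional": 0.352, "mental": 0.327, "state": 0.309},
    ],
})

# Flatten: emit one (TopicID, keyword, value) tuple per dictionary entry,
# so each TopicID is repeated once for every keyword it contains.
rows = [
    (topic_id, keyword, value)
    for topic_id, kw_dict in zip(df["TopicID"], df["keywords"])
    for keyword, value in kw_dict.items()
]

# Build the long-format frame from the list of tuples.
df_final = pd.DataFrame(rows, columns=["TopicID", "Keyword", "Value"])
print(df_final)
```

Note that the values in keywords are already Python dict objects, as the to_dict() output in the question shows, so json.loads is not needed here. It expects a string and raises a TypeError when it is given a dict.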


1 solution

Solution 1
1 Accepted 2022-06-27 09:39:53
