Converting a nested dictionary (JSON file) into a DataFrame
I have the following JSON file:
{
"quiz": {
"sport": { "q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12",
"test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
},
"summary": "good example",
"viewer rating": 6
}
I want to convert it into a DataFrame, something like this:
quiz   q1   q2   question     options  answer    test_dict  summary       viewer rating
sport  q1   NaN  Which one..  [list]   Huston..  NaN        good example  6
maths  q1   NaN  5 + 7 = ?    [list]   12        {"a":1..   good example  6
maths  NaN  q2   12 - 8 = ?   [list]   4         NaN        good example  6
I tried using
file1 = open("json2.json")
data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])
but I get the following error:
TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.
The problem is that the value under quiz is a dictionary, not a list. So I also tried:
pd.json_normalize(data, max_level=2)
However, that does not give the expected output: I only get one row. Can someone give me some pointers?
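For reference, the single row seems to follow from how json_normalize treats a top-level dict: it counts as one record, and nested dicts are flattened into dotted column names down to max_level. A minimal sketch, using a cut-down dict (`small` is a hypothetical stand-in, not the real file):

```python
import pandas as pd

# Cut-down, hypothetical stand-in for the quiz data (illustration only).
small = {
    "quiz": {
        "sport": {"q1": {"answer": "x"}},
        "maths": {"q1": {"answer": "y"}},
    },
    "summary": "good example",
}

# A single dict is one record, so the result has exactly one row.
# Nested keys are joined with "." until max_level is reached; anything
# deeper stays in the cell as a plain dict.
flat = pd.json_normalize(small, max_level=2)

print(flat.shape)
print(sorted(flat.columns))
```

So max_level only changes how many columns that one row is spread over; it never turns the inner dicts into separate rows.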
You can use a list comprehension:
import pandas as pd
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}
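# Build one record per (category, question). The `q: q` entry adds a column named
# after the question id ('q1' or 'q2') that holds that id, as in the requested layout.
r = [{'quiz': a, q: q, **v, 'summary': d['summary'], 'viewer rating': d['viewer rating']}
     for a, b in d['quiz'].items() for q, v in b.items()]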
df = pd.DataFrame(r)
Output:
quiz q1 question ... viewer rating test_dict q2
0 sport q1 Which one is correct team name in NBA? ... 6 NaN NaN
1 maths q1 5 + 7 = ? ... 6 {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}} NaN
2 maths NaN 12 - 8 = ? ... 6 NaN q2
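If the file may gain more top-level keys, the same idea can be wrapped in a small helper. This is only a sketch under some assumptions: `quiz_to_frame` and the `qid` column are hypothetical names, every top-level key other than `quiz` is assumed to hold a scalar, and the question id goes into one `qid` column instead of separate `q1`/`q2` columns, which differs from the layout requested in the question.

```python
import pandas as pd

def quiz_to_frame(data):
    """Return one row per (category, question id) for a dict shaped like the quiz file."""
    records = []
    for category, questions in data["quiz"].items():
        for qid, fields in questions.items():
            # Keep the question id in a single 'qid' column (a design choice
            # of this sketch, not the layout asked for in the question).
            records.append({"quiz": category, "qid": qid, **fields})
    df = pd.DataFrame(records)
    # Copy every other top-level value (assumed scalar) onto all rows.
    for key, value in data.items():
        if key != "quiz":
            df[key] = value
    return df

# Reduced sample with the same shape as the file in the question.
sample = {
    "quiz": {
        "sport": {"q1": {"question": "Which one is correct team name in NBA?",
                         "answer": "Huston Rocket"}},
        "maths": {"q1": {"question": "5 + 7 = ?", "answer": "12",
                         "test_dict": {"a": 1, "b": 2}},
                  "q2": {"question": "12 - 8 = ?", "answer": "4"}},
    },
    "summary": "good example",
    "viewer rating": 6,
}

df = quiz_to_frame(sample)
print(df)
```

For the real file, the dict returned by json.load can be passed in place of `sample`.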