
Converting a nested dictionary (JSON file) into a DataFrame

I have the following JSON file:

{
    "quiz": {
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        },
        "maths": {
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12",
                "test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
            },
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            }
        }
    },
    "summary": "good example",
    "viewer rating": 6
}

I want to convert it into a DataFrame, something like this:

quiz   q1   q2   question     options  answer   test_dict  summary       viewer rating
sport  q1   NaN  Which one..  [list]   Huston.. NaN        good example  6
maths  q1   NaN  5 + 7 = ?    [list]   12       {"a":1..   good example  6
maths  NaN  q2   12 - 8 = ?   [list]   4        NaN        good example  6

I tried:

import json
import pandas as pd

with open("json2.json") as file1:
    data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])

but I get the following error:

TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.
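For comparison, record_path works when the path leads to a list of record dicts. A minimal sketch, assuming a hypothetical hand-restructured version of the data in which quiz holds a list (the category and qid keys are invented for illustration and are not in the original file): json_normalize then produces one row per list element, and meta repeats the top-level fields on each row.

```python
import pandas as pd

# Hypothetical restructured input: "quiz" is now a LIST of records,
# which is the shape record_path expects.
data = {
    "quiz": [
        {"category": "maths", "qid": "q1", "question": "5 + 7 = ?", "answer": "12"},
        {"category": "maths", "qid": "q2", "question": "12 - 8 = ?", "answer": "4"},
    ],
    "summary": "good example",
    "viewer rating": 6,
}

# One row per element of data["quiz"]; the top-level "summary" and
# "viewer rating" values are copied onto every row via meta.
df = pd.json_normalize(data, record_path=["quiz"], meta=["summary", "viewer rating"])
print(df)
```

With the original dict-of-dicts layout there is no such list to point at, which is why the call above fails on the real file.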

The problem is that the value under quiz is not a list but a dictionary itself. So I also tried this:

pd.json_normalize(data, max_level=2)

However, that doesn't give the expected output: I only get a single row. Can someone give me some pointers?
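The single row is expected behavior: given one dict and no record_path, json_normalize treats the whole dict as one record and flattens nested keys into dot-separated column names rather than into extra rows. max_level only limits how deep that flattening goes. A small sketch on a trimmed-down copy of the data (trimmed only for brevity):

```python
import pandas as pd

# Trimmed-down copy of the question's data, for brevity.
data = {
    "quiz": {
        "maths": {
            "q1": {"question": "5 + 7 = ?", "answer": "12"},
            "q2": {"question": "12 - 8 = ?", "answer": "4"},
        }
    },
    "summary": "good example",
}

# The whole dict becomes ONE row; nesting turns into dotted column names.
flat = pd.json_normalize(data)
print(flat.shape)
print(sorted(flat.columns))

# max_level stops the flattening early, but the result is still one row.
shallow = pd.json_normalize(data, max_level=1)
print(sorted(shallow.columns))
```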

You can use a list comprehension:

import pandas as pd

# d is the parsed JSON (e.g. the result of json.load)
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}

# One row per (category, question) pair. The question id is used as both the
# column name and its value, so q1 and q2 land in separate columns; the
# top-level summary and viewer rating are repeated on every row.
r = [{'quiz': a, q: q, **v, 'summary': d['summary'], 'viewer rating': d['viewer rating']}
     for a, b in d['quiz'].items() for q, v in b.items()]

df = pd.DataFrame(r)

Output:

    quiz   q1                                question  ... viewer rating                                   test_dict   q2
0  sport   q1  Which one is correct team name in NBA?  ...             6                                         NaN  NaN
1  maths   q1                               5 + 7 = ?  ...             6  {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}  NaN
2  maths  NaN                              12 - 8 = ?  ...             6                                         NaN   q2
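If you would rather have test_dict expanded into its own columns than kept as a dict in a single cell, one option is to pass the same list of row dicts from the answer's comprehension through pd.json_normalize, which flattens nested dicts into dotted column names. A sketch on a trimmed copy of d (maths questions only, for brevity):

```python
import pandas as pd

# Trimmed copy of the question's data (maths only), for brevity.
d = {
    "quiz": {
        "maths": {
            "q1": {"question": "5 + 7 = ?", "answer": "12",
                   "test_dict": {"a": 1, "b": 2, "dddd": {"1": 1, "2": 2}}},
            "q2": {"question": "12 - 8 = ?", "answer": "4"},
        }
    },
    "summary": "good example",
    "viewer rating": 6,
}

# Same row-building comprehension as in the answer.
r = [{"quiz": a, q: q, **v, "summary": d["summary"], "viewer rating": d["viewer rating"]}
     for a, b in d["quiz"].items() for q, v in b.items()]

# json_normalize flattens test_dict into test_dict.a, test_dict.b,
# test_dict.dddd.1 and test_dict.dddd.2; rows without test_dict get NaN there.
df = pd.json_normalize(r)
print(df)
```

pd.DataFrame(r) keeps nested dicts intact, while pd.json_normalize(r) flattens them, so the choice depends on whether test_dict should stay a single value.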

