简体   繁体   中英

Converting a nested dictionary(json file) into a dataframe

I have the following json file -

{
    "quiz": {
       "sport": { "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        },
        "maths": {
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12",
                "test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
            },
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            }
        }
    },
    "summary": "good example",
    "viewer rating": 6
}

I want to convert it to a DataFrame. Something like this -

quiz   q1   q2   question     options  answer   test_dict  summary       viewer rating
sport  q1   NaN  Which one..  [list]   Huston.. NaN        good example  6
maths  q1   NaN  5 + 7 = ?    [list]   12       {"a":1..   good example  6
maths  NaN  q2   12 - 8 = ?   [list]   4        NaN        good example  6

I tried doing it using

file1 = open("json2.json")
data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])

But I am getting the following error -

TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.

The issue is, it is not a list but a dictionary itself. So, I also tried doing this -

pd.json_normalize(data, max_level=2)

But, I am not getting the expected output. I am just getting one row. Can anybody give me some pointers?

You can use a list comprehension:

import pandas as pd
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}
r = [{'quiz':a, q:q, **v, 'summary':d['summary'], 'viewer rating':d['viewer rating']}
          for a, b in d['quiz'].items() for q, v in b.items()]

df = pd.DataFrame(r)

Output:

    quiz   q1                                question  ... viewer rating                                   test_dict   q2
0  sport   q1  Which one is correct team name in NBA?  ...             6                                         NaN  NaN
1  maths   q1                               5 + 7 = ?  ...             6  {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}  NaN
2  maths  NaN                              12 - 8 = ?  ...             6                                         NaN   q2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM