
How do I create a Pandas Dataframe from a dictionary containing a nested dictionary?

I am working on a project where I get JSON data from a GraphQL API. After receiving the data, I call json.loads() on it and then access the parts of the JSON I need, each of which is a dictionary containing another dictionary. The dictionaries are:

{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}}
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}}
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}}
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}}
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}}
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}}
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}}
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}}
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}}
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}

How could I create a Pandas Dataframe so that the columns are

placement |  id  |  name

and the values below those columns are the values associated with them from the dictionary? If I use just

pd.DataFrame()

the output is not as desired, so I tried to look up solutions that involved me iterating over the items in the dictionary but I was unsuccessful. Any help would be appreciated. Thank you.

You need to create proper dictionaries for pandas to create a dataframe. I'm assuming here you have a list of dicts called dictionaries.

pd.DataFrame(
    [
        {"placement": d["placement"], "id": d["entrant"]["id"], "name": d["entrant"]["name"]}
        for d in dictionaries
    ]
)
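As an alternative to spelling out each key by hand, pandas also ships a flattening helper, pd.json_normalize. The sketch below is not part of the answer above; it assumes the same list name, dictionaries, and uses three of the sample records from the question. json_normalize names nested columns with a dotted prefix ("entrant.id"), so the prefix is stripped afterwards to get the requested column names.

```python
import pandas as pd

# Sample records copied from the question; `dictionaries` is the assumed list name.
dictionaries = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
    {"placement": 3, "entrant": {"id": 8561463, "name": "Test"}},
]

# json_normalize flattens nested dicts into dotted column names
# such as "entrant.id" and "entrant.name".
flat = pd.json_normalize(dictionaries)

# Keep only the part after the last dot, so "entrant.id" becomes "id".
flat = flat.rename(columns=lambda c: c.split(".")[-1])

print(flat)
```

Stripping the prefix is only safe when the inner keys do not clash with outer keys; if they might, keeping the dotted names is probably the better choice.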

I would do something like this. It may not be the most elegant solution, but it works. I assume that you have a list containing every dict, because you provided the dicts one by one.

import pandas as pd

dList = [{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}},
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}},
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}},
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}]


# Generate the column names. I assume you do not want them hard-coded, to keep the solution general.
d0 = dList[0]
columns = []
for key,val in d0.items():
    if not isinstance(val,dict):
        columns.append(key)
    else:
        for subkey,subval in val.items():
            columns.append(subkey)

# Build the data list (one sublist of values for every dict)
data = []
for d in dList:
    thisData = []
    for key,val in d.items():
        if not isinstance(val,dict):
            thisData.append(val)
        else:
            for subkey,subval in val.items():
                thisData.append(subval)
    data.append(thisData)


df = pd.DataFrame(data,columns=columns)

I hope it works for you. If not, let me know.
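The two loops above can also be folded into one helper that flattens a single record, so column names and values are collected in the same pass. This is only a sketch of the same idea: the function name flatten is hypothetical, and it assumes that inner keys do not collide with outer keys (if they do, the later value wins).

```python
import pandas as pd


def flatten(record):
    """Return a new dict with the keys of nested dicts lifted to the top level."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse into the nested dict and merge its keys;
            # a key that already exists in `flat` is overwritten.
            flat.update(flatten(value))
        else:
            flat[key] = value
    return flat


# Three of the sample records from the question.
dList = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
    {"placement": 3, "entrant": {"id": 8561463, "name": "Test"}},
]

# A list of flat dicts lets pandas work out the column names itself.
df = pd.DataFrame([flatten(d) for d in dList])
print(df)
```

Because every record is flattened on its own, this version does not rely on the first dict having the same keys as the rest; missing keys simply show up as NaN.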

Here is one way to do it by extracting a new DataFrame from inside the first one and merging it:

from itertools import chain

import pandas as pd

data = [
    [{"placement": 1, "entrant": {"id": 8554498, "name": "Test"}}],
    [{"placement": 2, "entrant": {"id": 8559863, "name": "Test"}}],
    [{"placement": 3, "entrant": {"id": 8561463, "name": "Test"}}],
    [{"placement": 4, "entrant": {"id": 8559889, "name": "Test"}}],
    [{"placement": 5, "entrant": {"id": 8561608, "name": "Test"}}],
    [{"placement": 5, "entrant": {"id": 8560090, "name": "Test"}}],
    [{"placement": 7, "entrant": {"id": 8561639, "name": "Test"}}],
    [{"placement": 7, "entrant": {"id": 8561822, "name": "Test"}}],
    [{"placement": 9, "entrant": {"id": 8559993, "name": "Test"}}],
    [{"placement": 9, "entrant": {"id": 8561572, "name": "Test"}}],
]

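# Flatten the list of one-element lists into a single list of dicts
df = pd.DataFrame(list(chain.from_iterable(data)))

# Expand the "entrant" dicts into their own columns and merge on the index
result_df = pd.merge(
    df.loc[:, df.columns != "entrant"],  # Get df without the "entrant" column
    df["entrant"].apply(pd.Series),
    left_index=True,
    right_index=True,
)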

Here is the result:

   placement       id  name
0          1  8554498  Test
1          2  8559863  Test
2          3  8561463  Test
3          4  8559889  Test
4          5  8561608  Test
5          5  8560090  Test
6          7  8561639  Test
7          7  8561822  Test
8          9  8559993  Test
9          9  8561572  Test
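A variant of the same "split out the nested column, then recombine" idea is sketched below. It is not the code from the answer above: it swaps apply(pd.Series) for building a second DataFrame from the list of inner dicts, and swaps the merge for DataFrame.join, which aligns on the index by default. The variable name records is an assumption, and only three sample rows are used.

```python
import pandas as pd

# Three of the sample records from the question, already as a flat list of dicts.
records = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
    {"placement": 3, "entrant": {"id": 8561463, "name": "Test"}},
]

df = pd.DataFrame(records)

# pop() removes the "entrant" column from df and returns it as a Series of dicts;
# turning that into a list of dicts gives a second DataFrame with "id" and "name".
entrant = pd.DataFrame(df.pop("entrant").tolist(), index=df.index)

# join() aligns the two frames on their shared index.
result_df = df.join(entrant)
print(result_df)
```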

You cannot iterate over df_data1 and capture all the dictionaries because of the way json.loads() provides the data. To get the structure you want, I suggest adding commas between the dictionaries by replacing every occurrence of "}{" with "}, {" and surrounding the result with "[" and "]". Suppose j is your JSON string, then:

df_data1 = json.loads("[" + j.replace("}{", "}, {") + "]")

Now your df_data1 should look like this:

[{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}},
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}},
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}},
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}]

And now you can use @Thomas Q's solution:

df = pd.DataFrame(
    [
        {"placement": d["placement"], "id": d["entrant"]["id"], "name": d["entrant"]["name"]}
        for d in df_data1
    ]
)
df

   placement       id  name
0          1  8554498  Test
1          2  8559863  Test
2          3  8561463  Test
3          4  8559889  Test
4          5  8561608  Test
5          5  8560090  Test
6          7  8561639  Test
7          7  8561822  Test
8          9  8559993  Test
9          9  8561572  Test
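The string replacement above depends on the objects being separated by exactly "}{", with no whitespace or newline in between, and on that sequence never appearing inside a string value. If the raw text really is several JSON objects back to back (the assumption this answer makes), a somewhat more tolerant sketch is to let the standard library's json.JSONDecoder.raw_decode read one object at a time. The helper name parse_concatenated and the two-record sample string j are made up for illustration.

```python
import json


def parse_concatenated(text):
    """Parse a string holding several JSON values back to back into a list."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


# Hypothetical input: two of the question's records concatenated without a separator.
j = (
    '{"placement": 1, "entrant": {"id": 8554498, "name": "Test"}}'
    '{"placement": 2, "entrant": {"id": 8559863, "name": "Test"}}'
)

df_data1 = parse_concatenated(j)
print(df_data1)
```

The resulting list can be passed to the same list comprehension shown earlier in this answer.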

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 