populate pandas DataFrame from JSON file

Question

I want to iterate through JSON data and populate a pandas DataFrame from the values of specific keys.

import pandas as pd
json_data = [
{
    "name" : "Brad Green",
    "age" : "35",
    "address" : {
        "street" : "Nicol St. 16",
        "city" : "Manhatan"
    },
    "children" : ["Nati", "Madi"]
},

{
    "name" : "Sara Brown",
    "age" : "30",
    "address" : {
        "street" : "Adam St. 66",
        "city" : "New York"
    },
    "children" : "none" 
}
]

I don't want to simply add the data from json_data to the df like the code below:

df = pd.DataFrame(json_data, columns=['name', 'address', 'age'])

I wrote a for loop to iterate through the json_data and add the data to df_new :

df_new = pd.DataFrame(columns=['name','age','street','city'])

for i in range(len(json_data)):
    df_new = df_new.append({"name": json_data[i]})
...

I know that this for loop can't get the 'name', 'age', 'street', and 'city' values from json_data, but I couldn't find a solution in other posts here. I also want to pull the street and city values out of the nested address key into separate columns. I would appreciate any help with this.

Answer 1

Iterating through the JSON is probably not the best way to do this, in my opinion. I'd look into pd.json_normalize if I were you:

>>> df = pd.json_normalize(json_data)[['name', 'age', 'address.street', 'address.city']] 
>>> df

         name age address.street address.city
0  Brad Green  35   Nicol St. 16     Manhatan
1  Sara Brown  30    Adam St. 66     New York

You can rename the columns as you see fit after this, e.g.:

df.columns = ["name", "age", "street", "city"]
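As a self-contained sketch of the json_normalize approach, the snippet below flattens the records and renames the dotted columns with an explicit mapping instead of assigning to df.columns, so the result does not depend on column order. The sample data is copied from the question (trimmed to the fields used); the variable name flat is only illustrative.

```python
import pandas as pd

# Sample records copied from the question (only the fields used here).
json_data = [
    {"name": "Brad Green", "age": "35",
     "address": {"street": "Nicol St. 16", "city": "Manhatan"}},
    {"name": "Sara Brown", "age": "30",
     "address": {"street": "Adam St. 66", "city": "New York"}},
]

# Flatten nested dicts; nested keys become dotted column names
# such as "address.street".
flat = pd.json_normalize(json_data)

# Select the wanted columns, then rename by explicit mapping.
df = flat[["name", "age", "address.street", "address.city"]].rename(
    columns={"address.street": "street", "address.city": "city"}
)
print(df)
```

If the data lives in a file rather than a Python list, loading it first with json.load from the standard library and passing the result to pd.json_normalize should work the same way.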

Fixing your code

I wouldn't recommend this method, but to address your specific question about iterating through the json: you could get the dataframe using something like:

# Requires pandas < 2.0: DataFrame.append was removed in pandas 2.0.
df_new = pd.DataFrame(columns=['name','age','street','city'])

for kv in json_data:
    df_new = df_new.append(
        {
            "name": kv.get("name", None),
            "age": kv.get("age", None),
            "street": kv.get("address", {}).get("street", None),
            "city": kv.get("address", {}).get("city", None),
        },
        ignore_index=True
    )

Note I'm using .get() with a default value of None so that this won't fail if you have an inconsistently structured record in your JSON (e.g. one that is missing a required key).
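DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. A minimal sketch of the same idea for current pandas is to collect one dict per record in a plain list and build the DataFrame once at the end. The names rows and record are illustrative, and the second record below is a hypothetical incomplete one (not from the question) included only to show the .get() defaults.

```python
import pandas as pd

json_data = [
    {"name": "Brad Green", "age": "35",
     "address": {"street": "Nicol St. 16", "city": "Manhatan"}},
    # Hypothetical record with no "address" key, to exercise the defaults.
    {"name": "Sara Brown", "age": "30"},
]

rows = []
for record in json_data:
    # Fall back to an empty dict so the nested lookups below cannot fail.
    address = record.get("address", {})
    rows.append({
        "name": record.get("name"),
        "age": record.get("age"),
        "street": address.get("street"),
        "city": address.get("city"),
    })

# Build the DataFrame once, instead of growing it row by row.
df_new = pd.DataFrame(rows, columns=["name", "age", "street", "city"])
print(df_new)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which is what repeated appends did.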

Answer 2

Using collections.defaultdict to collate the data while iterating, before creating the dataframe, should be more efficient than using pd.json_normalize:

from collections import defaultdict

df = defaultdict(list)

for entry in json_data:
    df['name'].append(entry['name'])
    df['age'].append(entry['age'])
    df['street'].append(entry['address']['street'])
    df['city'].append(entry['address']['city'])

df

defaultdict(list,
            {'name': ['Brad Green', 'Sara Brown'],
             'age': ['35', '30'],
             'street': ['Nicol St. 16', 'Adam St. 66'],
             'city': ['Manhatan', 'New York']})

Create dataframe:

pd.DataFrame(df)
 
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York

You could also use jmespath, though I think it is overkill here because the JSON is not deeply nested. I'll include it anyway in case you ever need to traverse more deeply nested JSON data. In short, with jmespath you access a key with . and an array with []:

import jmespath

df = jmespath.search("""{name:[].name, 
                         age: [].age, 
                         street: [].address.street, 
                         city: [].address.city}
                     """, 
                     json_data)

pd.DataFrame(df)
 
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York


solution1
2 2021-04-19 19:30:35

solution2
0 2021-04-19 23:47:46
