
Populate a pandas DataFrame from a JSON file

I want to iterate through a JSON file and populate a pandas DataFrame from the values of specific keys in the JSON data.

import pandas as pd
json_data = [
{
    "name" : "Brad Green",
    "age" : "35",
    "address" : {
        "street" : "Nicol St. 16",
        "city" : "Manhatan"
    },
    "children" : ["Nati", "Madi"]
},

{
    "name" : "Sara Brown",
    "age" : "30",
    "address" : {
        "street" : "Adam St. 66",
        "city" : "New York"
    },
    "children" : "none" 
}
]

I don't want to simply add the data from json_data to the df like the code below:

df = pd.DataFrame(json_data, columns=['name', 'address', 'age'])

I wrote a for loop to iterate through the json_data and add the data to df_new :

df_new = pd.DataFrame(columns=['name','age','street','city'])

for i in range(len(json_data)):
    df_new = df_new.append({"name": json_data[i]})
...

I know this for loop can't get the 'name', 'age', 'street', and 'city' values from json_data, but I couldn't find a solution in other posts here. I also want to extract the street and city values from the nested address key as separate columns. I would appreciate any help with this issue.

Iterating through the JSON is probably not the best way to do this, in my opinion. I'd look into pd.json_normalize instead:

>>> df = pd.json_normalize(json_data)[['name', 'age', 'address.street', 'address.city']] 
>>> df

         name age address.street address.city
0  Brad Green  35   Nicol St. 16     Manhatan
1  Sara Brown  30    Adam St. 66     New York

You can rename the columns as you see fit after this, e.g.:

df.columns = ["name", "age", "street", "city"]
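
If assigning to df.columns by position feels fragile, renaming by mapping is one alternative. This is a minimal sketch, assuming pandas 1.0 or later (where pd.json_normalize is available); the names records and flat are illustrative and not from the original post:

```python
import pandas as pd

# Two records shaped like the json_data in the question.
records = [
    {"name": "Brad Green", "age": "35",
     "address": {"street": "Nicol St. 16", "city": "Manhatan"}},
    {"name": "Sara Brown", "age": "30",
     "address": {"street": "Adam St. 66", "city": "New York"}},
]

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)

# Select the wanted columns, then rename by mapping so the result
# does not depend on the order of the columns.
flat = flat[["name", "age", "address.street", "address.city"]].rename(
    columns={"address.street": "street", "address.city": "city"}
)
```

With a mapping, a column that is later added or reordered does not silently receive the wrong label.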

Fixing your code

I wouldn't recommend this method, but to address your specific question about iterating through the JSON, you could build the DataFrame with something like the following. It collects the rows in a list first, because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0:

rows = []

for kv in json_data:
    # Chained .get() calls return None instead of raising on a missing key.
    rows.append(
        {
            "name": kv.get("name", None),
            "age": kv.get("age", None),
            "street": kv.get("address", {}).get("street", None),
            "city": kv.get("address", {}).get("city", None),
        }
    )

df_new = pd.DataFrame(rows, columns=['name','age','street','city'])

Note that I'm using .get() with a default value of None so that this won't fail if a record in your JSON is inconsistently structured (e.g. missing one of the required keys).
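
To illustrate the .get() fallback on its own, here is a small sketch that uses a hypothetical helper, flatten_record, on a record with no address key. The helper name and the incomplete record are assumptions made up for this illustration:

```python
def flatten_record(record):
    """Pull the four wanted fields out of one record, using None for gaps."""
    # Fall back to an empty dict so the .get() calls below still work
    # when the "address" key is absent.
    address = record.get("address", {})
    return {
        "name": record.get("name"),
        "age": record.get("age"),
        "street": address.get("street"),
        "city": address.get("city"),
    }


complete = {"name": "Brad Green", "age": "35",
            "address": {"street": "Nicol St. 16", "city": "Manhatan"}}
incomplete = {"name": "Sara Brown", "age": "30"}  # hypothetical: no address

rows = [flatten_record(r) for r in (complete, incomplete)]
```

The resulting list of dicts can be passed directly to pd.DataFrame(rows). Note that this only covers a missing key; a record whose address is present but set to None would still need separate handling.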

Use collections.defaultdict to collate the data while iterating, before creating the DataFrame. This may be more efficient than using pd.json_normalize:

from collections import defaultdict

df = defaultdict(list)

for entry in json_data:
    df['name'].append(entry['name'])
    df['age'].append(entry['age'])
    df['street'].append(entry['address']['street'])
    df['city'].append(entry['address']['city'])

df

defaultdict(list,
            {'name': ['Brad Green', 'Sara Brown'],
             'age': ['35', '30'],
             'street': ['Nicol St. 16', 'Adam St. 66'],
             'city': ['Manhatan', 'New York']})

Create the DataFrame:

pd.DataFrame(df)
 
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York

You could also use jmespath, though I think it is overkill here because the JSON is not deeply nested. I'll include it anyway in case you ever need to traverse more deeply nested JSON data. In short, with jmespath you access a key with . and an array with []:

import jmespath

df = jmespath.search("""{name:[].name, 
                         age: [].age, 
                         street: [].address.street, 
                         city: [].address.city}
                     """, 
                     json_data)

pd.DataFrame(df)
 
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 