I want to iterate through JSON data and populate a pandas DataFrame from specific keys of each record.
import pandas as pd

json_data = [
    {
        "name": "Brad Green",
        "age": "35",
        "address": {
            "street": "Nicol St. 16",
            "city": "Manhatan"
        },
        "children": ["Nati", "Madi"]
    },
    {
        "name": "Sara Brown",
        "age": "30",
        "address": {
            "street": "Adam St. 66",
            "city": "New York"
        },
        "children": "none"
    }
]
I don't want to simply add the data from json_data to the df like the code below:
df = pd.DataFrame(json_data, columns=['name', 'address', 'age'])
I wrote a for loop to iterate through the json_data and add the data to df_new:
df_new = pd.DataFrame(columns=['name','age','street','city'])
for i in range(len(json_data)):
    df_new = df_new.append({"name": json_data[i]})
    ...
I know that this for loop obviously can't get the 'name', 'age', 'street', and 'city' values from json_data, but I couldn't find a solution by looking at other posts here. Also, I want to pull the street and city values out of the nested address key as separate columns. I would appreciate it if anyone can help me with this issue.
Iterating through the JSON is probably not the best way to do this, IMO. I'd look into pd.json_normalize if I were you:
>>> df = pd.json_normalize(json_data)[['name', 'age', 'address.street', 'address.city']]
>>> df
         name age address.street address.city
0  Brad Green  35   Nicol St. 16     Manhatan
1  Sara Brown  30    Adam St. 66     New York
You can rename the columns as you see fit after this, e.g.:
df.columns = ["name", "age", "street", "city"]
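Assigning to df.columns relies on the column order. As a hedged alternative, here is a minimal sketch that selects and renames by name with a mapping, so the result does not depend on position. The names `records` and `wanted` are made up for this illustration; the data is copied from the question.

```python
import pandas as pd

# Same two records as in the question (the "children" key is left out for brevity).
records = [
    {"name": "Brad Green", "age": "35",
     "address": {"street": "Nicol St. 16", "city": "Manhatan"}},
    {"name": "Sara Brown", "age": "30",
     "address": {"street": "Adam St. 66", "city": "New York"}},
]

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)

# Hypothetical mapping: flattened column name -> column name we want.
wanted = {
    "name": "name",
    "age": "age",
    "address.street": "street",
    "address.city": "city",
}

# Select only the mapped columns, then rename them by name.
df = flat[list(wanted)].rename(columns=wanted)
print(df)
```

Because the mapping is keyed by column name, adding or reordering keys in the JSON should not silently mislabel a column.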
Fixing your code
I wouldn't recommend this method, but to address your specific question about iterating through the JSON, you could build the DataFrame with something like the code below. Note that DataFrame.append was removed in pandas 2.0, so this version collects the rows in a list and creates the DataFrame once at the end:
rows = []
for kv in json_data:
    # .get() returns None instead of raising KeyError when a key is missing
    rows.append(
        {
            "name": kv.get("name", None),
            "age": kv.get("age", None),
            "street": kv.get("address", {}).get("street", None),
            "city": kv.get("address", {}).get("city", None),
        }
    )
df_new = pd.DataFrame(rows, columns=['name','age','street','city'])
Note I'm using .get() with a default value of None so that this won't fail if you have an inconsistently structured record in your JSON (e.g. one that is missing one of the required keys).
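To illustrate the point about inconsistent records, here is a small sketch in plain Python. The helper `extract_row` and the two incomplete records are hypothetical, invented for this example. It also uses `or {}` rather than a `.get()` default, on the assumption that a record might carry an explicit null address, which a default alone would not cover.

```python
def extract_row(record):
    """Pull the four wanted fields out of one record, tolerating missing keys."""
    # `or {}` covers both a missing "address" key and an explicit None value.
    address = record.get("address") or {}
    return {
        "name": record.get("name"),
        "age": record.get("age"),
        "street": address.get("street"),
        "city": address.get("city"),
    }


complete = {
    "name": "Brad Green",
    "age": "35",
    "address": {"street": "Nicol St. 16", "city": "Manhatan"},
}
# Hypothetical malformed records, not part of the original data.
no_address = {"name": "Sam Doe", "age": "41"}
null_address = {"name": "Kim Roe", "address": None}

rows = [extract_row(r) for r in (complete, no_address, null_address)]
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to pd.DataFrame(rows); missing values come through as None.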
Using collections.defaultdict to collate the data while iterating, before creating the DataFrame, should be more efficient than using pd.json_normalize:
from collections import defaultdict
df = defaultdict(list)
for entry in json_data:
    df['name'].append(entry['name'])
    df['age'].append(entry['age'])
    df['street'].append(entry['address']['street'])
    df['city'].append(entry['address']['city'])

df
defaultdict(list,
            {'name': ['Brad Green', 'Sara Brown'],
             'age': ['35', '30'],
             'street': ['Nicol St. 16', 'Adam St. 66'],
             'city': ['Manhatan', 'New York']})
Create dataframe:
pd.DataFrame(df)
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York
You could also use jmespath, but I think it is overkill here, as the JSON is not deeply nested. Still, I'll add it in case you ever need to traverse nested JSON data. In short, with jmespath you access a key with . and an array with []:
import jmespath
df = jmespath.search("""{name:[].name,
age: [].age,
street: [].address.street,
city: [].address.city}
""",
json_data)
pd.DataFrame(df)
         name age        street      city
0  Brad Green  35  Nicol St. 16  Manhatan
1  Sara Brown  30   Adam St. 66  New York