I have a complicated nested JSON file. I need generic code that flattens this nested file and stores the result in a DataFrame, using either PySpark or pandas. Is this achievable, and is there any generic code that works for any complicated nested JSON file?
I have put the JSON in the data variable. To import a JSON file you can use
df = pd.read_json('data.json')
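Note that pd.read_json returns a DataFrame in which nested objects stay as dict or list values inside cells. If the goal is to pass the file contents to json_normalize, one option is to load the file with the standard json module first. This is only a sketch; the function name load_and_normalize and the path argument are hypothetical, not from the original answer.

```python
import json

import pandas as pd


def load_and_normalize(path):
    """Load a JSON file into Python objects and flatten nested dicts.

    `path` is a hypothetical file location, e.g. "data.json".
    """
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)  # plain dicts/lists, not a DataFrame
    # json_normalize accepts a dict or a list of dicts
    return pd.json_normalize(records)
```

Calling load_and_normalize("data.json") would then return one row per top-level record, with nested dict keys expanded into dotted column names.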
I have used json_normalize() to flatten the nested JSON data.
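As a small illustration of what json_normalize does on its own: nested dicts become columns whose names join the keys with a separator ("." by default, configurable through the sep argument). The record below is a made-up, cut-down example.

```python
import pandas as pd

# Made-up record with one level of nesting
record = {"company": "Google", "management": {"CEO": "ABC"}}

# Default separator is "."
flat = pd.json_normalize(record)
print(list(flat.columns))

# The separator can be changed with `sep`
flat_us = pd.json_normalize(record, sep="_")
print(list(flat_us.columns))
```

Lists of records are not expanded by this plain call; they need the record_path argument.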
A deeply nested JSON structure can be converted to a DataFrame by passing the record path and the meta arguments to the json_normalize function, as shown below.
import pandas as pd

data = [
    {
        "company": "Google",
        "tagline": "Hello World",
        "management": {"CEO": "ABC"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "This is text",
        "management": {"CEO": "XYZ"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
]

df = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
df
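Output: one row per department (six rows in total), with the columns name, revenue (bn), company, tagline and management.CEO.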
Refer to this article by jssuriyakumar.
You can also refer to this similar issue by calestini.
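The json_normalize call above needs the record path and meta keys to be known in advance. For the "generic" case asked about in the question, one possible approach is a small recursive helper that walks any JSON value, joins nested dict keys with dots, and turns each list element into its own row. This is a sketch under assumptions, not a definitive solution: the name flatten_records is hypothetical, and when a record contains several sibling lists the rows are combined as a cross product, which may or may not be what you want.

```python
import pandas as pd


def flatten_records(obj, prefix=""):
    """Return a list of flat dicts (one per output row) for a JSON value."""
    if isinstance(obj, dict):
        rows = [{}]
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            child_rows = flatten_records(value, name)
            # Combine every row built so far with every row from this key
            rows = [{**row, **child} for row in rows for child in child_rows]
        return rows
    if isinstance(obj, list):
        if not obj:
            return [{prefix: None}]  # keep the column even for an empty list
        rows = []
        for item in obj:
            rows.extend(flatten_records(item, prefix))  # one row per element
        return rows
    return [{prefix: obj}]  # scalar leaf value


# Cut-down sample in the same shape as the data above
sample = [
    {
        "company": "Google",
        "management": {"CEO": "ABC"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
        ],
    }
]

flat_df = pd.DataFrame(flatten_records(sample))
print(flat_df)
```

Because the helper makes no assumptions about key names, the same call works on other nested structures, at the cost of less control over which lists become rows.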