Hello StackOverflow Community,
I am fairly new to the JSON file format and also a beginner in Python. I have a custom JSON file whose data is nested several levels deep, and I am trying to convert it into a tabular format using Python.
I don't really know how to proceed. I looked at some questions here in which people flattened the file and then reshaped it with Python code. I tried that approach and it didn't go well.
I have added the JSON file and the expected output in CSV format below. Please have a look at them and let me know of any ideas I could use to make this work.
In the JSON file below, each company has some Product IDs that contain data and some that are null. The Product IDs that contain data also have an assigned value in predictions; all the others are null.
I would also like suggestions on whether modifying the JSON data, by adding or removing identifiers, would help us achieve the end output. (For example, adding Product Value and Product Description to every sub-category entry where we currently only have a value.)
[
  {
    "predictions": {
      "CC__108": 0.948093,
      "CC__111": 0.897565
    },
    "CompanyID": "CI10001",
    "SHAP_ACT_CC__1": null,
    "SHAP_ACT_CC__2": null,
    "SHAP_ACT_CC__108": {
      "Product Category": "Toys",
      "Board Games": {
        "Monopoly": 35.99,
        "The Game of Life": 39.99,
        "The Clue": 20.45
      },
      "Soft Toys": {
        "Bear": {
          "Product Value": 5.78,
          "Product Description": "A soft bear toy"
        }
      },
      "Electronic Toys": {
        "Digital pet": 59.99,
        "Entertainment robot": 100.99
      },
      "Puzzle": {
        "Baloon Puzzle": 10.99
      },
      "Rubik's Cube": {
        "3x3 cube": {
          "Product Value": 5.99,
          "Product Description": "3x3 rubik's cube"
        }
      }
    },
    "SHAP_ACT_CC__109": null,
    "SHAP_ACT_CC__110": null,
    "SHAP_ACT_CC__111": {
      "Product Category": "Books",
      "Action and Adventure": {
        "Life of Pi": 14.99,
        "The Call of the Wild": 9.99
      },
      "Classics": {
        "Little Women": {
          "Product Value": 10.99,
          "Product Description": "Illustrated Edition"
        },
        "Beloved": {
          "Product Value": 12.99,
          "Product Description": "Winner of the Nobel Prize"
        }
      },
      "Comics": {
        "Watchmen": 14.99,
        "Avengers": 15.99
      },
      "Fantasy": {
        "Ninth House": 18.99
      },
      "Historical": {}
    },
    "SHAP_ACT_CC__115": null,
    "SHAP_ACT_CC__116": null
  },
  {
    "predictions": {
      "CC__124": 0.81234,
      "CC__85": 0.78943
    },
    "CompanyID": "CI10002",
    "SHAP_ACT_CC__18": null,
    "SHAP_ACT_CC__24": null,
    "SHAP_ACT_CC__124": {
      "Product Category": "Vehicles",
      "Military Aircraft": {
        "Attack Airplanes": 10000.99,
        "Bomber Airplanes": 15000.99
      },
      "Airplanes": {
        "Cargo Airplanes": {
          "Product Value": 20000.99,
          "Product Description": "Cargo Transport"
        }
      },
      "ATV": {},
      "Automobiles": {},
      "Bicycles": {}
    },
    "SHAP_ACT_CC__134": null,
    "SHAP_ACT_CC__135": null,
    "SHAP_ACT_CC__85": {
      "Product Category": "Boats",
      "Fishing Boats": {
        "Smudger": 25000.99,
        "Campion": 30000.99
      },
      "Dinghy Boats": {
        "Lowe": {
          "Product Value": 10000.99,
          "Product Description": "lowes dinghy boat"
        },
        "Pond King": {
          "Product Value": 8000.99,
          "Product Description": "king of the pond"
        }
      },
      "Deck Boats": {
        "Sea Ark": 45000.99
      },
      "Bowrider Boats": {},
      "House Boats": {
        "World Cat": {
          "Product Value": 15000.99,
          "Product Description": "3 bedroom house boat"
        }
      }
    },
    "SHAP_ACT_CC__149": null,
    "SHAP_ACT_CC__150": null
  }
]
Output Data (CSV)
CompanyID,Product ID,Product Category,Product Type,Product Name,Product Value,Product Description
CI10001,SHAP_ACT_CC__108,Toys,Board Games,Monopoly,35.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Game of Life,39.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Clue,20.45,
CI10001,SHAP_ACT_CC__108,Toys,Soft Toys,Bear,5.78,A soft bear toy
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Digital pet,59.99,
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Entertainment robot,100.99,
CI10001,SHAP_ACT_CC__108,Toys,Puzzle,Baloon Puzzle,10.99,
CI10001,SHAP_ACT_CC__108,Toys,Rubik's Cube,3x3 cube,5.99,3x3 rubik's cube
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,Life of Pi,14.99,
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,The Call of the Wild,9.99,
CI10001,SHAP_ACT_CC__111,Books,Classics,Little Women,10.99,Illustrated Edition
CI10001,SHAP_ACT_CC__111,Books,Classics,Beloved,12.99,Winner of the Nobel Prize
CI10001,SHAP_ACT_CC__111,Books,Comics,Watchmen,14.99,
CI10001,SHAP_ACT_CC__111,Books,Comics,Avengers,15.99,
CI10001,SHAP_ACT_CC__111,Books,Fantasy,Ninth House,18.99,
CI10001,SHAP_ACT_CC__111,Books,Historical,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Attack Airplanes,10000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Bomber Airplanes,15000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Airplanes,Cargo Airplanes,20000.99,Cargo Transport
CI10002,SHAP_ACT_CC__124,Vehicles,ATV,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Automobiles,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Bicycles,,,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Smudger,25000.99,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Campion,30000.99,
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Lowe,10000.99,lowes dinghy boat
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Pond King,8000.99,king of the pond
CI10002,SHAP_ACT_CC__85,Boats,Deck Boats,Sea Ark,45000.99,
CI10002,SHAP_ACT_CC__85,Boats,Bowrider Boats,,,
CI10002,SHAP_ACT_CC__85,Boats,House Boats,World Cat,15000.99,3 bedroom house boat
Thank you for your help. Any ideas or additional links would be helpful and please let me know if I need to add any additional information or change the format of the post.
There's no straightforward way to handle such an idiosyncratic JSON schema. You need to work through the schema from the innermost items all the way up, handling each type of object in turn.
The innermost field has the product details, which can be a single number representing the value
14.99
or an object with keys for value and description
{
  "Product Value": 10.99,
  "Product Description": "Illustrated Edition"
}
You can handle it like this:
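def process_product(product):
    # A product is either a detail object or a bare number
    if isinstance(product, dict):
        value = product['Product Value']
        description = product['Product Description']
    else:
        # Bare number: it is the value, and there is no description
        value = product
        description = None
    return value, description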
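If some product objects might omit one of the two keys (an assumption; every object in the sample data has both), a slightly more defensive variant could use `dict.get`, so a missing key gives `None` instead of raising `KeyError`. The name `process_product_safe` is made up for this sketch.

```python
def process_product_safe(product):
    """Return (value, description) for a bare number or a detail object."""
    if isinstance(product, dict):
        # .get returns None for a missing key instead of raising KeyError
        return product.get("Product Value"), product.get("Product Description")
    # A bare number is the value itself; there is no description
    return product, None


print(process_product_safe(14.99))
# (14.99, None)
print(process_product_safe({"Product Value": 10.99,
                            "Product Description": "Illustrated Edition"}))
# (10.99, 'Illustrated Edition')
print(process_product_safe({"Product Value": 5.0}))
# (5.0, None)
```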
Then come the product categories, each with a special Product Category key and multiple keys for product types
{
  "Product Category": "Books",
  "Action and Adventure": {
    "Life of Pi": 14.99,
    "The Call of the Wild": 9.99
  },
  "Classics": {...}
}
You can process each category with the following function:
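def process_category(category):
    for product_type, products in category.items():
        # 'Product Category' holds the category name, not a product type
        if product_type == 'Product Category':
            continue
        # Each product type maps product names to their details
        for product_name, product_data in products.items():
            yield (
                category['Product Category'],
                product_type,
                product_name,
                *process_product(product_data),
            )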
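Note that this function yields nothing for an empty product type such as "Historical": {}, whereas the expected CSV in the question lists such types with blank name, value and description. If those rows are wanted, one possible variant is sketched below. The name `process_category_keep_empty` is hypothetical, and `process_product` is repeated in compact form only to keep the sketch runnable on its own.

```python
def process_product(product):
    # Same two shapes as above: a detail object or a bare number
    if isinstance(product, dict):
        return product["Product Value"], product["Product Description"]
    return product, None


def process_category_keep_empty(category):
    """Like process_category, but emits one blank row per empty product type."""
    name = category["Product Category"]
    for product_type, products in category.items():
        if product_type == "Product Category":
            continue
        if not products:
            # Empty type, e.g. "Historical": {} -> no name, value or description
            yield name, product_type, None, None, None
            continue
        for product_name, product_data in products.items():
            yield (name, product_type, product_name,
                   *process_product(product_data))


books = {
    "Product Category": "Books",
    "Fantasy": {"Ninth House": 18.99},
    "Historical": {},
}
for row in process_category_keep_empty(books):
    print(row)
# ('Books', 'Fantasy', 'Ninth House', 18.99, None)
# ('Books', 'Historical', None, None, None)
```

The `None` placeholders become empty fields once the rows are written out as CSV.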
The final object in the hierarchy is the company, and each has a CompanyID key and several product categories
{
  "CompanyID": "CI10001",
  "SHAP_ACT_CC__1": null,
  "SHAP_ACT_CC__108": {
    "Product Category": "Toys",
    "Board Games": {...}
  }
}
And the function works similarly to the previous ones:
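def process_company(company):
    for key, data in company.items():
        # Skip 'predictions', 'CompanyID' and the Product IDs that are null
        if key.startswith('SHAP_ACT') and data is not None:
            for category_data in process_category(data):
                # Parenthesised so the unpacking also parses before Python 3.8
                yield (company['CompanyID'], key, *category_data)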
Now write a function to process all the records in the array:
import json
import pandas as pd

def process_data(companies):
    for company in companies:
        for company_data in process_company(company):
            yield {
                'Company ID': company_data[0],
                'Product ID': company_data[1],
                'Product Category': company_data[2],
                'Product Type': company_data[3],
                'Product Name': company_data[4],
                'Product Value': company_data[5],
                'Product Description': company_data[6],
            }

# Assumption: the JSON array from the question is saved as 'data.json'
# (a placeholder file name)
with open('data.json') as f:
    data = json.load(f)

pd.DataFrame(list(process_data(data)))
    Company ID  Product ID        Product Category  Product Type          Product Name          Product Value  Product Description
0   CI10001     SHAP_ACT_CC__108  Toys              Board Games           Monopoly              35.99          None
1   CI10001     SHAP_ACT_CC__108  Toys              Board Games           The Game of Life      39.99          None
2   CI10001     SHAP_ACT_CC__108  Toys              Board Games           The Clue              20.45          None
3   CI10001     SHAP_ACT_CC__108  Toys              Soft Toys             Bear                  5.78           A soft bear toy
4   CI10001     SHAP_ACT_CC__108  Toys              Electronic Toys       Digital pet           59.99          None
5   CI10001     SHAP_ACT_CC__108  Toys              Electronic Toys       Entertainment robot   100.99         None
6   CI10001     SHAP_ACT_CC__108  Toys              Puzzle                Baloon Puzzle         10.99          None
7   CI10001     SHAP_ACT_CC__108  Toys              Rubik's Cube          3x3 cube              5.99           3x3 rubik's cube
8   CI10001     SHAP_ACT_CC__111  Books             Action and Adventure  Life of Pi            14.99          None
9   CI10001     SHAP_ACT_CC__111  Books             Action and Adventure  The Call of the Wild  9.99           None
10  CI10001     SHAP_ACT_CC__111  Books             Classics              Little Women          10.99          Illustrated Edition
11  CI10001     SHAP_ACT_CC__111  Books             Classics              Beloved               12.99          Winner of the Nobel Prize
12  CI10001     SHAP_ACT_CC__111  Books             Comics                Watchmen              14.99          None
13  CI10001     SHAP_ACT_CC__111  Books             Comics                Avengers              15.99          None
14  CI10001     SHAP_ACT_CC__111  Books             Fantasy               Ninth House           18.99          None
15  CI10002     SHAP_ACT_CC__124  Vehicles          Military Aircraft     Attack Airplanes      10000.99       None
16  CI10002     SHAP_ACT_CC__124  Vehicles          Military Aircraft     Bomber Airplanes      15000.99       None
17  CI10002     SHAP_ACT_CC__124  Vehicles          Airplanes             Cargo Airplanes       20000.99       Cargo Transport
18  CI10002     SHAP_ACT_CC__85   Boats             Fishing Boats         Smudger               25000.99       None
19  CI10002     SHAP_ACT_CC__85   Boats             Fishing Boats         Campion               30000.99       None
20  CI10002     SHAP_ACT_CC__85   Boats             Dinghy Boats          Lowe                  10000.99       lowes dinghy boat
21  CI10002     SHAP_ACT_CC__85   Boats             Dinghy Boats          Pond King             8000.99        king of the pond
22  CI10002     SHAP_ACT_CC__85   Boats             Deck Boats            Sea Ark               45000.99       None
23  CI10002     SHAP_ACT_CC__85   Boats             House Boats           World Cat             15000.99       3 bedroom house boat
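Since the question asks for CSV output, the rows can also be written to a file rather than displayed. The sketch below uses only the standard library's `csv` module and assumes each row is a dict with the same keys that process_data yields. The helper name `write_rows_to_csv` is hypothetical, and the two sample rows are copied from the table above so the snippet runs on its own.

```python
import csv
import io

FIELDNAMES = [
    "Company ID", "Product ID", "Product Category", "Product Type",
    "Product Name", "Product Value", "Product Description",
]


def write_rows_to_csv(rows, fileobj):
    """Write an iterable of row dicts (as yielded by process_data) as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # The csv module writes None as an empty field
        writer.writerow(row)


# Two rows copied from the table above, standing in for process_data(data)
sample_rows = [
    {"Company ID": "CI10001", "Product ID": "SHAP_ACT_CC__108",
     "Product Category": "Toys", "Product Type": "Board Games",
     "Product Name": "Monopoly", "Product Value": 35.99,
     "Product Description": None},
    {"Company ID": "CI10001", "Product ID": "SHAP_ACT_CC__111",
     "Product Category": "Books", "Product Type": "Classics",
     "Product Name": "Little Women", "Product Value": 10.99,
     "Product Description": "Illustrated Edition"},
]

buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
write_rows_to_csv(sample_rows, buffer)
print(buffer.getvalue())
```

If the rows are already in a DataFrame, calling its `to_csv` method with `index=False` should give an equivalent file. The header here reads "Company ID", following the answer's column names, whereas the question's expected CSV uses "CompanyID"; rename the key if an exact match is needed.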