I've got a large JSON file with the following structure:
{
    "Project": {
        "AAA": {
            "Version": [
                {
                    "id": "00001",
                    "name": "08.12.2019",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-12"
                },
                {
                    "id": "00002",
                    "name": "2019.8.26",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-26"
                }
            ]
        },
        "BBB": {
            "Version": [
                {
                    "id": "00003",
                    "name": "AABBY3",
                    "description": "2019 Accounting Year End",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00004",
                    "name": "AACCZ4",
                    "description": "Financial Statements 2019",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00005",
                    "name": "AADDZ5",
                    "description": null,
                    "released": false,
                    "releaseDate": null
                }
            ]
        }
    }
}
I'm having a problem converting this into a pandas DataFrame because of the nested array. How can I extract all the data in each Version for each Project, while keeping a reference to the Project?
I've so far only managed to get a dataframe of the following structure:
df.head(3)
Out[10]:
description id name releaseDate released
0 Version 5.4.1. 10703 V5R4M1 2010-09-15 True
1 Version 5.5.1 10704 V5R5M1 2015-04-20 True
2 Version 6.1.1 10705 V6R1M1 2016-10-14 True
using the following:
import json
import pandas as pd

with open("fixVer2.json", "r") as read_file:
    data = json.load(read_file)

prj_list = ['AAA', 'BBB', 'CCC', 'DDD']
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        d_list.append(el)
df = pd.DataFrame(d_list)
but because some names are duplicated across projects with different releaseDates, I need to keep the Project name to identify the correct releaseDate for each name.
Desired output:
description id name releaseDate released Project
0 Version 5.4.1. 10703 V5R4M1 2010-09-15 True CCC
1 Version 5.5.1 10704 V5R5M1 2015-04-20 True CCC
2 Version 6.1.1 10705 V6R1M1 2016-10-14 True CCC
I am unsure how to parse the nested array, keep the Project name, and consolidate all of it into one DataFrame (or another Python structure).
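One more option, not used in either answer below: pandas 1.0 and later provides pd.json_normalize, which can expand the nested Version lists directly once each project's name is stored inside its own record (in the file it is only a dictionary key). A minimal sketch, run on a trimmed copy of the JSON above; the two-row sample is an assumption for illustration:

```python
import json

import pandas as pd

# Trimmed copy of the structure from the question (assumed sample data).
raw = """
{"Project": {
  "AAA": {"Version": [
    {"id": "00001", "name": "08.12.2019", "description": null,
     "released": true, "releaseDate": "2019-08-12"}]},
  "BBB": {"Version": [
    {"id": "00003", "name": "AABBY3", "description": "2019 Accounting Year End",
     "released": false, "releaseDate": null}]}
}}
"""
data = json.loads(raw)

# Project names are dictionary keys, not fields, so first turn each project
# into a record that carries its own name next to its Version list.
records = [{"Project": name, **proj} for name, proj in data["Project"].items()]

# record_path expands every Version entry into one row; meta copies the
# Project value from the parent record onto each of those rows.
df = pd.json_normalize(records, record_path="Version", meta=["Project"])
print(df)
```

The Project column ends up last, after the fields of each Version entry.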
You can modify your solution to tag each version with its project name before appending it:
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        el['Project'] = x  # record which project this version belongs to
        d_list.append(el)
Or use a list comprehension:
prj_list = ['AAA', 'BBB']
d_list = [{**el, 'version': x} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print(df)
id name description released releaseDate version
0 00001 08.12.2019 null True 2019-08-12 AAA
1 00002 2019.8.26 null True 2019-08-26 AAA
2 00003 AABBY3 2019 Accounting Year End False null BBB
3 00004 AACCZ4 Financial Statements 2019 False null BBB
4 00005 AADDZ5 null False null BBB
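A variant of the same idea, sketched here as an alternative rather than taken from the answer above: build one DataFrame per project and let pd.concat turn the dictionary keys into an index level, then move that level into a column. The trimmed sample data is assumed:

```python
import pandas as pd

# Assumed sample: the parsed JSON from the question, trimmed to three versions.
data = {
    "Project": {
        "AAA": {"Version": [
            {"id": "00001", "name": "08.12.2019", "description": None,
             "released": True, "releaseDate": "2019-08-12"},
            {"id": "00002", "name": "2019.8.26", "description": None,
             "released": True, "releaseDate": "2019-08-26"},
        ]},
        "BBB": {"Version": [
            {"id": "00003", "name": "AABBY3", "description": "2019 Accounting Year End",
             "released": False, "releaseDate": None},
        ]},
    }
}

# One DataFrame per project; passing a dict to pd.concat uses its keys as
# the outer level of a MultiIndex, which is named "Project" here.
frames = {name: pd.DataFrame(proj["Version"]) for name, proj in data["Project"].items()}
df = pd.concat(frames, names=["Project"])

# Move the Project level into a regular column, then replace the leftover
# per-project row numbers with a fresh 0..n-1 index.
df = df.reset_index(level="Project").reset_index(drop=True)
print(df)
```

Unlike the loop that sets el['Project'], this does not modify the parsed dictionaries in place.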
Try this:
import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']

d_list = []
for name, dat in data.items():
    for d in dat['Version']:
        d['Project'] = name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)
Project description id name releaseDate released
0 AAA None 00001 08.12.2019 2019-08-12 True
1 AAA None 00002 2019.8.26 2019-08-26 True
2 BBB 2019 Accounting Year End 00003 AABBY3 None False
3 BBB Financial Statements 2019 00004 AACCZ4 None False
4 BBB None 00005 AADDZ5 None False
With this approach, you don't need to keep a separate list of projects. Hope this helps!
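Once a Project column exists, the duplicate-name problem from the question can be handled by indexing on both columns, so each (Project, name) pair maps to exactly one releaseDate. A minimal sketch; the rows below are hypothetical, and the shared name "v1" and its dates are made up for illustration:

```python
import pandas as pd

# Hypothetical rows in the shape produced above; "v1" is deliberately
# shared by two projects with different release dates.
df = pd.DataFrame([
    {"Project": "AAA", "name": "v1", "releaseDate": "2019-08-12"},
    {"Project": "BBB", "name": "v1", "releaseDate": "2020-01-31"},
    {"Project": "BBB", "name": "AACCZ4", "releaseDate": None},
])

# Index on (Project, name) so a duplicated name resolves to one row per
# project; sorting the index keeps MultiIndex lookups efficient.
release_dates = df.set_index(["Project", "name"])["releaseDate"].sort_index()

print(release_dates.loc[("AAA", "v1")])  # 2019-08-12
print(release_dates.loc[("BBB", "v1")])  # 2020-01-31
```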