How to create a Pandas dataframe for each table's metadata (Column Name, Type, Format) stored within a database schema in a nested JSON file
I have a JSON file which contains the metadata for tables held within a schema.
I would like to create a dataframe for each table defined within the JSON file, i.e. Person, HomeAddress, Employment. Person and Employment are at the same level, but HomeAddress is nested within Person.
e.g. dataframe(Person)
Column_Name   Type     Format     Required
Person_ID     Integer             Yes
DateOfBirth   String   date-time  Yes
...........
The contents of the file are as follows:
{
"$id": "12121212",
"type": "object",
"properties": {
"PersonId": {
"type": "integer"
},
"Person": {
"type": ["object", "null"],
"properties": {
"PersonId": {
"type": "integer"
},
"DateOfBirth": {
"type": "string",
"format": "date-time"
},
"DateOfBirthVerified": {
"type": "boolean"
},
"Sex": {
"type": ["string", "null"]
},
"Surname": {
"type": ["string", "null"]
},
"Initials": {
"type": ["string", "null"]
},
"Forenames": {
"type": ["string", "null"]
},
"Title": {
"type": ["string", "null"]
},
"NationalIdNumber": {
"type": ["string", "null"]
},
"HomeAddress": {
"type": ["object", "null"],
"properties": {
"EffectiveDate": {
"type": "string",
"format": "date-time"
},
"EndDate": {
"type": "string",
"format": "date-time"
},
"Category": {
"type": ["string", "null"]
},
"Line1": {
"type": ["string", "null"]
},
"Line2": {
"type": ["string", "null"]
},
"Line3": {
"type": ["string", "null"]
},
"Line4": {
"type": ["string", "null"]
},
"City": {
"type": ["string", "null"]
},
"County": {
"type": ["string", "null"]
},
"Country": {
"type": ["string", "null"]
},
"CareOfAddressee": {
"type": ["string", "null"]
},
"PostCode": {
"type": ["string", "null"]
},
"SuspectAddress": {
"type": "boolean"
},
"Overseas": {
"type": "boolean"
}
},
"required": ["EffectiveDate", "EndDate", "Category", "Line1", "Line2", "Line3", "Line4", "City", "County", "Country", "CareOfAddressee", "PostCode", "SuspectAddress", "Overseas"]
}
},
"required": ["PersonId", "DateOfBirth", "DateOfBirthVerified", "Sex", "Surname", "Initials", "Forenames", "Title", "NationalIdNumber", "HomeAddress"]
},
"Employment": {
"type": ["object", "null"],
"properties": {
"EmployeeReference": {
"type": ["string", "null"]
},
"DateFirstEmployed": {
"type": "string",
"format": "date-time"
},
"PayrollNumber": {
"type": ["string", "null"]
}
},
"required": ["EmployeeReference", "DateFirstEmployed", "PayrollNumber"]
}
},
"required": ["PersonId", "Person", "Employment"]
}
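For reference, a minimal sketch of reading a schema file like this into a Python dictionary with the standard json module. The helper name load_schema and the file name schema.json are assumptions made for illustration, and a tiny stand-in schema is written to a temporary file so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_schema(path):
    """Read a JSON schema file and return it as a plain dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Tiny stand-in schema, written to a temporary file purely for the demo.
sample = {"type": "object", "properties": {"PersonId": {"type": "integer"}}}

with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "schema.json"  # assumed file name
    schema_path.write_text(json.dumps(sample), encoding="utf-8")
    d = load_schema(schema_path)

print(d["properties"]["PersonId"]["type"])
```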
Let d be the dictionary of the file contents. Then you could address this recursively as follows:
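import pandas as pd
import numpy as np

def get_props(d, required=None):
    # Avoid a mutable default argument; treat None as "nothing required".
    required = required or []
    props = []
    for k, v in d.items():
        if isinstance(v, dict):
            # A dict with a 'type' key describes a column (or a nested table).
            if 'type' in v:
                props.append({
                    'Column_Name': k,
                    'Format': v.get('format', np.nan),  # np.NaN was removed in NumPy 2.0
                    # Nullable types are lists such as ["string", "null"]; keep the first entry.
                    'Type': v['type'] if isinstance(v['type'], str) else v['type'][0],
                    'Required': 'Yes' if k in required else 'No'
                })
            # Recurse, passing down the 'required' list that sits beside this dict.
            props.extend(get_props(v, required=d.get('required', [])))
    return props

df = pd.DataFrame(get_props(d))
print(df)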
prints
| index | Column_Name | Format | Type | Required |
|---|---|---|---|---|
| 0 | PersonId | NaN | integer | Yes |
| 1 | Person | NaN | object | Yes |
| 2 | PersonId | NaN | integer | Yes |
| 3 | DateOfBirth | date-time | string | Yes |
| 4 | DateOfBirthVerified | NaN | boolean | Yes |
| 5 | Sex | NaN | string | Yes |
| 6 | Surname | NaN | string | Yes |
| 7 | Initials | NaN | string | Yes |
| 8 | Forenames | NaN | string | Yes |
| 9 | Title | NaN | string | Yes |
| 10 | NationalIdNumber | NaN | string | Yes |
| 11 | HomeAddress | NaN | object | Yes |
| 12 | EffectiveDate | date-time | string | Yes |
| 13 | EndDate | date-time | string | Yes |
| 14 | Category | NaN | string | Yes |
| 15 | Line1 | NaN | string | Yes |
| 16 | Line2 | NaN | string | Yes |
| 17 | Line3 | NaN | string | Yes |
| 18 | Line4 | NaN | string | Yes |
| 19 | City | NaN | string | Yes |
| 20 | County | NaN | string | Yes |
| 21 | Country | NaN | string | Yes |
| 22 | CareOfAddressee | NaN | string | Yes |
| 23 | PostCode | NaN | string | Yes |
| 24 | SuspectAddress | NaN | boolean | Yes |
| 25 | Overseas | NaN | boolean | Yes |
| 26 | Employment | NaN | object | Yes |
| 27 | EmployeeReference | NaN | string | Yes |
| 28 | DateFirstEmployed | date-time | string | Yes |
| 29 | PayrollNumber | NaN | string | Yes |
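The result above is a single flat dataframe in which the columns of Person, HomeAddress and Employment are mixed together. If one dataframe per table is wanted, as the question asks, a variation along the following lines might work. This is only a sketch under assumptions: the function name tables_from_schema and the "root" label for top-level columns are made up for illustration, object-typed properties are treated as separate tables and left out of their parent's column list, and tables that share a name at different nesting levels would overwrite each other. The demo uses a trimmed-down version of the schema.

```python
import pandas as pd

COLUMNS = ["Column_Name", "Type", "Format", "Required"]


def tables_from_schema(schema, name="root"):
    """Return {table_name: DataFrame}, one entry per object-typed node (hypothetical helper)."""
    required = set(schema.get("required", []))
    rows, tables = [], {}
    for col, spec in schema.get("properties", {}).items():
        types = spec.get("type")
        # Normalise "string" and ["string", "null"] to a list of type names.
        types = [types] if isinstance(types, str) else list(types or [])
        if "object" in types:
            # Nested object: treat it as its own table and recurse into it.
            tables.update(tables_from_schema(spec, name=col))
            continue
        rows.append({
            "Column_Name": col,
            "Type": types[0] if types else None,
            "Format": spec.get("format"),
            "Required": "Yes" if col in required else "No",
        })
    tables[name] = pd.DataFrame(rows, columns=COLUMNS)
    return tables


# Trimmed-down stand-in for the schema in the question.
schema = {
    "type": "object",
    "properties": {
        "PersonId": {"type": "integer"},
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {"type": "integer"},
                "DateOfBirth": {"type": "string", "format": "date-time"},
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {"City": {"type": ["string", "null"]}},
                    "required": ["City"],
                },
            },
            "required": ["PersonId", "DateOfBirth", "HomeAddress"],
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {"PayrollNumber": {"type": ["string", "null"]}},
            "required": ["PayrollNumber"],
        },
    },
    "required": ["PersonId", "Person", "Employment"],
}

tables = tables_from_schema(schema)
for table_name, frame in tables.items():
    print(table_name)
    print(frame, end="\n\n")
```

With the full file loaded into d, tables_from_schema(d)["Person"] would then hold only the Person columns, and tables_from_schema(d)["HomeAddress"] the address columns.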