How to create a Pandas dataframe for each table meta data (Column Name, Type, Format) stored within a Database Schema in nested JSON file
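I have a JSON file that contains the metadata of the tables stored in a schema.
I want to create a DataFrame for each table defined in the JSON file, i.e. Person, HomeAddress and Employment. Person and Employment are at the same level, but HomeAddress is nested inside Person.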
For example, the DataFrame for Person:

| Column_Name | Type | Format | Required |
|---|---|---|---|
| Person_ID | Integer | | Yes |
| DateOfBirth | String | date-time | Yes |
| ... | ... | ... | ... |
The file contents are as follows:
```json
{
  "$id": "12121212",
  "type": "object",
  "properties": {
    "PersonId": { "type": "integer" },
    "Person": {
      "type": ["object", "null"],
      "properties": {
        "PersonId": { "type": "integer" },
        "DateOfBirth": { "type": "string", "format": "date-time" },
        "DateOfBirthVerified": { "type": "boolean" },
        "Sex": { "type": ["string", "null"] },
        "Surname": { "type": ["string", "null"] },
        "Initials": { "type": ["string", "null"] },
        "Forenames": { "type": ["string", "null"] },
        "Title": { "type": ["string", "null"] },
        "NationalIdNumber": { "type": ["string", "null"] },
        "HomeAddress": {
          "type": ["object", "null"],
          "properties": {
            "EffectiveDate": { "type": "string", "format": "date-time" },
            "EndDate": { "type": "string", "format": "date-time" },
            "Category": { "type": ["string", "null"] },
            "Line1": { "type": ["string", "null"] },
            "Line2": { "type": ["string", "null"] },
            "Line3": { "type": ["string", "null"] },
            "Line4": { "type": ["string", "null"] },
            "City": { "type": ["string", "null"] },
            "County": { "type": ["string", "null"] },
            "Country": { "type": ["string", "null"] },
            "CareOfAddressee": { "type": ["string", "null"] },
            "PostCode": { "type": ["string", "null"] },
            "SuspectAddress": { "type": "boolean" },
            "Overseas": { "type": "boolean" }
          },
          "required": ["EffectiveDate", "EndDate", "Category", "Line1", "Line2", "Line3", "Line4", "City", "County", "Country", "CareOfAddressee", "PostCode", "SuspectAddress", "Overseas"]
        }
      },
      "required": ["PersonId", "DateOfBirth", "DateOfBirthVerified", "Sex", "Surname", "Initials", "Forenames", "Title", "NationalIdNumber", "HomeAddress"]
    },
    "Employment": {
      "type": ["object", "null"],
      "properties": {
        "EmployeeReference": { "type": ["string", "null"] },
        "DateFirstEmployed": { "type": "string", "format": "date-time" },
        "PayrollNumber": { "type": ["string", "null"] }
      },
      "required": ["EmployeeReference", "DateFirstEmployed", "PayrollNumber"]
    }
  },
  "required": ["PersonId", "Person", "Employment"]
}
```
Let `d` be the dictionary of the file contents (e.g. loaded with `json.load`). Then you can solve this recursively, as follows:
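```python
import numpy as np
import pandas as pd


def get_props(d, required=None):
    """Recursively collect one row per property that declares a "type"."""
    required = required or []
    props = []
    for k, v in d.items():
        if isinstance(v, dict):
            if 'type' in v:
                props.append({
                    'Column_Name': k,
                    'Format': v.get('format', np.nan),
                    # For union types such as ["string", "null"], keep the first entry
                    'Type': v['type'] if isinstance(v['type'], str) else v['type'][0],
                    'Required': 'Yes' if k in required else 'No',
                })
            # Descend into nested dicts, passing along the "required" list
            # that sits next to them in the current dict (if any)
            props.extend(get_props(v, required=d.get('required', [])))
    return props


df = pd.DataFrame(get_props(d))
print(df)
```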
Prints:
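| | Column_Name | Format | Type | Required |
|---|---|---|---|---|
| 0 | PersonId | NaN | integer | Yes |
| 1 | Person | NaN | object | Yes |
| 2 | PersonId | NaN | integer | Yes |
| 3 | DateOfBirth | date-time | string | Yes |
| 4 | DateOfBirthVerified | NaN | boolean | Yes |
| 5 | Sex | NaN | string | Yes |
| 6 | Surname | NaN | string | Yes |
| 7 | Initials | NaN | string | Yes |
| 8 | Forenames | NaN | string | Yes |
| 9 | Title | NaN | string | Yes |
| 10 | NationalIdNumber | NaN | string | Yes |
| 11 | HomeAddress | NaN | object | Yes |
| 12 | EffectiveDate | date-time | string | Yes |
| 13 | EndDate | date-time | string | Yes |
| 14 | Category | NaN | string | Yes |
| 15 | Line1 | NaN | string | Yes |
| 16 | Line2 | NaN | string | Yes |
| 17 | Line3 | NaN | string | Yes |
| 18 | Line4 | NaN | string | Yes |
| 19 | City | NaN | string | Yes |
| 20 | County | NaN | string | Yes |
| 21 | Country | NaN | string | Yes |
| 22 | CareOfAddressee | NaN | string | Yes |
| 23 | PostCode | NaN | string | Yes |
| 24 | SuspectAddress | NaN | boolean | Yes |
| 25 | Overseas | NaN | boolean | Yes |
| 26 | Employment | NaN | object | Yes |
| 27 | EmployeeReference | NaN | string | Yes |
| 28 | DateFirstEmployed | date-time | string | Yes |
| 29 | PayrollNumber | NaN | string | Yes |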
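The answer above produces a single flat DataFrame containing every property. Since the question asks for one DataFrame per table (Person, HomeAddress, Employment), here is a minimal sketch of a per-table variant of the same recursion. It assumes, as in the question's schema, that each nested table is a property with its own `properties` dict and a sibling `required` list. The function name `collect_tables`, the `"root"` key for the top-level properties, and the trimmed `schema` literal are illustrative only.

```python
import pandas as pd

COLUMNS = ["Column_Name", "Type", "Format", "Required"]


def collect_tables(schema, name="root", tables=None):
    """Return a dict mapping each object's name to a DataFrame of its columns.

    Hypothetical helper: assumes every nested object has a "properties" dict
    and that its "required" list sits next to that dict.
    """
    if tables is None:
        tables = {}
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    rows = []
    for col, spec in props.items():
        col_type = spec.get("type")
        # Union types such as ["string", "null"]: keep the first entry,
        # as get_props() in the answer does.
        if isinstance(col_type, list):
            col_type = col_type[0]
        rows.append({
            "Column_Name": col,
            "Type": col_type,
            "Format": spec.get("format"),  # None when no format is given
            "Required": "Yes" if col in required else "No",
        })
        # A property with its own "properties" is a nested table: recurse.
        if "properties" in spec:
            collect_tables(spec, name=col, tables=tables)
    tables[name] = pd.DataFrame(rows, columns=COLUMNS)
    return tables


# Trimmed copy of the question's schema, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "PersonId": {"type": "integer"},
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {"type": "integer"},
                "DateOfBirth": {"type": "string", "format": "date-time"},
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "EffectiveDate": {"type": "string", "format": "date-time"},
                        "City": {"type": ["string", "null"]},
                    },
                    "required": ["EffectiveDate", "City"],
                },
            },
            "required": ["PersonId", "DateOfBirth", "HomeAddress"],
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {"PayrollNumber": {"type": ["string", "null"]}},
            "required": ["PayrollNumber"],
        },
    },
    "required": ["PersonId", "Person", "Employment"],
}

tables = collect_tables(schema)
for table_name, table in tables.items():
    print(f"--- {table_name} ---")
    print(table)
```

With the full schema, `collect_tables(d)` should give DataFrames keyed `"Person"`, `"HomeAddress"` and `"Employment"`, plus `"root"` for the top-level `PersonId`, `Person` and `Employment` columns. Tables are keyed by property name only, so two nested objects with the same name at different levels would overwrite each other; keying by the full path (e.g. `"Person.HomeAddress"`) would avoid that.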