How to create a Pandas dataframe for each table's metadata (column name, type, format) from a database schema stored in a nested JSON file

Question

I have a JSON file which contains the metadata for the tables held within a schema.

I would like to create a dataframe for each table defined within the JSON file, i.e. Person, HomeAddress and Employment. Person and Employment are at the same level, but HomeAddress is nested within Person.

e.g. dataframe(Person)

 Column_Name     Type     Format      Required
 Person_ID       Integer              Yes
 DateOfBirth     String   date-time   Yes
 ...........

The contents of the file are as follows:

{
    "$id": "12121212",
    "type": "object",
    "properties": {
        "PersonId": {
            "type": "integer"
        },
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {
                    "type": "integer"
                },
                "DateOfBirth": {
                    "type": "string",
                    "format": "date-time"
                },
                "DateOfBirthVerified": {
                    "type": "boolean"
                },
                "Sex": {
                    "type": ["string", "null"]
                },
                "Surname": {
                    "type": ["string", "null"]
                },
                "Initials": {
                    "type": ["string", "null"]
                },
                "Forenames": {
                    "type": ["string", "null"]
                },
                "Title": {
                    "type": ["string", "null"]
                },
                "NationalIdNumber": {
                    "type": ["string", "null"]
                },
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "EffectiveDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "EndDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "Category": {
                            "type": ["string", "null"]
                        },
                        "Line1": {
                            "type": ["string", "null"]
                        },
                        "Line2": {
                            "type": ["string", "null"]
                        },
                        "Line3": {
                            "type": ["string", "null"]
                        },
                        "Line4": {
                            "type": ["string", "null"]
                        },
                        "City": {
                            "type": ["string", "null"]
                        },
                        "County": {
                            "type": ["string", "null"]
                        },
                        "Country": {
                            "type": ["string", "null"]
                        },
                        "CareOfAddressee": {
                            "type": ["string", "null"]
                        },
                        "PostCode": {
                            "type": ["string", "null"]
                        },
                        "SuspectAddress": {
                            "type": "boolean"
                        },
                        "Overseas": {
                            "type": "boolean"
                        }
                    },
                    "required": ["EffectiveDate", "EndDate", "Category", "Line1", "Line2", "Line3", "Line4", "City", "County", "Country", "CareOfAddressee", "PostCode", "SuspectAddress", "Overseas"]
                }
            },
            "required": ["PersonId", "DateOfBirth", "DateOfBirthVerified", "Sex", "Surname", "Initials", "Forenames", "Title", "NationalIdNumber", "HomeAddress"]
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {
                "EmployeeReference": {
                    "type": ["string", "null"]
                },
                "DateFirstEmployed": {
                    "type": "string",
                    "format": "date-time"
                },
                "PayrollNumber": {
                    "type": ["string", "null"]
                }
            },
            "required": ["EmployeeReference", "DateFirstEmployed", "PayrollNumber"]
        }
    },
    "required": ["PersonId", "Person", "Employment"]
}

Answer 1

Let d be the dictionary of the file contents. Then you could address this recursively as follows:

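import numpy as np
import pandas as pd


def get_props(d, required=None):
    """Recursively collect one row per schema property found in d."""
    required = required or []  # avoid a mutable default argument
    props = []
    for k, v in d.items():
        if isinstance(v, dict):
            # A dict that has a 'type' key is a property definition.
            if 'type' in v:
                props.append({
                    'Column_Name': k,
                    # np.nan rather than np.NaN, which was removed in NumPy 2.0
                    'Format': v.get('format', np.nan),
                    # 'type' may be a list such as ["string", "null"]; keep the first entry
                    'Type': v['type'] if isinstance(v['type'], str) else v['type'][0],
                    'Required': 'Yes' if k in required else 'No'
                })
            # Descend, passing along the 'required' list that sits beside this dict.
            props.extend(get_props(v, required=d.get('required', [])))
    return props


df = pd.DataFrame(get_props(d))
print(df)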

prints

    Column_Name          Format     Type     Required
0   PersonId             NaN        integer  Yes
1   Person               NaN        object   Yes
2   PersonId             NaN        integer  Yes
3   DateOfBirth          date-time  string   Yes
4   DateOfBirthVerified  NaN        boolean  Yes
5   Sex                  NaN        string   Yes
6   Surname              NaN        string   Yes
7   Initials             NaN        string   Yes
8   Forenames            NaN        string   Yes
9   Title                NaN        string   Yes
10  NationalIdNumber     NaN        string   Yes
11  HomeAddress          NaN        object   Yes
12  EffectiveDate        date-time  string   Yes
13  EndDate              date-time  string   Yes
14  Category             NaN        string   Yes
15  Line1                NaN        string   Yes
16  Line2                NaN        string   Yes
17  Line3                NaN        string   Yes
18  Line4                NaN        string   Yes
19  City                 NaN        string   Yes
20  County               NaN        string   Yes
21  Country              NaN        string   Yes
22  CareOfAddressee      NaN        string   Yes
23  PostCode             NaN        string   Yes
24  SuspectAddress       NaN        boolean  Yes
25  Overseas             NaN        boolean  Yes
26  Employment           NaN        object   Yes
27  EmployeeReference    NaN        string   Yes
28  DateFirstEmployed    date-time  string   Yes
29  PayrollNumber        NaN        string   Yes
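
Note that the output above is a single flat frame, whereas the question asks for one frame per table. Below is a minimal sketch of one way to split it, assuming that every nested object with its own "properties" key should become its own table. The names schema_to_frames, _first_type and the "root" label are hypothetical, not part of the original answer, and the inline schema is a shortened copy of the one in the question.

```python
import pandas as pd


def _first_type(t):
    # "type" may be a string or a list such as ["string", "null"].
    if isinstance(t, list):
        return t[0] if t else None
    return t


def schema_to_frames(name, node, frames=None):
    """Collect one DataFrame per object node, keyed by the node's name."""
    if frames is None:
        frames = {}
    required = set(node.get("required", []))
    rows = []
    for col, spec in node.get("properties", {}).items():
        if "properties" in spec:
            # Nested object: treat it as its own table, not as a column.
            schema_to_frames(col, spec, frames)
        else:
            rows.append({
                "Column_Name": col,
                "Type": _first_type(spec.get("type")),
                "Format": spec.get("format"),
                "Required": "Yes" if col in required else "No",
            })
    if rows:
        frames[name] = pd.DataFrame(rows)
    return frames


# Shortened copy of the schema from the question (assumption: same layout).
schema = {
    "type": "object",
    "properties": {
        "PersonId": {"type": "integer"},
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {"type": "integer"},
                "DateOfBirth": {"type": "string", "format": "date-time"},
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "Line1": {"type": ["string", "null"]},
                        "Overseas": {"type": "boolean"},
                    },
                    "required": ["Line1", "Overseas"],
                },
            },
            "required": ["PersonId", "DateOfBirth", "HomeAddress"],
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {"PayrollNumber": {"type": ["string", "null"]}},
            "required": ["PayrollNumber"],
        },
    },
    "required": ["PersonId", "Person", "Employment"],
}

frames = schema_to_frames("root", schema)
for table, frame in frames.items():
    print(table)
    print(frame, end="\n\n")
```

For the real file, schema would be the dict returned by json.load on the opened file. The top-level scalar properties (here only PersonId) end up in the "root" frame, which can be dropped if it is not wanted. Two nested objects that share a name would overwrite each other in this sketch, so the keys would need qualifying in that case.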
