
How to create a Pandas dataframe for each table's metadata (Column Name, Type, Format) stored within a database schema in a nested JSON file

I have a JSON file that contains the metadata for the tables held within a schema.

I would like to create a dataframe for each table defined within the JSON file, i.e. Person, HomeAddress and Employment. Person and Employment are at the same level, but HomeAddress is nested within Person.

e.g. dataframe(Person):

 Column_Name     Type     Format      Required
 Person_ID       Integer              Yes
 DateOfBirth     String   date-time   Yes
 ........... 

The contents of the file are as follows:

{
    "$id": "12121212",
    "type": "object",
    "properties": {
        "PersonId": {
            "type": "integer"
        },
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {
                    "type": "integer"
                },
                "DateOfBirth": {
                    "type": "string",
                    "format": "date-time"
                },
                "DateOfBirthVerified": {
                    "type": "boolean"
                },
                "Sex": {
                    "type": ["string", "null"]
                },
                "Surname": {
                    "type": ["string", "null"]
                },
                "Initials": {
                    "type": ["string", "null"]
                },
                "Forenames": {
                    "type": ["string", "null"]
                },
                "Title": {
                    "type": ["string", "null"]
                },
                "NationalIdNumber": {
                    "type": ["string", "null"]
                },
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "EffectiveDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "EndDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "Category": {
                            "type": ["string", "null"]
                        },
                        "Line1": {
                            "type": ["string", "null"]
                        },
                        "Line2": {
                            "type": ["string", "null"]
                        },
                        "Line3": {
                            "type": ["string", "null"]
                        },
                        "Line4": {
                            "type": ["string", "null"]
                        },
                        "City": {
                            "type": ["string", "null"]
                        },
                        "County": {
                            "type": ["string", "null"]
                        },
                        "Country": {
                            "type": ["string", "null"]
                        },
                        "CareOfAddressee": {
                            "type": ["string", "null"]
                        },
                        "PostCode": {
                            "type": ["string", "null"]
                        },
                        "SuspectAddress": {
                            "type": "boolean"
                        },
                        "Overseas": {
                            "type": "boolean"
                        }
                    },
                    "required": ["EffectiveDate", "EndDate", "Category", "Line1", "Line2", "Line3", "Line4", "City", "County", "Country", "CareOfAddressee", "PostCode", "SuspectAddress", "Overseas"]
                }
            },
            "required": ["PersonId", "DateOfBirth", "DateOfBirthVerified", "Sex", "Surname", "Initials", "Forenames", "Title", "NationalIdNumber", "HomeAddress"]
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {
                "EmployeeReference": {
                    "type": ["string", "null"]
                },
                "DateFirstEmployed": {
                    "type": "string",
                    "format": "date-time"
                },
                "PayrollNumber": {
                    "type": ["string", "null"]
                }
            },
            "required": ["EmployeeReference", "DateFirstEmployed", "PayrollNumber"]
        }
    },
    "required": ["PersonId", "Person", "Employment"]
}
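
For reference, a file with these contents can be read into a Python dictionary with the standard `json` module. The following is only a minimal sketch: the helper name `load_schema` and the file name `schema.json` are hypothetical, and a tiny stand-in schema is written to a temporary directory so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_schema(path):
    """Read a JSON schema file and return it as a Python dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Tiny stand-in for the real file, so the example is self-contained.
sample = {
    "type": "object",
    "properties": {"PersonId": {"type": "integer"}},
    "required": ["PersonId"],
}

with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "schema.json"  # hypothetical file name
    schema_path.write_text(json.dumps(sample), encoding="utf-8")
    d = load_schema(schema_path)

print(d["properties"]["PersonId"]["type"])
```

In practice you would pass the path of the real schema file to `load_schema` instead of writing a sample first.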

Let d be the dictionary of the file contents. Then you could address this recursively as follows:

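import numpy as np
import pandas as pd

def get_props(d, required=None):
    """Recursively collect one record per schema property found in d."""
    required = required or []  # avoid a mutable default argument
    props = []
    for k, v in d.items():
        if isinstance(v, dict):
            # A dict with a 'type' key describes a property (a column or a nested table).
            if 'type' in v:
                props.append({
                    'Column_Name': k,
                    # np.nan rather than np.NaN, which was removed in NumPy 2.0
                    'Format': v.get('format', np.nan),
                    # Nullable types are lists such as ["string", "null"]; keep the first entry.
                    'Type': v['type'] if isinstance(v['type'], str) else v['type'][0],
                    'Required': 'Yes' if k in required else 'No'
                })
            # 'required' sits beside 'properties', so pass d's list down to v's entries.
            props.extend(get_props(v, required=d.get('required', [])))
    return props

# d is the schema dictionary, e.g. loaded with json.load().
df = pd.DataFrame(get_props(d))
print(df)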

prints

            Column_Name     Format     Type  Required
 0             PersonId        NaN  integer       Yes
 1               Person        NaN   object       Yes
 2             PersonId        NaN  integer       Yes
 3          DateOfBirth  date-time   string       Yes
 4  DateOfBirthVerified        NaN  boolean       Yes
 5                  Sex        NaN   string       Yes
 6              Surname        NaN   string       Yes
 7             Initials        NaN   string       Yes
 8            Forenames        NaN   string       Yes
 9                Title        NaN   string       Yes
10     NationalIdNumber        NaN   string       Yes
11          HomeAddress        NaN   object       Yes
12        EffectiveDate  date-time   string       Yes
13              EndDate  date-time   string       Yes
14             Category        NaN   string       Yes
15                Line1        NaN   string       Yes
16                Line2        NaN   string       Yes
17                Line3        NaN   string       Yes
18                Line4        NaN   string       Yes
19                 City        NaN   string       Yes
20               County        NaN   string       Yes
21              Country        NaN   string       Yes
22      CareOfAddressee        NaN   string       Yes
23             PostCode        NaN   string       Yes
24       SuspectAddress        NaN  boolean       Yes
25             Overseas        NaN  boolean       Yes
26           Employment        NaN   object       Yes
27    EmployeeReference        NaN   string       Yes
28    DateFirstEmployed  date-time   string       Yes
29        PayrollNumber        NaN   string       Yes
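
The output above is a single flat dataframe, whereas the question asks for one dataframe per table. One possible way to get there is sketched below; it is not part of the original answer. It assumes that every object-typed property with its own `properties` should become a separate table rather than a column of its parent, that table names are unique across the schema, and that the top-level object is stored under the made-up name `root`. The helper names `first_type` and `tables_from_schema` are hypothetical, and the schema is a trimmed version of the one in the question.

```python
import pandas as pd


def first_type(type_spec):
    """Return the primary type: the string itself, or the first non-"null" list entry."""
    if isinstance(type_spec, str):
        return type_spec
    return next((t for t in type_spec if t != "null"), "null")


def tables_from_schema(name, node, tables=None):
    """Collect column records per object node, keyed by table name (hypothetical helper)."""
    if tables is None:
        tables = {}
    tables[name] = rows = []
    required = node.get("required", [])
    for col, spec in node.get("properties", {}).items():
        col_type = first_type(spec.get("type", "object"))
        if col_type == "object" and "properties" in spec:
            # Assumption: a nested object becomes its own table, not a column.
            tables_from_schema(col, spec, tables)
            continue
        rows.append({
            "Column_Name": col,
            "Type": col_type,
            "Format": spec.get("format", ""),
            "Required": "Yes" if col in required else "No",
        })
    return tables


# Trimmed version of the schema from the question.
schema = {
    "type": "object",
    "properties": {
        "PersonId": {"type": "integer"},
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {"type": "integer"},
                "DateOfBirth": {"type": "string", "format": "date-time"},
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "EffectiveDate": {"type": "string", "format": "date-time"},
                        "City": {"type": ["string", "null"]},
                    },
                    "required": ["EffectiveDate", "City"],
                },
            },
            "required": ["PersonId", "DateOfBirth", "HomeAddress"],
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {"PayrollNumber": {"type": ["string", "null"]}},
            "required": ["PayrollNumber"],
        },
    },
    "required": ["PersonId", "Person", "Employment"],
}

columns = ["Column_Name", "Type", "Format", "Required"]
frames = {
    name: pd.DataFrame(rows, columns=columns)
    for name, rows in tables_from_schema("root", schema).items()
}

for name, frame in frames.items():
    print(f"dataframe({name})")
    print(frame, end="\n\n")
```

Each table is then available by name, for example `frames["HomeAddress"]`. If two nested objects shared a name, the later one would overwrite the earlier one in this sketch, so a qualified key such as `Person.HomeAddress` may be preferable for larger schemas.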

