How to create a Pandas dataframe for each table's metadata (column name, type, format) in a database schema stored in a nested JSON file

Question

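I have a JSON file that contains the metadata of the tables held in a schema.

I would like to create a dataframe for each table defined in the JSON file, i.e. Person, HomeAddress, Employment. Person and Employment are at the same level, but HomeAddress is nested within Person.

For example, the dataframe (Person):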

 Column_Name     Type     Format      Required
 Person_ID       Integer              Yes
 DateOfBirth     String   date-time   Yes
 ...........

The file content is as follows:

{
    "$id": "12121212",
    "type": "object",
    "properties": {
        "PersonId": {
            "type": "integer"
        },
        "Person": {
            "type": ["object", "null"],
            "properties": {
                "PersonId": {
                    "type": "integer"
                },
                "DateOfBirth": {
                    "type": "string",
                    "format": "date-time"
                },
                "DateOfBirthVerified": {
                    "type": "boolean"
                },
                "Sex": {
                    "type": ["string", "null"]
                },
                "Surname": {
                    "type": ["string", "null"]
                },
                "Initials": {
                    "type": ["string", "null"]
                },
                "Forenames": {
                    "type": ["string", "null"]
                },
                "Title": {
                    "type": ["string", "null"]
                },
                "NationalIdNumber": {
                    "type": ["string", "null"]
                },
                "HomeAddress": {
                    "type": ["object", "null"],
                    "properties": {
                        "EffectiveDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "EndDate": {
                            "type": "string",
                            "format": "date-time"
                        },
                        "Category": {
                            "type": ["string", "null"]
                        },
                        "Line1": {
                            "type": ["string", "null"]
                        },
                        "Line2": {
                            "type": ["string", "null"]
                        },
                        "Line3": {
                            "type": ["string", "null"]
                        },
                        "Line4": {
                            "type": ["string", "null"]
                        },
                        "City": {
                            "type": ["string", "null"]
                        },
                        "County": {
                            "type": ["string", "null"]
                        },
                        "Country": {
                            "type": ["string", "null"]
                        },
                        "CareOfAddressee": {
                            "type": ["string", "null"]
                        },
                        "PostCode": {
                            "type": ["string", "null"]
                        },
                        "SuspectAddress": {
                            "type": "boolean"
                        },
                        "Overseas": {
                            "type": "boolean"
                        }
                    },
                    "required": ["EffectiveDate", "EndDate", "Category", "Line1", "Line2", "Line3", "Line4", "City", "County", "Country", "CareOfAddressee", "PostCode", "SuspectAddress", "Overseas"]
                }
            },
            "required": ["PersonId", "DateOfBirth", "DateOfBirthVerified", "Sex", "Surname", "Initials", "Forenames", "Title", "NationalIdNumber", "HomeAddress"]
        },
        "Employment": {
            "type": ["object", "null"],
            "properties": {
                "EmployeeReference": {
                    "type": ["string", "null"]
                },
                "DateFirstEmployed": {
                    "type": "string",
                    "format": "date-time"
                },
                "PayrollNumber": {
                    "type": ["string", "null"]
                }
            },
            "required": ["EmployeeReference", "DateFirstEmployed", "PayrollNumber"]
        }
    },
    "required": ["PersonId", "Person", "Employment"]
}

Answer 1

Let d be a dictionary holding the file contents. You can then solve this recursively, as follows:

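import pandas as pd
import numpy as np

def get_props(d, required=None):
    # Names listed in the enclosing object's "required" array
    required = required or []
    props = []
    for k, v in d.items():
        if isinstance(v, dict):
            if 'type' in v:
                # v describes a column or a nested object: record its metadata
                props.append({
                    'Column_Name': k,
                    'Format': v.get('format', np.nan),
                    # For union types such as ["string", "null"], keep the first entry
                    'Type': v['type'] if isinstance(v['type'], str) else v['type'][0],
                    'Required': 'Yes' if k in required else 'No'
                })
            # Recurse; when v is a "properties" dict, d is the object that owns "required"
            props.extend(get_props(v, required=d.get('required', [])))
    return props

df = pd.DataFrame(get_props(d))
print(df)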

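This prints:

index	Column_Name	Format	Type	Required
0	PersonId	NaN	integer	Yes
1	Person	NaN	object	Yes
2	PersonId	NaN	integer	Yes
3	DateOfBirth	date-time	string	Yes
4	DateOfBirthVerified	NaN	boolean	Yes
5	Sex	NaN	string	Yes
6	Surname	NaN	string	Yes
7	Initials	NaN	string	Yes
8	Forenames	NaN	string	Yes
9	Title	NaN	string	Yes
10	NationalIdNumber	NaN	string	Yes
11	HomeAddress	NaN	object	Yes
12	EffectiveDate	date-time	string	Yes
13	EndDate	date-time	string	Yes
14	Category	NaN	string	Yes
15	Line1	NaN	string	Yes
16	Line2	NaN	string	Yes
17	Line3	NaN	string	Yes
18	Line4	NaN	string	Yes
19	City	NaN	string	Yes
20	County	NaN	string	Yes
21	Country	NaN	string	Yes
22	CareOfAddressee	NaN	string	Yes
23	PostCode	NaN	string	Yes
24	SuspectAddress	NaN	boolean	Yes
25	Overseas	NaN	boolean	Yes
26	Employment	NaN	object	Yes
27	EmployeeReference	NaN	string	Yes
28	DateFirstEmployed	date-time	string	Yes
29	PayrollNumber	NaN	string	Yes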
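
The output above is a single flat dataframe, whereas the question asks for one dataframe per table. One possible way to get there is sketched below. It is an illustration only, not part of the original answer: the function name tables_from_schema, the "root" key for the top-level object, and the decision to treat every nested object as its own table (rather than as a column of its parent) are all assumptions. The schema used here is a trimmed-down version of the file in the question so that the example runs on its own.

```python
import json
import pandas as pd

COLUMNS = ["Column_Name", "Type", "Format", "Required"]

def tables_from_schema(name, node, out=None):
    """Collect one DataFrame per object node, keyed by the object's name.

    Hypothetical helper: nested objects become separate tables and are
    not listed as columns of their parent. Objects that share a name
    would overwrite each other in the result.
    """
    if out is None:
        out = {}
    required = set(node.get("required", []))
    rows = []
    for col, spec in node.get("properties", {}).items():
        types = spec.get("type")
        type_list = [types] if isinstance(types, str) else list(types or [])
        if "object" in type_list and "properties" in spec:
            # Nested object: build its table recursively and skip it as a column
            tables_from_schema(col, spec, out)
            continue
        rows.append({
            "Column_Name": col,
            # Pick the first type that is not "null", e.g. "string" from ["string", "null"]
            "Type": next((t for t in type_list if t != "null"), None),
            "Format": spec.get("format"),
            "Required": "Yes" if col in required else "No",
        })
    out[name] = pd.DataFrame(rows, columns=COLUMNS)
    return out

# Trimmed-down schema, for illustration only
schema_text = """
{
  "type": "object",
  "properties": {
    "PersonId": {"type": "integer"},
    "Person": {
      "type": ["object", "null"],
      "properties": {
        "DateOfBirth": {"type": "string", "format": "date-time"},
        "Sex": {"type": ["string", "null"]},
        "HomeAddress": {
          "type": ["object", "null"],
          "properties": {"City": {"type": ["string", "null"]}},
          "required": ["City"]
        }
      },
      "required": ["DateOfBirth", "Sex", "HomeAddress"]
    }
  },
  "required": ["PersonId", "Person"]
}
"""

d = json.loads(schema_text)
tables = tables_from_schema("root", d)
print(tables["Person"])
print(tables["HomeAddress"])
```

With the full file from the question, tables would be expected to hold entries for Person, HomeAddress and Employment, plus "root" for the top-level PersonId. If the nested object should also appear as a column of its parent, the continue can be dropped and a row appended for it as well.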


Solution 1
0 2022-08-07 04:16:30