Normalizing nested JSON objects into a DataFrame, Python

Question

I am trying to convert JSON into a DataFrame with json_normalize. This is the JSON I am using:

data = {
    "Parent": [
        {
            "Attributes": [
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "100"
                    }],
                    "Id": "90",
                    "CustId": "3"
                },
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "101"
                    }],
                    "Id": "88"
                },
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "102"
                    }],
                    "Id": "89"
                }
            ],
            "DId": "1234"
        },
        {
            "Attributes": [
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "200"
                    }],
                    "Id": "90",
                    "CustId": "3"
                },
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "201"
                    }],
                    "Id": "88"
                },
                {
                    "Values": [{
                        "Month": "Jan",
                        "Value": "202"
                    }],
                    "Id": "89"
                }
            ],
            "DId": "5678"
        }
    ]
}

This is what I tried:

import pandas as pd

print(type(data))
result = pd.json_normalize(data, record_path=['Parent', ['Attributes']], max_level=2)
print(result.to_string())

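It produces a result, but the DId column is missing and the Values column still contains lists of dictionaries.

This is what I am trying to achieve:

Any guidance on how to accomplish this would be appreciated.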
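To see both symptoms in isolation, here is a minimal sketch that runs the same call on a trimmed copy of the question's data (one parent, two attributes; the trimming is an assumption made for brevity). The records are the Attributes entries, so DId, which sits one level higher, is not pulled in, and max_level only flattens nested dicts, not lists.

```python
import pandas as pd

# Trimmed copy of the question's data: one parent with two attributes
data = {
    "Parent": [
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "100"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "101"}], "Id": "88"},
            ],
            "DId": "1234",
        }
    ]
}

# The call from the question: each record is one Attributes entry
result = pd.json_normalize(data, record_path=["Parent", ["Attributes"]], max_level=2)

# DId lives above the records and is not included without meta
print("DId" in result.columns)
# max_level only flattens nested dicts, so each Values cell is still a list
print(type(result.loc[0, "Values"]))
```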

Answer 1

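You can specify the metadata (the data above the record_path) with the meta keyword argument, combined with errors='ignore' for metadata that is not always present, such as CustId. For example: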

result = pd.json_normalize(
    data,
    record_path=['Parent', 'Attributes', 'Values'],
    meta=[
        ['Parent', 'DId'],
        ['Parent', 'Attributes', 'Id'],
        ['Parent', 'Attributes', 'CustId']
    ],
    errors='ignore'
)

The result is:

  Month Value Parent.DId Parent.Attributes.Id Parent.Attributes.CustId
0   Jan   100       1234                   90                        3
1   Jan   101       1234                   88                      NaN
2   Jan   102       1234                   89                      NaN
3   Jan   200       5678                   90                        3
4   Jan   201       5678                   88                      NaN
5   Jan   202       5678                   89                      NaN
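Why errors='ignore' matters here: the second and third attributes have no CustId, and with the default errors='raise' json_normalize raises a KeyError for a missing meta key. A minimal sketch on a trimmed copy of the question's data (one parent, two attributes, an assumption for brevity) that shows both behaviours:

```python
import pandas as pd

# Trimmed copy of the question's data: the second attribute has no "CustId"
data = {
    "Parent": [
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "100"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "101"}], "Id": "88"},
            ],
            "DId": "1234",
        }
    ]
}

record_path = ["Parent", "Attributes", "Values"]
meta = [
    ["Parent", "DId"],
    ["Parent", "Attributes", "Id"],
    ["Parent", "Attributes", "CustId"],
]

# With the default errors="raise", the missing CustId key raises KeyError
raised = False
try:
    pd.json_normalize(data, record_path=record_path, meta=meta)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# With errors="ignore", the missing CustId becomes NaN instead
result = pd.json_normalize(data, record_path=record_path, meta=meta, errors="ignore")
print(result)
```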

Answer 2

Here is one way to achieve this. I think step 1 and step 2 could be combined into a single call, but that would require more knowledge of pd.json_normalize.

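import pandas as pd

# Step 1: one row per entry in Attributes -> Values (columns Month, Value)
df1 = pd.json_normalize(
    data['Parent'], ["Attributes", "Values"]
)
# Step 2: one row per Attributes entry, with the parent's DId added as metadata
df2 = pd.json_normalize(
    data['Parent'], "Attributes", "DId",
)
df2 = df2.drop(['Values'], axis=1)

# Join on the row index; the rows line up only because every attribute has exactly one Values entry
result = df2.join(df1).reindex(['DId', 'Id', 'CustId', 'Month', 'Value'], axis=1) \
    .sort_values(by=['DId', 'Id']) \
    .rename(columns={'Id': 'Attr.Id', 'CustId': 'Attr.CustId', 'Month': 'Attr.Values.Month',
                     'Value': 'Attr.Values.value'})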

Result:
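On the guess that step 1 and step 2 can be combined: a minimal sketch of one way to do it, assuming the question's data shape (trimmed here to two attributes per parent for brevity), is a single json_normalize call on data['Parent'] with meta paths written relative to each parent. This is the same meta mechanism Answer 1 uses; the columns are then renamed and reordered to match the output of the two-step version, and the sort is left out.

```python
import pandas as pd

# Trimmed copy of the question's data: two parents, two attributes each
data = {
    "Parent": [
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "100"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "101"}], "Id": "88"},
            ],
            "DId": "1234",
        },
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "200"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "201"}], "Id": "88"},
            ],
            "DId": "5678",
        },
    ]
}

# One call: records are the Values entries, DId / Id / CustId come in as metadata
result = pd.json_normalize(
    data["Parent"],
    record_path=["Attributes", "Values"],
    meta=["DId", ["Attributes", "Id"], ["Attributes", "CustId"]],
    errors="ignore",  # CustId is missing on some attributes
)

# Rename and reorder to match the column names of the two-step version
result = result.rename(columns={
    "Attributes.Id": "Attr.Id",
    "Attributes.CustId": "Attr.CustId",
    "Month": "Attr.Values.Month",
    "Value": "Attr.Values.value",
})[["DId", "Attr.Id", "Attr.CustId", "Attr.Values.Month", "Attr.Values.value"]]
print(result)
```

Because each Values record carries its own copy of the metadata, this version does not rely on joining by row index, so it would also cope with an attribute that has more than one Values entry.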


Answer 1 (1 vote, 2023-01-24 10:15:24)

Answer 2 (0 votes, accepted, 2023-01-24 04:44:15)
