python - normalize nested JSON to pandas dataframe

I'm trying to normalize some JSON to flatten it for a SQL table. The problem I've come across is that every example I've read uses a standard name for the nested items, but I am working with unique ids as keys, and I want those ids to end up as values in my dataframe.

Here's a sample of the JSON.

{
"data": {
    "138082239": [
        {
            "id": 275,
            "name": "Sue",
            "abbreviation": "SJ",
            "active": true,
            "changedByUserId": "11710250",
            "statusUpdated": "2020-11-23T18:48:28+00:00",
            "leadCreated": "2020-11-23T18:48:28+00:00",
            "leadModified": "2020-11-23T18:48:29+00:00"
        }
    ],
    "138082238": [
        {
            "id": 276,
            "name": "John",
            "abbreviation": "JC",
            "active": true,
            "changedByUserId": "11710250",
            "statusUpdated": "2020-11-23T18:48:25+00:00",
            "leadCreated": "2020-11-23T18:48:25+00:00",
            "leadModified": "2020-11-23T18:48:25+00:00"
        }
    ],

I want to flatten this and add each top-level key (e.g. 138082239) as a value in a LeadId column of my dataframe. When I use pd.json_normalize(), I just get a bunch of columns titled data.138082239, data.138082238, etc.

I'm using requests to pull this JSON from an API.


    import pandas as pd
    import requests

    # url, payload, and headers are defined elsewhere (not shown)
    r = requests.request("GET", url, data=payload, headers=headers)
    j = r.json()
    df = pd.json_normalize(j)

I want the dataframe to look like this:

LeadId      id    name   abbreviation   active
138082239   275   Sue    SJ             TRUE
138082238   276   John   JC             TRUE

Thanks in advance for your help!

One way is to use a list comprehension with pandas.DataFrame.assign, building one small dataframe per key and concatenating them:

df = pd.concat([pd.DataFrame(v).assign(LeadId=k) for k, v in j["data"].items()])
print(df[["LeadId", "id", "name", "abbreviation", "active"]])

Output:

      LeadId   id  name abbreviation  active
0  138082239  275   Sue           SJ    True
0  138082238  276  John           JC    True
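
Note that the index above repeats (0, 0) because each per-key dataframe keeps its own index. As an alternative, here is a minimal self-contained sketch that first flattens the payload into a list of records and then hands that list to pd.json_normalize, which yields a fresh 0..n-1 index. The inline j dictionary is an assumption: it is a trimmed copy of the sample in the question, standing in for the real r.json() result.

```python
import pandas as pd

# Assumed stand-in for r.json(): a trimmed copy of the sample payload.
j = {
    "data": {
        "138082239": [
            {"id": 275, "name": "Sue", "abbreviation": "SJ", "active": True}
        ],
        "138082238": [
            {"id": 276, "name": "John", "abbreviation": "JC", "active": True}
        ],
    }
}

# Build one flat record per nested item, carrying the outer key along as LeadId.
records = [
    {"LeadId": lead_id, **item}
    for lead_id, items in j["data"].items()
    for item in items
]

# json_normalize accepts a list of dicts and produces one row per record.
df = pd.json_normalize(records)
print(df[["LeadId", "id", "name", "abbreviation", "active"]])
```

LeadId stays a string here because JSON object keys are always strings; if the SQL column is numeric, something like df["LeadId"].astype("int64") could be applied before loading.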
