Converting JSON output to dataframe table in Python

I am using a REST API to get a JSON file. I want to convert this JSON file into a dataframe so that I can upload it to a database.

The format of the JSON looks like this:

{result:
{__campaign_id__:
{"campaign_id": __campaign_id__, // id of the campaign in platform
  "name": xxx, // Campaign name
  "creatives":{ // list of creatives in the campaign
   __creative_id__:{
     "creative_id":__creative_id__,  // id of the creative
     "name":xxx, //name of the creative
     "device":xxx, // 0- mobile | 1- Desktop | 2- Instream video | 3- Responsive
     "width": x, // width of the placement in px
     "height": y, // height of the placement in px
     "analytics": {
       __live_id__:{
                "dt":xxx, //date in unix timestamp
                "impressions":xxx, //number of tracked ads
                "viewable":xxx,  // number of viewable impressions
                "engagement":xxx, // number of impressions user interacted or viewed video at least 25%
                "engagement_novtr":xxx, // number of impressions user interacted
                "ctr_unique":xxx, //number of unique clicks (one click per one impression)
                "ctr": xxx,  // number of total clicks
                "dwell":xxx, // exposure time
                "videos":[
                    {
                        "dt": 1607558400, //date in unix timestamp. Same as above
                        "unique": xxx, //number of unique video views
                        "id": "Video 1", //name of the video
                        "vtr_0": xxx,   // number of impressions who started watching video 1
                        "vtr_25": xxx,  // number of impressions who watched first quartile of video 1
                        "vtr_50": xxx,  // number of impressions who watched second quartile of video 1
                        "vtr_75": xxx,  // number of impressions who watched third quartile of video 1
                        "vtr_100": xxx  // number of impressions who finished watching the video of video 1
                    }
                ]
            }
        }
    },
   },
  },
}
}

All I want is to get the data out in the most granular form possible, like this (simplified):

campaignID   creativeID    device         analytics  
1           1            pc             1  
1           1            pc             2  
1           2            mobile         1 
1           2            pc             2  
2           4            pc             5  
2           4            mobile         6  
2           6            pc             7  
2           5            mobile         7   
3           8            pc             9    

And so on. Basically, split the data into rows so that each row represents the finest possible split of the data, if that makes sense.

My request looks like this:

import json
import requests
nexdReponse = requests.get("myURL", headers=call_headers)
json_nexdData = json.loads(nexdReponse.text)

Now I have my JSON in a dictionary that mirrors the JSON hierarchy, so I need to convert a dictionary of nested dictionaries into a dataframe.

Then I tried normalizing my data with pd.DataFrame.from_dict(pd.json_normalize(myData)), but it doesn't do what I want.

Is there a simple way, or a library I can use, to do this? I'm fairly new to this, so I'm just trying to figure out how it works.

The JSON you posted is invalid, so I'm using an arbitrary valid one instead:

{
    "status": {
        "code": 200,
        "message": "ok"
    },
    "pagination": {
        "page": 1,
        "count": 100,
        "total": 292
    },
    "products": [
        {
            "id": 143,
            "created_at": "2019-11-19T04:30:14.000Z",
            "updated_at": "2019-11-19T04:30:19.000Z",
            "blacklisted": false,
            "average_score": 4.76109,
            "total_reviews": 2051,
            "url": "https://go/kytatohbhi",
            "external_product_id": "123455",
            "name": "bolo azadi",
            "description": "kaisebhi",
            "product_specs": [
                {
                    "name": "shradha anusar",
                    "value": "dhania"
                }
            ],
            "category": {
                "id": 1,
                "name": "bandhkarobhai"
            },
            "products_group": {
                "id": 3518659,
                "display_name": "makkede"
            },
            "images": [
                {
                    "original": "https://haule",
                    "square": "https://bulle",
                    "facebook": "https://sulle",
                    "facebook_square": "https://lulle",
                    "kind": "image"
                }
            ]
        },
        {
            "id": 148,
            "created_at": "2019-11-19T04:30:14.000Z",
            "updated_at": "2019-11-19T04:30:19.000Z",
            "blacklisted": false,
            "average_score": 4.76109,
            "total_reviews": 2051,
            "url": "https://kytatohbhi",
            "external_product_id": "123455",
            "name": "kuch bhi....khuch bhi",
            "description": "kabhi alvida na kehna",
            "product_specs": [
                {
                    "name": "shradha anusar",
                    "value": "dhania"
                },
                {
                    "name": "swaad anusar",
                    "value": "namak"
                }
            ],
            "category": {
                "id": 1,
                "name": "bandhkarobhai"
            },
            "products_group": {
                "id": 3518659,
                "display_name": "makkede"
            },
            "images": [
                {
                    "original": "https://haule",
                    "square": "https://bulle",
                    "facebook": "https://sulle",
                    "facebook_square": "https://lulle",
                    "kind": "image"
                }
            ]
        }
    ]
}

Once the JSON is read, you can load it into a dataframe. Note that the key products holds a list:

df = pd.json_normalize(data['products'])
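
To make the effect of that call concrete, here is a minimal sketch on a trimmed-down copy of the sample above (only a few fields are kept, and the variable name specs is just for illustration). It shows that json_normalize flattens nested dictionaries into dotted column names but leaves list-valued fields as Python lists. It also shows that the record_path, meta, and record_prefix parameters of json_normalize can expand one such list directly, assuming every record actually contains that key.

```python
import pandas as pd

# Trimmed-down stand-in for the sample JSON above (most fields omitted).
data = {
    "products": [
        {
            "id": 143,
            "name": "bolo azadi",
            "category": {"id": 1, "name": "bandhkarobhai"},
            "product_specs": [{"name": "shradha anusar", "value": "dhania"}],
        },
        {
            "id": 148,
            "name": "kuch bhi....khuch bhi",
            "category": {"id": 1, "name": "bandhkarobhai"},
            "product_specs": [
                {"name": "shradha anusar", "value": "dhania"},
                {"name": "swaad anusar", "value": "namak"},
            ],
        },
    ]
}

# One row per product: nested dicts become dotted columns such as
# "category.name", while "product_specs" is still a list in each cell.
df = pd.json_normalize(data["products"])
print(df.columns.tolist())
print(df["product_specs"].map(len).tolist())

# One row per spec: record_path selects the list to expand, meta lists the
# parent fields to repeat on every row, and record_prefix avoids a clash
# between the spec's "name" and the product's own fields.
specs = pd.json_normalize(
    data["products"],
    record_path="product_specs",
    meta=["id", ["category", "name"]],
    record_prefix="spec_",
)
print(specs)
```

Here df has two rows (one per product) and specs has three (one per spec entry), which is the "finest split" the question asks for, at least for this one nested list.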

Then you can use an explode function to split the remaining nested columns:

import ast
import pandas as pd

def explode_node(child_df, column_value):
    # Drop rows without nested data; copy so the caller's frame is not modified.
    child_df = child_df.dropna(subset=[column_value]).copy()
    # Parse stringified lists/dicts into Python objects.
    if isinstance(child_df[column_value].iloc[0], str):
        child_df[column_value] = child_df[column_value].apply(ast.literal_eval)
    # Normalize each nested value, then join it back to its parent row by index.
    nested = pd.concat({i: pd.json_normalize(x) for i, x in child_df.pop(column_value).items()})
    expanded_child_df = nested.reset_index(level=1, drop=True).join(child_df, how='right', lsuffix='_left', rsuffix='_right').reset_index(drop=True)
    expanded_child_df.columns = [c.lower() for c in expanded_child_df.columns]
    return expanded_child_df
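
The structure in the question is different from the sample above: campaigns, creatives, and analytics entries are dictionaries keyed by ID rather than lists, so there is no list for json_normalize to expand. For that shape, plain nested loops that build one row per analytics entry may be the simplest option. The following is only a sketch under assumptions: the miniature json_nexdData below is hypothetical (all IDs and values are made up, and it assumes the real response uses quoted keys as in the schematic), and the column names campaign_name, creative_name, and live_id are chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature of the response described in the question;
# every ID and value here is made up for illustration.
json_nexdData = {
    "result": {
        "1": {
            "campaign_id": 1,
            "name": "Campaign A",
            "creatives": {
                "10": {
                    "creative_id": 10,
                    "name": "Banner",
                    "device": 1,
                    "analytics": {
                        "100": {"dt": 1607558400, "impressions": 50, "ctr": 3, "videos": []},
                        "101": {"dt": 1607644800, "impressions": 70, "ctr": 5, "videos": []},
                    },
                },
                "11": {
                    "creative_id": 11,
                    "name": "Video ad",
                    "device": 0,
                    "analytics": {
                        "102": {"dt": 1607558400, "impressions": 20, "ctr": 1, "videos": []},
                    },
                },
            },
        }
    }
}

rows = []
for campaign in json_nexdData["result"].values():
    for creative in campaign["creatives"].values():
        for live_id, stats in creative["analytics"].items():
            # Repeat the parent identifiers on every analytics row.
            row = {
                "campaign_id": campaign["campaign_id"],
                "campaign_name": campaign["name"],
                "creative_id": creative["creative_id"],
                "creative_name": creative["name"],
                "device": creative["device"],
                "live_id": live_id,
            }
            # Copy the scalar metrics; the nested "videos" list is skipped here
            # and would need its own table or a further level of looping.
            row.update({k: v for k, v in stats.items() if k != "videos"})
            rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

Each row of df then corresponds to one analytics entry of one creative in one campaign, which matches the layout sketched in the question.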

