简体   繁体   中英

Converting nested JSON into a flattened DataFrame

I have a data load in nested json format and I want to get a nested dataframe

{
    "page_count": 21,
    "page_number": 1,
    "page_size": 300,
    "total_records": 6128,
    "registrants": [
        {
            "id": "23lnTNqyQ3qkthfghjgkk",
            "first_name": "HUGO",
            "last_name": "MACHA   ILLEN",
            "email": "hugreb@hotmail.com",
            "address": "",
            "city": "",
            "country": "",
            "zip": "",
            "state": "",
            "phone": "",
            "industry": "",
            "org": "",
            "job_title": "",
            "purchasing_time_frame": "",
            "role_in_purchase_process": "",
            "no_of_employees": "",
            "comments": "",
            "custom_questions": [
                {
                    "title": "Departamento/ Región",
                    "value": ""
                },
                {
                    "title": "Género",
                    "value": "Masculino"
                },
                {
                    "title": "Edad",
                    "value": "De 35 a 55 años"
                },
                {
                    "title": "Nivel de estudio",
                    "value": "Técnico / Superior"
                },
                {
                    "title": "¿Eres cliente de una entidad financiera?",
                    "value": "Si"
                },
                {
                    "title": "¿Tiene una empresa?",
                    "value": "Si"
                }

I use this funtion: pat is the file json

df = pd.json_normalize(json.loads(pat.explode("custom_questions").to_json(orient="records")))

but it didn't work for the output i wanted.

I want get someone like this:

在此处输入图像描述

I want the value of the "titles", that is, the keys" to be displayed as the column headers and the value of "value" as the data. As well as the image that is attached

you can use json_normalize:

df = pd.DataFrame(your_json['registrants']).explode('custom_questions').reset_index(drop=True)
df=df.join(pd.json_normalize(df.pop('custom_questions')))

#convert rows to columns. Set index first 17 columns. We will not use theese.
df=df.set_index(df.columns[0:17].to_list())
dfx=df.pivot_table(values='value',columns='title',aggfunc=list).apply(pd.Series.explode).reset_index(drop=True)

#we have duplicate rows. Drop them
df=df.reset_index().drop(['value','title'],axis=1).drop_duplicates().reset_index(drop=True)
df=df.join(dfx)
'''
|    | id                    | first_name   | last_name     | email              | address   | city   | country   | zip   | state   | phone   | industry   | org   | job_title   | purchasing_time_frame   | role_in_purchase_process   | no_of_employees   | comments   | Departamento/ Región   | Edad            | Género    | Nivel de estudio   | ¿Eres cliente de una entidad financiera?   | ¿Tiene una empresa?   |
|---:|:----------------------|:-------------|:--------------|:-------------------|:----------|:-------|:----------|:------|:--------|:--------|:-----------|:------|:------------|:------------------------|:---------------------------|:------------------|:-----------|:-----------------------|:----------------|:----------|:-------------------|:-------------------------------------------|:----------------------|
|  0 | 23lnTNqyQ3qkthfghjgkk | HUGO         | MACHA   ILLEN | hugreb@hotmail.com |           |        |           |       |         |         |            |       |             |                         |                            |                   |            |                        | De 35 a 55 años | Masculino | Técnico / Superior | Si                                         | Si                    |

'''

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM