Converting nested JSON into a flattened DataFrame

Question

I have a data load in nested json format and I want to get a nested dataframe

{
    "page_count": 21,
    "page_number": 1,
    "page_size": 300,
    "total_records": 6128,
    "registrants": [
        {
            "id": "23lnTNqyQ3qkthfghjgkk",
            "first_name": "HUGO",
            "last_name": "MACHA   ILLEN",
            "email": "hugreb@hotmail.com",
            "address": "",
            "city": "",
            "country": "",
            "zip": "",
            "state": "",
            "phone": "",
            "industry": "",
            "org": "",
            "job_title": "",
            "purchasing_time_frame": "",
            "role_in_purchase_process": "",
            "no_of_employees": "",
            "comments": "",
            "custom_questions": [
                {
                    "title": "Departamento/ Región",
                    "value": ""
                },
                {
                    "title": "Género",
                    "value": "Masculino"
                },
                {
                    "title": "Edad",
                    "value": "De 35 a 55 años"
                },
                {
                    "title": "Nivel de estudio",
                    "value": "Técnico / Superior"
                },
                {
                    "title": "¿Eres cliente de una entidad financiera?",
                    "value": "Si"
                },
                {
                    "title": "¿Tiene una empresa?",
                    "value": "Si"
                }

I use this funtion: pat is the file json

df = pd.json_normalize(json.loads(pat.explode("custom_questions").to_json(orient="records")))

but it didn't work for the output i wanted.

I want get someone like this:

在此处输入图像描述

I want the value of the "titles", that is, the keys" to be displayed as the column headers and the value of "value" as the data. As well as the image that is attached

Answer 1

you can use json_normalize:

df = pd.DataFrame(your_json['registrants']).explode('custom_questions').reset_index(drop=True)
df=df.join(pd.json_normalize(df.pop('custom_questions')))

#convert rows to columns. Set index first 17 columns. We will not use theese.
df=df.set_index(df.columns[0:17].to_list())
dfx=df.pivot_table(values='value',columns='title',aggfunc=list).apply(pd.Series.explode).reset_index(drop=True)

#we have duplicate rows. Drop them
df=df.reset_index().drop(['value','title'],axis=1).drop_duplicates().reset_index(drop=True)
df=df.join(dfx)
'''
|    | id                    | first_name   | last_name     | email              | address   | city   | country   | zip   | state   | phone   | industry   | org   | job_title   | purchasing_time_frame   | role_in_purchase_process   | no_of_employees   | comments   | Departamento/ Región   | Edad            | Género    | Nivel de estudio   | ¿Eres cliente de una entidad financiera?   | ¿Tiene una empresa?   |
|---:|:----------------------|:-------------|:--------------|:-------------------|:----------|:-------|:----------|:------|:--------|:--------|:-----------|:------|:------------|:------------------------|:---------------------------|:------------------|:-----------|:-----------------------|:----------------|:----------|:-------------------|:-------------------------------------------|:----------------------|
|  0 | 23lnTNqyQ3qkthfghjgkk | HUGO         | MACHA   ILLEN | hugreb@hotmail.com |           |        |           |       |         |         |            |       |             |                         |                            |                   |            |                        | De 35 a 55 años | Masculino | Técnico / Superior | Si                                         | Si                    |

'''

Converting nested JSON into a flattened DataFrame

Question

1 answers

solution1
0 ACCPTED 2022-11-21 16:27:17

Converting nested JSON into a flattened DataFrame

Question

1 answers

solution1 0 ACCPTED 2022-11-21 16:27:17

solution1
0 ACCPTED 2022-11-21 16:27:17