
How to convert a nested JSON response from an API into a DataFrame

{
    "reviews": [
        {
            "reviewId": "12a3",
            "authorName": "Muhammad Arifin",
            "comments": [
                {
                    "userComment": {
                        "text": "\tsangat terbantu👍",
                        "lastModified": {
                            "seconds": "1606819245",
                            "nanos": 835000000
                        },
                        "starRating": 5,
                        "reviewerLanguage": "id",
                        "device": "1601",
                        "androidOsVersion": 23,
                        "appVersionCode": 20365,
                        "appVersionName": "5.2.73",
                        "deviceMetadata": {
                            "productName": "1601 (1601)",
                            "manufacturer": "Vivo",
                            "deviceClass": "FORM_FACTOR_PHONE",
                            "nativePlatform": "ABI_ARM64_V8,ABI_ARM_V7,ABI_ARM",
                            "cpuModel": "MT6750",
                            "cpuMake": "Mediatek"
                        }
                    }
                },
                {
                    "developerComment": {
                        "text": "Terima kasih sudah berbagi, kami sangat senang menjadi bagian dalam pejalanan travel anda!",
                        "lastModified": {
                            "seconds": "1606818598",
                            "nanos": 722000000
                        }
                    }
                }
            ]
        }
    ],
    "tokenPagination": {
        "nextPageToken": "abc"
    }
}

I want the column names to be reviewId, authorName, userComment_text, userComment_lastModified, starRating, deviceMetadata.manufacturer, developerComment.text

I have tried this:

df = pd.json_normalize(fetch_reviews_response, record_path="reviews")

but it only creates the reviewId, authorName and comments columns.

Please try this repo and see if it works for you.

It uses recursive functions to flatten the JSON. The function in 'json_to_csv.py' can easily be adapted to your case: load the flat JSON result into a dataframe with 'pandas.read_json'.
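The recursive idea can be sketched roughly as follows. This is only an illustration, not the code from the repo: the name `flatten_json` and the choice to put list positions into the column names are assumptions, and the sample is a trimmed-down version of the response in the question.

```python
import pandas as pd


def flatten_json(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Hypothetical helper: nested keys are joined with `sep`, and list items
    are keyed by their position in the list.
    """
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        # A scalar value: nothing left to flatten.
        return {parent_key: obj}

    flat = {}
    for key, value in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten_json(value, new_key, sep))
    return flat


# Trimmed-down version of the response from the question.
response = {
    "reviews": [
        {
            "reviewId": "12a3",
            "authorName": "Muhammad Arifin",
            "comments": [
                {"userComment": {"text": "\tsangat terbantu👍", "starRating": 5}},
                {"developerComment": {"text": "Terima kasih sudah berbagi"}},
            ],
        }
    ],
    "tokenPagination": {"nextPageToken": "abc"},
}

# One flat dict per review, so one row per review.
df = pd.DataFrame([flatten_json(review) for review in response["reviews"]])
print(df.columns.tolist())
```

With this sketch the list positions show up in the column names (for example `comments.0.userComment.text`), so the columns would still need renaming to match the names asked for in the question.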

First, I reorganized the JSON file as shown below:

{
    "reviews": {
        "reviewId": "12a3",
        "authorName": "Muhammad Arifin",
        "comments": {
            "userComment": {
                "text": "\tsangat terbantu👍",
                "lastModified": {
                    "seconds": "1606819245",
                    "nanos": 835000000
                },
                "starRating": 5,
                "reviewerLanguage": "id",
                "device": "1601",
                "androidOsVersion": 23,
                "appVersionCode": 20365,
                "appVersionName": "5.2.73",
                "deviceMetadata": {
                    "productName": "1601 (1601)",
                    "manufacturer": "Vivo",
                    "deviceClass": "FORM_FACTOR_PHONE",
                    "nativePlatform": "ABI_ARM64_V8,ABI_ARM_V7,ABI_ARM",
                    "cpuModel": "MT6750",
                    "cpuMake": "Mediatek"
                }
            },
            "developerComment": {
                "text": "Terima kasih sudah berbagi, kami sangat senang menjadi bagian dalam pejalanan travel anda!",
                "lastModified": {
                    "seconds": "1606818598",
                    "nanos": 722000000
                }
            }
        },
        "tokenPagination": {
            "nextPageToken": "abc"
        }
    }
}

Then, in a Python file, I used pandas to pull the nested values out into their own columns.

import pandas as pd

# The keys of "reviews" become the row index; nested values stay as dicts.
df = pd.read_json("data.json")

# Each value below is a scalar, so it is repeated on every row.
df['reviewId'] = df['reviews']['reviewId']
df['authorName'] = df['reviews']['authorName']
df['userComment_text'] = df['reviews']['comments']['userComment']['text']
df['userComment_lastModified'] = df['reviews']['comments']['userComment']['lastModified']['seconds']
df['starRating'] = df['reviews']['comments']['userComment']['starRating']
df['deviceMetadata.manufacturer'] = df['reviews']['comments']['userComment']['deviceMetadata']['manufacturer']
df['developerComment.text'] = df['reviews']['comments']['developerComment']['text']

print(df.head())

And here is my output:

                                                           reviews  ...                              developerComment.text
authorName                                         Muhammad Arifin  ...  Terima kasih sudah berbagi, kami sangat senang...
comments         {'userComment': {'text': ' sangat terbantu👍', ...  ...  Terima kasih sudah berbagi, kami sangat senang...
reviewId                                                      12a3  ...  Terima kasih sudah berbagi, kami sangat senang...
tokenPagination                           {'nextPageToken': 'abc'}  ...  Terima kasih sudah berbagi, kami sangat senang...

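The rows are indexed by the keys of "reviews", and each new column repeats the same value on every row. You can change the rows as you wish; I did not edit them since the question did not give any information about the rows.

I hope it works for you.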
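If reorganizing the JSON by hand is not practical, something along these lines might work on the original response directly. It is a sketch under assumptions: each entry in "comments" is taken to be a dict with a single key (userComment or developerComment) as in the sample, and a review has at most one of each. The names `reviews_to_frame` and `WANTED` are made up for illustration.

```python
import pandas as pd

# Flattened column name -> column name requested in the question.
WANTED = {
    "reviewId": "reviewId",
    "authorName": "authorName",
    "userComment.text": "userComment_text",
    "userComment.lastModified.seconds": "userComment_lastModified",
    "userComment.starRating": "starRating",
    "userComment.deviceMetadata.manufacturer": "deviceMetadata.manufacturer",
    "developerComment.text": "developerComment.text",
}


def reviews_to_frame(response):
    """Build one row per review from the API response (hypothetical helper)."""
    rows = []
    for review in response.get("reviews", []):
        # Keep the top-level fields, then merge the comment dicts into the row.
        row = {k: v for k, v in review.items() if k != "comments"}
        for comment in review.get("comments", []):
            row.update(comment)  # adds the userComment / developerComment key
        rows.append(row)

    # json_normalize flattens the remaining nested dicts with "." as separator.
    flat = pd.json_normalize(rows)
    # reindex keeps only the wanted columns (missing ones become NaN).
    return flat.reindex(columns=list(WANTED)).rename(columns=WANTED)


# Trimmed-down version of the response from the question.
sample = {
    "reviews": [
        {
            "reviewId": "12a3",
            "authorName": "Muhammad Arifin",
            "comments": [
                {
                    "userComment": {
                        "text": "\tsangat terbantu👍",
                        "lastModified": {"seconds": "1606819245", "nanos": 835000000},
                        "starRating": 5,
                        "deviceMetadata": {"manufacturer": "Vivo"},
                    }
                },
                {"developerComment": {"text": "Terima kasih sudah berbagi"}},
            ],
        }
    ],
    "tokenPagination": {"nextPageToken": "abc"},
}

df = reviews_to_frame(sample)
print(df.columns.tolist())
```

This keeps one row per review. If a review could contain several user or developer comments, later ones would overwrite earlier ones in this sketch, so that case would need different handling.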
