
How to convert JSON to a pandas DataFrame with nested objects?

I am extracting some tweets and getting JSON (json_response) back, which looks something like this (I've replaced the IDs with dummy values):

{
    "data": [
        {
            "author_id": "123456",
            "conversation_id": "7890",
            "created_at": "2020-03-01T23:59:58.000Z",
            "id": "12345678",
            "lang": "en",
            "public_metrics": {
                "like_count": 1,
                "quote_count": 2,
                "reply_count": 3,
                "retweet_count": 4
            },
            "referenced_tweets": [
                {
                    "id": "13664100",
                    "type": "retweeted"
                }
            ],
            "reply_settings": "everyone",
            "source": "Twitter for Android",
            "text": "This is a sample."
        }
    ],
    "includes": {
        "users": [
            {
                "created_at": "2018-08-29T23:45:37.000Z",
                "description": "",
                "id": "7890123",
                "name": "Twitter user",
                "public_metrics": {
                    "followers_count": 1199,
                    "following_count": 1351,
                    "listed_count": 0,
                    "tweet_count": 52607
                },
                "username": "user_123",
                "verified": false
            }
        ]
    }
}

I am trying to convert it into a pandas DataFrame using the following code:

import pandas as pd

# pd.json_normalize (top-level since pandas 1.0) already returns a DataFrame,
# flattening nested dicts such as public_metrics into dotted column names
df = pd.json_normalize(json_response['data'])

This gives me a DataFrame with the following columns:

conversation_id | text | source | reply_settings | referenced_tweets | id | created_at | lang | author_id | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count | in_reply_to_user_id

However, I also want a username column in the DataFrame alongside these columns, and I don't know how to add it. Any guidance?
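A minimal reproduction of the behavior described in the question, using a trimmed copy of the sample response (only some fields kept), may help show why username is missing. pd.json_normalize only sees the tweets under data: it flattens the nested public_metrics dict into dotted columns and keeps the referenced_tweets list as a single column. The username lives under includes.users, which this call never looks at.

```python
import pandas as pd

# Trimmed copy of the sample response from the question (dummy IDs, some fields omitted)
json_response = {
    "data": [
        {
            "author_id": "123456",
            "id": "12345678",
            "public_metrics": {"like_count": 1, "quote_count": 2,
                               "reply_count": 3, "retweet_count": 4},
            "referenced_tweets": [{"id": "13664100", "type": "retweeted"}],
            "text": "This is a sample.",
        }
    ],
    "includes": {"users": [{"id": "7890123", "username": "user_123"}]},
}

# Nested dicts become dotted columns (public_metrics.like_count, ...);
# lists such as referenced_tweets stay as one object column
df = pd.json_normalize(json_response["data"])

# No username column: the users under "includes" are never passed in
print(df.columns.tolist())
```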

IIUC, you have a list of tweet dictionaries in json_response['data'] and a list of user dictionaries in json_response['includes']['users']. Why not build your own list of dictionaries from those two?

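import json
import pandas as pd

json_response = json.loads(response_raw)  # response_raw: the raw JSON text of the API response
your_dict_list = json_response['data']
# Copy each user's username onto the tweet at the same position
# (this assumes the i-th user is the author of the i-th tweet)
for tweet, user in zip(your_dict_list, json_response['includes']['users']):
    tweet['username'] = user['username']

df = pd.json_normalize(your_dict_list)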

Output:

  author_id conversation_id                created_at        id lang  ...  username public_metrics.like_count public_metrics.quote_count public_metrics.reply_count public_metrics.retweet_count
0    123456            7890  2020-03-01T23:59:58.000Z  12345678   en  ...  user_123                         1                          2                          3                            4
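Pairing tweets and users by position is only correct if the i-th entry of includes.users is the author of the i-th tweet. As far as I know, the Twitter API v2 lists each expanded user once in includes.users rather than once per tweet, so the two lists need not line up (for example, several tweets by one author). A sketch that instead matches each tweet's author_id to a user's id, assuming every user entry carries id and username as in the sample. In the question's sample the dummy IDs do not match (author_id is 123456, the user's id is 7890123), so the data below is made up, with matching IDs.

```python
import pandas as pd

# Hypothetical response: two tweets by the same author,
# so includes.users lists that author only once
json_response = {
    "data": [
        {"author_id": "111", "id": "1", "text": "first",
         "public_metrics": {"like_count": 1}},
        {"author_id": "111", "id": "2", "text": "second",
         "public_metrics": {"like_count": 5}},
    ],
    "includes": {"users": [{"id": "111", "username": "user_123"}]},
}

df = pd.json_normalize(json_response["data"])

# Build an id -> username lookup from the expanded users
usernames = {user["id"]: user["username"]
             for user in json_response["includes"]["users"]}

# Look up each tweet's author; authors missing from includes become NaN
df["username"] = df["author_id"].map(usernames)
print(df[["id", "author_id", "username"]])
```

If you need more user fields than username, one option is to flatten the users as well, e.g. pd.json_normalize(json_response['includes']['users']).add_prefix('user.'), and join that frame (call it users_df) with df.merge(users_df, left_on='author_id', right_on='user.id', how='left').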

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM