How to convert JSON to a pandas DataFrame with nested objects?

I am extracting some tweets, and I get JSON (json_response) in return that looks something like this (I've added dummy IDs):

{
    "data": [
        {
            "author_id": "123456",
            "conversation_id": "7890",
            "created_at": "2020-03-01T23:59:58.000Z",
            "id": "12345678",
            "lang": "en",
            "public_metrics": {
                "like_count": 1,
                "quote_count": 2,
                "reply_count": 3,
                "retweet_count": 4
            },
            "referenced_tweets": [
                {
                    "id": "13664100",
                    "type": "retweeted"
                }
            ],
            "reply_settings": "everyone",
            "source": "Twitter for Android",
            "text": "This is a sample."
        }
    ],
    "includes": {
        "users": [
            {
                "created_at": "2018-08-29T23:45:37.000Z",
                "description": "",
                "id": "7890123",
                "name": "Twitter user",
                "public_metrics": {
                    "followers_count": 1199,
                    "following_count": 1351,
                    "listed_count": 0,
                    "tweet_count": 52607
                },
                "username": "user_123",
                "verified": false
            }
        ]
    }
}

I am trying to convert it into a pandas DataFrame using the following code:

import json
import pandas as pd

# pd.json_normalize already returns a DataFrame, so the from_dict wrapper is redundant
df = pd.DataFrame.from_dict(pd.json_normalize(json_response['data']), orient='columns')

It gives me output whose header is as follows:

conversation_id | text | source | reply_settings | referenced_tweets | id | created_at | lang | author_id | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count | in_reply_to_user_id

However, I also want username as a column in df alongside the other columns, and I don't know how to do that. Any guidance, please?

IIUC, you have lists of dictionaries in json_response['data'] and json_response['includes']['users']. Why not build your own list of dictionaries from those two?

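import json
import pandas as pd

# response_raw is assumed to hold the raw JSON string returned by the API
json_response = json.loads(response_raw)
your_dict_list = json_response['data']
# Pairs tweets and users by position: users[i] is assumed to belong to data[i]
for i, user in enumerate(json_response['includes']['users']):
    your_dict_list[i]['username'] = user['username']

df = pd.json_normalize(your_dict_list)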

Output:

  author_id conversation_id                created_at        id lang  ...  username public_metrics.like_count public_metrics.quote_count public_metrics.reply_count public_metrics.retweet_count
0    123456            7890  2020-03-01T23:59:58.000Z  12345678   en  ...  user_123                         1                          2                          3                            4
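
The loop above matches tweets to users by list position, which only works if both lists have the same length and order. A minimal sketch of an alternative is to look the username up by ID instead. This assumes that, in real responses, each tweet's author_id equals the id of some entry in includes.users (the dummy IDs in the question do not line up, so the sample data below is hypothetical):

```python
import pandas as pd

# Hypothetical response with the same shape as the question's sample,
# but with author_id values that match the user ids (an assumption).
json_response = {
    "data": [
        {"id": "1", "author_id": "10", "text": "first",
         "public_metrics": {"like_count": 1, "retweet_count": 4}},
        {"id": "2", "author_id": "20", "text": "second",
         "public_metrics": {"like_count": 0, "retweet_count": 2}},
        {"id": "3", "author_id": "10", "text": "third",
         "public_metrics": {"like_count": 5, "retweet_count": 0}},
    ],
    "includes": {
        "users": [
            {"id": "20", "username": "user_b"},
            {"id": "10", "username": "user_a"},
        ]
    },
}

# Lookup table: user id -> username
username_by_id = {
    user["id"]: user["username"]
    for user in json_response["includes"]["users"]
}

# Flatten the tweets; nested public_metrics become dotted columns
df = pd.json_normalize(json_response["data"])

# Tweets whose author is missing from includes.users get NaN here
df["username"] = df["author_id"].map(username_by_id)

print(df[["id", "author_id", "username"]])
```

Because the lookup is keyed on the ID, it still works when one user wrote several tweets or when the users list comes back in a different order than the tweets.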
