How to convert json to Pandas Dataframe with nested objects?
I am pulling some tweets and get JSON (json_response) back, which looks like this (I have added dummy IDs):
{
  "data": [
    {
      "author_id": "123456",
      "conversation_id": "7890",
      "created_at": "2020-03-01T23:59:58.000Z",
      "id": "12345678",
      "lang": "en",
      "public_metrics": {
        "like_count": 1,
        "quote_count": 2,
        "reply_count": 3,
        "retweet_count": 4
      },
      "referenced_tweets": [
        {
          "id": "13664100",
          "type": "retweeted"
        }
      ],
      "reply_settings": "everyone",
      "source": "Twitter for Android",
      "text": "This is a sample."
    }
  ],
  "includes": {
    "users": [
      {
        "created_at": "2018-08-29T23:45:37.000Z",
        "description": "",
        "id": "7890123",
        "name": "Twitter user",
        "public_metrics": {
          "followers_count": 1199,
          "following_count": 1351,
          "listed_count": 0,
          "tweet_count": 52607
        },
        "username": "user_123",
        "verified": false
      }
    ]
  }
}
I am trying to convert it to a pandas DataFrame with the following code:
import json
import pandas as pd
df = pd.DataFrame.from_dict(pd.json_normalize(json_response['data']), orient='columns')
It gives me output with the following header:
conversation_id | text | source | reply_settings | referenced_tweets | id | created_at | lang | author_id | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count | in_reply_to_user_id
This works, except that I also want username as a column in df alongside the other columns, and I don't know how to do that. Any guidance, please?
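For reference, the dotted column names in the header above come from how pd.json_normalize handles nesting: nested dicts are flattened into parent.child columns, while lists such as referenced_tweets are left in a single column. A minimal sketch, using a hypothetical one-tweet sample trimmed from the response above:

```python
import pandas as pd

# Hypothetical sample, trimmed from the response shown in the question.
tweets = [
    {
        "id": "12345678",
        "public_metrics": {"like_count": 1, "retweet_count": 4},
        "referenced_tweets": [{"id": "13664100", "type": "retweeted"}],
    }
]

df = pd.json_normalize(tweets)

# Nested dicts become dotted columns; the list stays in one column as-is.
print(df.columns.tolist())
print(type(df.loc[0, "referenced_tweets"]))
```

Because username lives under includes rather than data, normalizing json_response['data'] alone cannot produce it; it has to be brought in from the users list separately.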
IIUC, you have a list of tweet dicts in json_response['data'] and a list of user dicts in json_response['includes']['users']. Why not build your own list of dicts from the two?
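import json
import pandas as pd

# response_raw is the raw JSON string returned by the API
json_response = json.loads(response_raw)
your_dict_list = json_response['data']
# Assumes users[i] is the author of data[i] (same order in both lists)
for i, user in enumerate(json_response['includes']['users']):
    your_dict_list[i]['username'] = user['username']
df = pd.json_normalize(your_dict_list)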
Output:
author_id conversation_id created_at id lang ... username public_metrics.like_count public_metrics.quote_count public_metrics.reply_count public_metrics.retweet_count
0 123456 7890 2020-03-01T23:59:58.000Z 12345678 en ... user_123 1 2 3 4
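The loop above pairs tweets and users by position, which only holds if both lists have the same length and order. If the users list can be shorter or ordered differently (for example, one author with several tweets), a join on the author's ID may be safer. The following is a sketch under the assumption that each tweet's author_id equals the id of an entry in includes.users; the dummy IDs in the question do not match, so the sample data here is hypothetical:

```python
import pandas as pd

# Hypothetical response: unlike the dummy sample, author_id matches a user id.
json_response = {
    "data": [
        {"id": "1", "author_id": "u1", "text": "first", "public_metrics": {"like_count": 1}},
        {"id": "2", "author_id": "u2", "text": "second", "public_metrics": {"like_count": 0}},
        {"id": "3", "author_id": "u1", "text": "third", "public_metrics": {"like_count": 5}},
    ],
    "includes": {
        "users": [
            {"id": "u2", "username": "user_b"},
            {"id": "u1", "username": "user_a"},
        ]
    },
}

tweets = pd.json_normalize(json_response["data"])
users = pd.json_normalize(json_response["includes"]["users"])

# Keep only the needed user columns; rename id so it lines up with author_id.
users = users[["id", "username"]].rename(columns={"id": "author_id"})

# A left merge keeps every tweet, even if its author is missing from includes.
df = tweets.merge(users, on="author_id", how="left")
print(df[["id", "author_id", "username"]])
```

With how="left", a tweet whose author is absent from the users list keeps its row and gets NaN in username instead of being dropped.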