How to extract data from a list of dicts into a pandas DataFrame?

Question

This is part of the JSON file I got as output after running a Python script that uses the Telethon API.

[{"_": "Message", "id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same", "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "from_scheduled": false, "legacy": false, "edit_hide": false, "from_id": 356886523, "fwd_from": null, "via_bot_id": null, "reply_to_msg_id": null, "media": null, "reply_markup": null, "entities": [], "views": null, "edit_date": null, "post_author": null, "grouped_id": null, "restriction_reason": []}, {"_": "MessageService", "id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T11:48:18+00:00", "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}, "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "legacy": false, "from_id": 1264437394, "reply_to_msg_id": null}

As you can see, the Python script has scraped the chats from a particular Telegram channel. All I need is to store the date and message fields of the JSON in a separate DataFrame so that I can apply appropriate filters and produce a proper output. Can anyone help me with this?

Answer 1

I think you should use json.loads and then json_normalize to convert the JSON to a DataFrame, with max_level controlling how deep nested dictionaries are flattened.

from pandas import json_normalize
import json
d = '[{"_": "Message", "id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same", "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "from_scheduled": false, "legacy": false, "edit_hide": false, "from_id": 356886523, "fwd_from": null, "via_bot_id": null, "reply_to_msg_id": null, "media": null, "reply_markup": null, "entities": [], "views": null, "edit_date": null, "post_author": null, "grouped_id": null, "restriction_reason": []}, {"_": "MessageService", "id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T11:48:18+00:00", "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}, "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "legacy": false, "from_id": 1264437394, "reply_to_msg_id": null}]'
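# Parse the JSON string into a list of dicts
f = json.loads(d)
# Flatten nested dicts (up to two levels) into columns such as "to_id.channel_id"
print(json_normalize(f, max_level=2))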
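
Since the question only needs the date and message fields, the flattened frame can be narrowed to those two columns. The following is a minimal sketch, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize); the `raw` string is a shortened, hypothetical sample with the same shape as the output in the question, and the names `records`, `flat`, and `subset` are illustrative only.

```python
import json
import pandas as pd

# Shortened, hypothetical sample shaped like the Telethon output in the question.
raw = '''[
  {"_": "Message", "id": 4589,
   "to_id": {"_": "PeerChannel", "channel_id": 1399858792},
   "date": "2020-09-03T14:51:03+00:00",
   "message": "Looking for product managers"},
  {"_": "MessageService", "id": 4588,
   "to_id": {"_": "PeerChannel", "channel_id": 1399858792},
   "date": "2020-09-03T11:48:18+00:00",
   "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}}
]'''

records = json.loads(raw)                        # JSON string -> list of dicts
flat = pd.json_normalize(records, max_level=2)   # nested keys become dotted columns

# Keep only the two columns of interest; records without a "message" key
# end up with a missing value in that column.
subset = flat[["date", "message"]]
print(subset)
```

Nested fields remain available in `flat` under dotted names such as `to_id.channel_id`, should they be needed later.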

Answer 2

This assumes the object returned from the API is already a list of dicts, not a string such as '[{...}, {...}]'.
- If it is a string, call data = json.loads(data) first.
The 'date' and corresponding 'message' can be extracted from the list of dicts with a list comprehension.
Iterate through each dict in the list and use dict.get for each key. If the key doesn't exist, None is returned.

import pandas as pd

# where data is the list of dicts, unpack the desired keys and load into pandas
df = pd.DataFrame([{'date': i.get('date'), 'message': i.get('message')} for i in data])

# display(df)
                        date                                                                                                                                                            message
0  2020-09-03T14:51:03+00:00  Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same
1  2020-09-03T11:48:18+00:00                                                                                                                                                               None
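
The string check and the list comprehension described above can be combined into one small helper. This is only a sketch: the function name `to_date_message_frame` and the two-record `sample` are hypothetical and not part of the original answer.

```python
import json
import pandas as pd

def to_date_message_frame(data):
    """Build a date/message DataFrame from a JSON string or a list of dicts."""
    # Accept either form: decode first if the API output was saved as a string.
    if isinstance(data, str):
        data = json.loads(data)
    # dict.get returns None for records that lack the key (e.g. MessageService).
    rows = [{"date": d.get("date"), "message": d.get("message")} for d in data]
    return pd.DataFrame(rows, columns=["date", "message"])

# Hypothetical, shortened records for illustration.
sample = [
    {"_": "Message", "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers"},
    {"_": "MessageService", "date": "2020-09-03T11:48:18+00:00"},
]

df_from_list = to_date_message_frame(sample)
df_from_string = to_date_message_frame(json.dumps(sample))
print(df_from_list)
```

Passing `columns=` explicitly keeps both columns present even if the input list is empty.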

Alternatively

If you wish to skip records where 'message' is None:

df = pd.DataFrame([{'date': i['date'], 'message': i['message']} for i in data if i.get('message')])

                      date                                                                                                                                                            message
 2020-09-03T14:51:03+00:00  Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same
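
The question mentions applying filters afterwards. One possible way to do that, sketched here under the assumption that the 'date' column holds ISO 8601 strings as in the sample, is to convert it with pd.to_datetime and filter with boolean masks. The `cutoff` value and the keyword "product" are arbitrary examples, and the message text is shortened.

```python
import pandas as pd

# Hypothetical frame shaped like the one built above (message text shortened).
df = pd.DataFrame({
    "date": ["2020-09-03T14:51:03+00:00", "2020-09-03T11:48:18+00:00"],
    "message": ["Looking for product managers", None],
})

# Parse the ISO 8601 strings into timezone-aware timestamps.
df["date"] = pd.to_datetime(df["date"], utc=True)

# Example filter 1: rows after an arbitrary cutoff that actually have a message.
cutoff = pd.Timestamp("2020-09-03T12:00:00", tz="UTC")
recent = df[(df["date"] >= cutoff) & df["message"].notna()]

# Example filter 2: case-insensitive keyword match; na=False treats missing text as no match.
mask = df["message"].str.contains("product", case=False, na=False)
matches = df[mask]

print(recent)
print(matches)
```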

2 answers

Answer 1
1 2020-09-15 19:52:50

Answer 2
1 Accepted 2020-09-15 20:12:24
