簡體   English   中英

從具有嵌套列表的字典列表創建 DataFrame

[英]Creating a DataFrame from a List of Dictionaries which have nested lists for their values

我找不到與我的問題類似的答案,所以在這里我們 go:我已經爬取了一個網站並將數據存儲在 CSV 文件中,格式如下:

data = [{"has_media": "false", "tags": ["e", "x", "y", "s", "f", "f"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "name", "text": "This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #ecar", "text_html": "<p class=\"><s>#</s><b>electric</b></a></p>", "timestamp": "2010-08-01T09:37:20", "timestamp_epochs": 3749793792, "_id": "28829932", "_url": "/user/status/17479367564", "user_id": "1038384584", "username": "username", "video_url": ""}, {"has_media": "false", "hashtags": ["e", "o", "y", "p", "r", "s"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "user", "text": "New cooperative project: This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #hello", "text_html": "<p class=\</b></a></p>", "timestamp": "2011-05-01T09:50:11", "timestamp_epochs": 18734839, "_id": "2982892", "_url": "/user/status/83982893", "user_id": "29983882", "username": "user", "video_url": ""}]

我想從中創建一個數據框,其中每個字典都是一個新行每個鍵代表一個列

我已經嘗試過了,但是它只給了我一列並且似乎無法識別字典。

import pandas as pd
file = data.split(',')
pd.DataFrame.from_dict(file)

非常感謝幫助,因為我是編碼新手,可能需要一些幫助才能做得更好。

謝謝!

這很簡單。 您可以使用:

import pandas as pd

data = [{"has_media": "false", "tags": ["e", "x", "y", "s", "f", "f"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "name", "text": "This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #ecar", "text_html": "<p class=\"><s>#</s><b>electric</b></a></p>", "timestamp": "2010-08-01T09:37:20", "timestamp_epochs": 3749793792, "_id": "28829932", "_url": "/user/status/17479367564", "user_id": "1038384584", "username": "username", "video_url": ""}, {"has_media": "false", "hashtags": ["e", "o", "y", "p", "r", "s"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "user", "text": "New cooperative project: This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #hello", "text_html": "<p class=\</b></a></p>", "timestamp": "2011-05-01T09:50:11", "timestamp_epochs": 18734839, "_id": "2982892", "_url": "/user/status/83982893", "user_id": "29983882", "username": "user", "video_url": ""}]

df = pd.DataFrame(data)

順便說一句,您還可以使用以下命令從您的文件中下載它:

df = pd.read_json(filename, orient='records')

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM