Loading a list of dicts into a DataFrame
I have a list of dictionaries like this and want to load it into a DataFrame, keeping only a couple of the keys from each object.
The DataFrame I would like looks like:
ID -- retweet_count -- favorite_count
tweet_list = ['{"created_at": "Tue Aug 01 00:17:27 +0000 2017", "id": 892177421306343426, "id_str": "892177421306343426", "full_text": "This is Tilly. She\'s just checking pup on you.", "truncated": false, "display_text_range": [0, 138], "contributors": null, "is_quote_status": false, "retweet_count": 6514, "favorite_count": 33819, "favorited": false, "retweeted": false, "possibly_sensitive": false, "possibly_sensitive_appealable": false, "lang": "en"}',
'{"created_at": "Sun Jul 30 15:58:51 +0000 2017", "id": 891689557279858688, "id_str": "891689557279858688", "full_text": "This is Darla. She commenced a snooze mid meal.", "truncated": false, "display_text_range": [0, 79], "entities": {"hashtags": [], "symbols": [], "following": true, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 8964, "favorite_count": 42908, "favorited": false, "retweeted": false, "possibly_sensitive": false, "possibly_sensitive_appealable": false, "lang": "en"}']
You actually have a list of str objects, which were created by serializing dicts to JSON (note false rather than False and null rather than None). Apply json.loads to them and then create the DataFrame. Consider the following simple example:
import json
import pandas as pd
data = ['{"A":1,"B":null}','{"A":null,"B":2}','{"A":null,"B":null}']
df = pd.DataFrame(map(json.loads, data))  # parse each JSON string into a dict
print(df)
gives the output:
     A    B
0  1.0  NaN
1  NaN  2.0
2  NaN  NaN
Explanation: I use the built-in map function to apply json.loads to each element of the list and then create a pandas.DataFrame from the resulting dicts.
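Applied to the tweets from the question, the same approach can also do the column selection: when the DataFrame constructor receives a list of dicts, its columns parameter keeps only the listed keys, in that order. A minimal sketch, assuming tweet_list is abbreviated to a few keys per tweet (values copied from the question, full_text shortened):

```python
import json

import pandas as pd

# Abbreviated stand-ins for the question's tweet_list: only a few of the
# original keys are kept, with values copied from the question.
tweet_list = [
    '{"id": 892177421306343426, "full_text": "This is Tilly.", "retweet_count": 6514, "favorite_count": 33819, "lang": "en"}',
    '{"id": 891689557279858688, "full_text": "This is Darla.", "retweet_count": 8964, "favorite_count": 42908, "lang": "en"}',
]

# columns= keeps only these keys from each parsed dict, in this order;
# the other keys (full_text, lang) are dropped.
wanted = ["id", "retweet_count", "favorite_count"]
df = pd.DataFrame(map(json.loads, tweet_list), columns=wanted)
print(df)
```

This avoids building a wide DataFrame with every tweet field only to slice it down afterwards.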
First, you need reproducible data:
new_list = [
{"created_at": "Tue Aug 01 00:17:27 +0000 2017",
"id": 892177421306343426,
"id_str": "892177421306343426",
"full_text": "This is Tilly. She\'s just checking pup on you.",
"truncated": False,
"display_text_range": [0, 138],
"contributors": None,
"is_quote_status": False,
"retweet_count": 6514,
"favorite_count": 33819,
"favorited": False,
"retweeted": False,
"possibly_sensitive": False,
"possibly_sensitive_appealable": False,
"lang": "en"},
{"created_at": "Sun Jul 30 15:58:51 +0000 2017",
"id": 891689557279858688,
"id_str": "891689557279858688",
"full_text": "This is Darla. She commenced a snooze mid meal.",
"truncated": False,
"display_text_range": [0, 79],
"entities": {"hashtags": [], "symbols": [], "following": True,
"follow_request_sent": False, "notifications": False,
"translator_type": "none"},
"geo": None, "coordinates": None,
"place": None,
"contributors": None,
"is_quote_status": False,
"retweet_count": 8964,
"favorite_count": 42908,
"favorited": False,
"retweeted": False,
"possibly_sensitive": False,
"possibly_sensitive_appealable": False,
"lang": "en"}]
To convert tweet_list into such a list of dicts, you can use:
import json
# Parse each JSON string in tweet_list into a Python dict
new_list = []
for tweet in tweet_list:
    new_list.append(json.loads(tweet))
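The loop above can also be written as a list comprehension, which builds the same list of dicts in one expression. A sketch, assuming tweet_list is abbreviated to two keys per tweet:

```python
import json

# Abbreviated stand-in for tweet_list (only two keys per tweet kept).
tweet_list = [
    '{"id": 892177421306343426, "retweet_count": 6514}',
    '{"id": 891689557279858688, "retweet_count": 8964}',
]

# Equivalent to appending json.loads(tweet) for every tweet in a loop.
new_list = [json.loads(tweet) for tweet in tweet_list]
print(new_list)
```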
Then you can build the DataFrame and keep only the wanted columns:
import pandas as pd
df = pd.DataFrame.from_dict(new_list)
# Selecting a list of columns already returns a DataFrame
df2 = df[['id', 'retweet_count', 'favorite_count']]
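The desired frame in the question labels the first column ID rather than id. If that heading matters, DataFrame.rename can change it after the columns are selected. A minimal sketch, assuming new_list is abbreviated to a few keys per tweet (values copied from the question):

```python
import pandas as pd

# Abbreviated stand-in for new_list: only a few keys per tweet are kept.
new_list = [
    {"id": 892177421306343426, "retweet_count": 6514, "favorite_count": 33819, "lang": "en"},
    {"id": 891689557279858688, "retweet_count": 8964, "favorite_count": 42908, "lang": "en"},
]

df = pd.DataFrame(new_list)
# Keep the three wanted columns, then rename id to ID to match the desired layout.
df2 = df[["id", "retweet_count", "favorite_count"]].rename(columns={"id": "ID"})
print(df2)
```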