Adding values to a Python dictionary, only returning last value
Below is a list of Twitter handles I am using to scrape tweets:
myDict = {}
list = ['ShoePalace', 'StreetWearDealz', 'ClothesUndrCost', 'DealsPlus', 'bodega', 'FRSHSneaks',
'more_sneakers', 'BOOSTLINKS', 'endclothing', 'DopeKixDaily', 'RSVPGallery', 'StealSupply',
'SneakerAlertHD', 'JustFreshKicks', 'solefed', 'SneakerMash', 'StealsBySwell', 'KicksDeals',
'FatKidDeals', 'sneakersteal', 'SOLELINKS', 'SneakerShouts', 'KicksUnderCost', 'snkr_twitr',
'KicksFinder']
In the for loop below I am cycling through each Twitter handle and grabbing data. After the data is pulled, I am attempting to add it to the dictionary (myDict). Currently the code only returns a single value:
{'title': "Ad: Nike Air Max 97 Golf 'Grass' is back in stock at Nikestore,\n\n>>", 'url': 'example.com', 'image': 'image.jpg', 'tweet_url': 'example.com', 'username': 'KicksFinder', 'date': datetime.datetime(2020, 7, 27, 11, 44, 26)}
for i in list:
    for tweet in get_tweets(i, pages=1):
        tweet_url = 'https://www.twitter.com/' + tweet['tweetUrl']
        username = tweet['username']
        date = tweet['time']
        text = tweet['text']
        title = text.split('http')[0]
        title = title.strip()
        title = title.rstrip()
        try:
            entries = tweet['entries']
            image = entries["photos"][0]
            url = entries["urls"][0]
            myDict['title'] = title
            myDict['url'] = url
            myDict['image'] = image
            myDict['tweet_url'] = tweet_url
            myDict['username'] = username
            myDict['date'] = date
        except IndexError:
            title = title
            image = ""
            link = ""
return(myDict)
You're mutating a single dict over and over, not adding a new dict to a list for each tweet.
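A minimal, self-contained illustration of the difference (the handles and the single "username" field are placeholders, not real scraped data): assigning to the same keys of one dict keeps only the last value, while building a fresh dict per item and appending it to a list keeps all of them.

```python
handles = ["ShoePalace", "bodega", "KicksFinder"]

# Buggy pattern: one dict, the same key reassigned on every iteration
my_dict = {}
for handle in handles:
    my_dict["username"] = handle  # overwrites the previous value
print(my_dict)  # {'username': 'KicksFinder'}

# Fixed pattern: a new dict per item, collected in a list
rows = []
for handle in handles:
    row = {"username": handle}  # fresh dict each time
    rows.append(row)
print(rows)  # [{'username': 'ShoePalace'}, {'username': 'bodega'}, {'username': 'KicksFinder'}]
```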
We can refactor your code into a handful of simpler functions: one that processes (tweepy?) Tweets into dicts, and others that yield processed tweet dicts for a given user.
Instead of printing the tweets at the end, you could now list.append them, or, even simpler, just do tweets = list(process_tweets_for_users(usernames)) :)
def process_tweet(tweet) -> dict:
    """
    Turn a Twitter-native Tweet into a dict
    """
    tweet_url = "https://www.twitter.com/" + tweet["tweetUrl"]
    username = tweet["username"]
    date = tweet["time"]
    text = tweet["text"]
    title = text.split("http")[0]
    title = title.strip()
    try:
        entries = tweet["entries"]
        image = entries["photos"][0]
        url = entries["urls"][0]
    except Exception:
        # no photo or no URL attached to this tweet
        image = url = None
    return {
        "title": title,
        "url": url,
        "image": image,
        "tweet_url": tweet_url,
        "username": username,
        "date": date,
    }


def process_user_tweets(username: str):
    """
    Generate processed tweets for a given user.
    """
    # get_tweets is the same scraping function used in the question
    for tweet in get_tweets(username, pages=1):
        try:
            yield process_tweet(tweet)
        except Exception as exc:
            # TODO: improve error handling
            print(exc)


def process_tweets_for_users(usernames):
    """
    Generate processed tweets for a number of users.
    """
    for username in usernames:
        yield from process_user_tweets(username)


usernames = [
    "ShoePalace",
    "StreetWearDealz",
    "ClothesUndrCost",
    "DealsPlus",
    "bodega",
    "FRSHSneaks",
    "more_sneakers",
    "BOOSTLINKS",
    "endclothing",
    "DopeKixDaily",
    "RSVPGallery",
    "StealSupply",
    "SneakerAlertHD",
    "JustFreshKicks",
    "solefed",
    "SneakerMash",
    "StealsBySwell",
    "KicksDeals",
    "FatKidDeals",
    "sneakersteal",
    "SOLELINKS",
    "SneakerShouts",
    "KicksUnderCost",
    "snkr_twitr",
    "KicksFinder",
]

for tweet in process_tweets_for_users(usernames):
    print(tweet)
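To try the generator pipeline without hitting Twitter, here is a sketch that swaps in a hypothetical stub, fake_get_tweets, which yields hand-made tweet dicts with the same keys the code above reads (tweetUrl, username, time, text, entries). The stub and its data are assumptions for illustration only; the structure mirrors process_tweet / process_user_tweets / process_tweets_for_users, with a narrower except (KeyError, IndexError) instead of except Exception.

```python
from datetime import datetime


def fake_get_tweets(username, pages=1):
    """Hypothetical stand-in for get_tweets: yields two fake tweets per user."""
    for n in range(2):
        yield {
            "tweetUrl": f"{username}/status/{n}",
            "username": username,
            "time": datetime(2020, 7, 27, 11, 44, n),
            "text": f"Deal {n} from {username} http://example.com/{n}",
            # the second tweet has no photos, to exercise the fallback branch
            "entries": {
                "photos": ["image.jpg"] if n == 0 else [],
                "urls": ["example.com"],
            },
        }


def process_tweet(tweet) -> dict:
    try:
        entries = tweet["entries"]
        image = entries["photos"][0]
        url = entries["urls"][0]
    except (KeyError, IndexError):
        image = url = None
    return {
        "title": tweet["text"].split("http")[0].strip(),
        "url": url,
        "image": image,
        "tweet_url": "https://www.twitter.com/" + tweet["tweetUrl"],
        "username": tweet["username"],
        "date": tweet["time"],
    }


def process_user_tweets(username):
    for tweet in fake_get_tweets(username, pages=1):
        yield process_tweet(tweet)


def process_tweets_for_users(usernames):
    for username in usernames:
        yield from process_user_tweets(username)


# list() drains the generator into one list of per-tweet dicts
tweets = list(process_tweets_for_users(["ShoePalace", "KicksFinder"]))
print(len(tweets))  # 4
```

Calling list(...) like this only works if nothing in the same script has rebound the name list, as the question's list = [...] does; there it would raise TypeError: 'list' object is not callable, which is one more reason to name the handles usernames.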
It is expected that you only get the results for the last value in your list, because you are overwriting the results for each tweet instead of appending them to a list. I would use defaultdict(list) and then append each tweet:
from collections import defaultdict

myDict = defaultdict(list)
for i in list:
    for tweet in get_tweets(i, pages=1):
        tweet_url = 'https://www.twitter.com/' + tweet['tweetUrl']
        username = tweet['username']
        date = tweet['time']
        text = tweet['text']
        title = text.split('http')[0]
        title = title.strip()
        title = title.rstrip()
        try:
            entries = tweet['entries']
            image = entries["photos"][0]
            url = entries["urls"][0]
            myDict['title'].append(title)
            myDict['url'].append(url)
            myDict['image'].append(image)
            myDict['tweet_url'].append(tweet_url)
            myDict['username'].append(username)
            myDict['date'].append(date)
        except IndexError:
            title = title
            image = ""
            link = ""
return(myDict)
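For reference, defaultdict(list) creates an empty list the first time a missing key is accessed, so each key ends up holding one column of values. A minimal sketch with made-up (username, image) pairs; as in the code above, a tweet that hits the except branch is not appended to any column, so skipped tweets leave no trace and the columns stay the same length.

```python
from collections import defaultdict

# made-up (username, image) pairs; None stands for "tweet had no photo"
fake_tweets = [("ShoePalace", "a.jpg"), ("bodega", None), ("KicksFinder", "b.jpg")]

columns = defaultdict(list)
for username, image in fake_tweets:
    if image is None:
        continue  # mirrors the answer: tweets without a photo are skipped entirely
    # first access to a missing key creates an empty list, then we append to it
    columns["username"].append(username)
    columns["image"].append(image)

print(dict(columns))  # {'username': ['ShoePalace', 'KicksFinder'], 'image': ['a.jpg', 'b.jpg']}
```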
Now that you have everything nice and tidy, you can put it into a pandas DataFrame to work with your data:
import pandas as pd

tweets_df = pd.DataFrame(myDict)
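pd.DataFrame accepts both shapes the two answers produce: a dict of equal-length lists (one list per field, as with defaultdict(list)) and a list of dicts (one dict per tweet, as with the generator approach). If you want to switch between the two without pandas, here is a minimal pure-Python sketch; the helper names and sample rows are hypothetical, and records_to_columns assumes every record has the same keys.

```python
def records_to_columns(records):
    """List of dicts (one per tweet) -> dict of lists (one per field).

    Assumes every record has the same keys; otherwise the columns misalign.
    """
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def columns_to_records(columns):
    """Dict of equal-length lists -> list of dicts."""
    keys = list(columns)
    # zip(*...) walks the columns in parallel, one row at a time
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


records = [
    {"username": "ShoePalace", "url": "example.com"},
    {"username": "KicksFinder", "url": None},
]
columns = records_to_columns(records)
print(columns)  # {'username': ['ShoePalace', 'KicksFinder'], 'url': ['example.com', None]}
```

With pandas installed, pd.DataFrame(records) and pd.DataFrame(columns) should give the same table for data shaped like this.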