
Adding values to python dictionary, only returning last value

Below is a list of Twitter handles I am using to scrape tweets:

myDict = {}

list = ['ShoePalace', 'StreetWearDealz', 'ClothesUndrCost', 'DealsPlus', 'bodega', 'FRSHSneaks', 
            'more_sneakers', 'BOOSTLINKS', 'endclothing', 'DopeKixDaily', 'RSVPGallery', 'StealSupply',
            'SneakerAlertHD', 'JustFreshKicks', 'solefed', 'SneakerMash', 'StealsBySwell', 'KicksDeals', 
            'FatKidDeals', 'sneakersteal', 'SOLELINKS', 'SneakerShouts', 'KicksUnderCost', 'snkr_twitr',
            'KicksFinder']

In the for loop below I cycle through each Twitter handle and grab its data. After the data is pulled, I attempt to add it to the dictionary (myDict). Currently the code only returns a single value:

{'title': "Ad: Nike Air Max 97 Golf 'Grass' is back in stock at Nikestore.\n\n>>", 'url': 'example.com', 'image': 'image.jpg', 'tweet_url': 'example.com', 'username': 'KicksFinder', 'date': datetime.datetime(2020, 7, 27, 11, 44, 26)}

for i in list:
    for tweet in get_tweets(i, pages=1):
        tweet_url = 'https://www.twitter.com/' + tweet['tweetUrl']
        username = tweet['username']
        date = tweet['time']
        text = tweet['text']
        title = text.split('http')[0]
        title = title.strip()
        title = title.rstrip()
        try:
            entries = tweet['entries']
            image = entries["photos"][0]
            url = entries["urls"][0]
            myDict['title'] = title
            myDict['url'] = url
            myDict['image'] = image
            myDict['tweet_url'] = tweet_url
            myDict['username'] = username
            myDict['date'] = date
        except IndexError:
            title = title
            image = ""
            link = ""
   
    return(myDict)

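You're mutating a single dict, not adding to a list. Every tweet writes to the same six keys, so each iteration overwrites the previous one and only the last tweet is left.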
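A minimal sketch of the difference, using made-up sample data rather than real tweets (the sample_tweets list and its fields are assumptions for illustration only):

```python
# Hypothetical sample data standing in for scraped tweets.
sample_tweets = [
    {"username": "ShoePalace", "text": "first deal"},
    {"username": "KicksFinder", "text": "second deal"},
]

# Reusing one dict: every iteration assigns to the same keys,
# so only the values from the last tweet survive.
single = {}
for tweet in sample_tweets:
    single["username"] = tweet["username"]
    single["title"] = tweet["text"]

# Building a new dict per tweet and appending it to a list keeps every tweet.
collected = []
for tweet in sample_tweets:
    collected.append({"username": tweet["username"], "title": tweet["text"]})

print(single)
print(len(collected))
```

The first loop ends with one entry per key, while the second ends with one dict per tweet.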

We can refactor your code into a handful of simpler functions: one that turns a (tweepy?) Tweet into a dict, and others that yield processed tweet dicts for a given user or list of users.

Instead of printing the tweets at the end, you could now list.append them - or even simpler, just tweets = list(process_tweets_for_users(usernames)) :)

def process_tweet(tweet) -> dict:
    """
    Turn a Twitter-native Tweet into a dict
    """
    tweet_url = "https://www.twitter.com/" + tweet["tweetUrl"]
    username = tweet["username"]
    date = tweet["time"]
    text = tweet["text"]
    title = text.split("http")[0]
    title = title.strip()
    try:
        entries = tweet["entries"]
        image = entries["photos"][0]
        url = entries["urls"][0]
    except Exception:
        image = url = None
    return {
        "title": title,
        "url": url,
        "image": image,
        "tweet_url": tweet_url,
        "username": username,
        "date": date,
    }


def process_user_tweets(username: str):
    """
    Generate processed tweets for a given user.
    """
    for tweet in get_tweets(username, pages=1):
        try:
            yield process_tweet(tweet)
        except Exception as exc:
            # TODO: improve error handling
            print(exc)


def process_tweets_for_users(usernames):
    """
    Generate processed tweets for a number of users.
    """
    for username in usernames:
        yield from process_user_tweets(username)


usernames = [
    "ShoePalace",
    "StreetWearDealz",
    "ClothesUndrCost",
    "DealsPlus",
    "bodega",
    "FRSHSneaks",
    "more_sneakers",
    "BOOSTLINKS",
    "endclothing",
    "DopeKixDaily",
    "RSVPGallery",
    "StealSupply",
    "SneakerAlertHD",
    "JustFreshKicks",
    "solefed",
    "SneakerMash",
    "StealsBySwell",
    "KicksDeals",
    "FatKidDeals",
    "sneakersteal",
    "SOLELINKS",
    "SneakerShouts",
    "KicksUnderCost",
    "snkr_twitr",
    "KicksFinder",
]

for tweet in process_tweets_for_users(usernames):
    print(tweet)
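To collect the results rather than print them, the generator can be passed straight to list(). The sketch below swaps the real scraper for a hypothetical fake_get_tweets stub so that it runs offline; the stub and its canned data are assumptions, not part of any library, and the processing step is simplified compared with process_tweet above.

```python
def fake_get_tweets(username, pages=1):
    """Hypothetical stand-in for the real scraper; yields one canned tweet."""
    yield {"username": username, "text": f"deal from {username} http://example.com"}


def process_user_tweets(username):
    # Same generator shape as above, with a simplified processing step.
    for tweet in fake_get_tweets(username, pages=1):
        yield {
            "username": tweet["username"],
            "title": tweet["text"].split("http")[0].strip(),
        }


def process_tweets_for_users(usernames):
    for username in usernames:
        yield from process_user_tweets(username)


# list() drains the generator, giving one dict per tweet.
tweets = list(process_tweets_for_users(["ShoePalace", "bodega"]))
print(tweets)
```

Because each yielded value is a fresh dict, nothing is overwritten along the way.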

It is expected that you only get the results for the last tweet, because you are overwriting the same keys for each tweet instead of appending to a list. I would use defaultdict(list) and then append each tweet's values:

from collections import defaultdict

# get_tweets is the same scraping function used in the question
def collect_tweets(handles):
    myDict = defaultdict(list)
    for handle in handles:
        for tweet in get_tweets(handle, pages=1):
            tweet_url = 'https://www.twitter.com/' + tweet['tweetUrl']
            username = tweet['username']
            date = tweet['time']
            text = tweet['text']
            title = text.split('http')[0].strip()
            try:
                entries = tweet['entries']
                image = entries["photos"][0]
                url = entries["urls"][0]
            except IndexError:
                # Skip tweets without a photo or a link, as the original code did
                continue
            # Append to the per-key lists instead of overwriting a single value
            myDict['title'].append(title)
            myDict['url'].append(url)
            myDict['image'].append(image)
            myDict['tweet_url'].append(tweet_url)
            myDict['username'].append(username)
            myDict['date'].append(date)
    return myDict

# 'list' is the handle list defined in the question
myDict = collect_tweets(list)
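To see the shape this produces, here is a small sketch with made-up values (the rows data is an assumption for illustration). defaultdict(list) creates an empty list the first time a key is accessed, so each key ends up holding one column of values, and zip can regroup those columns into per-tweet records if that layout is more convenient.

```python
from collections import defaultdict

# Hypothetical (title, username) pairs standing in for scraped tweets.
rows = [
    ("first deal", "ShoePalace"),
    ("second deal", "KicksFinder"),
]

columns = defaultdict(list)
for title, username in rows:
    # No need to initialise the lists; missing keys start as [].
    columns["title"].append(title)
    columns["username"].append(username)

# Regroup the columns into one dict per tweet.
records = [dict(zip(columns, values)) for values in zip(*columns.values())]

print(dict(columns))
print(records)
```

This only lines up correctly if every key receives exactly one value per tweet, which is why the appends are kept together.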

Now that you have everything nice and tidy, you can put it into a dataframe to work with your data:

import pandas as pd

tweets_df = pd.DataFrame(myDict)
