Collecting tweets by screen name with Tweepy

Question

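I have a list of Twitter screen names (one hundred of them) and want to collect 3200 tweets for each one. With the code below, however, I can only collect 3200 tweets in total, because it hits the tweet collection limit when I pass in all 100 screen names. Can anyone suggest how to collect 3200 tweets per screen name? Any suggestions would be much appreciated. Thanks in advance!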

import tweepy
import csv

def get_all_tweets(screen_name):
    #Twitter API credentials (placeholders; supply your own values as strings)
    consumer_key = "****"
    consumer_secret = "****"
    access_key = "****"
    access_secret = "****"

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    #initialize a list to hold all the tweepy Tweets & list with no retweets
    alltweets = []
    noRT = []

    #make initial request for most recent tweets with extended mode enabled to get full tweets
    #include_rts is the user_timeline parameter that controls whether retweets are returned
    new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, include_rts=False)

    #keep grabbing tweets until a request comes back empty;
    #without this check the loop never ends for accounts with fewer than 3200 tweets
    while new_tweets:
        #save this batch of tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...{} tweets downloaded so far".format(len(alltweets)))
        print("getting tweets before {}".format(oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, max_id=oldest, include_rts=False)

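    #remove any remaining retweets; their text starts with "RT @"
    #(checking for 'RT' anywhere would also drop ordinary tweets that contain those letters)
    for tweet in alltweets:
        if tweet.full_text.startswith('RT @'):
            continue
        noRT.append([tweet.id_str, tweet.created_at, tweet.full_text])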

    #write to csv (newline='' prevents blank rows on Windows)
    with open('{}_tweets.csv'.format(screen_name), 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(noRT)
    print('{}_tweets.csv was successfully created.'.format(screen_name))



if __name__ == '__main__':
    #pass in the usernames of the accounts you want to download; my real list has a hundred usernames
    usernames = ["JLo", "ABC", 'Trump']
    for x in usernames:
        get_all_tweets(x)

Answer 1

First, to walk through a timeline you have to use pagination. I suggest using Cursor in Tweepy, since it is much easier than handling max_id and so on yourself.

for page in tweepy.Cursor(api.user_timeline,
                          screen_name=screen_name,
                          tweet_mode="extended",
                          include_rts=False,
                          count=100).pages(32):  # the page limit is passed positionally
    for status in page:
        # do your process on status; printing is only a placeholder
        print(status.full_text)
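
Applied to the list of screen names from the question, the Cursor approach might look roughly like the sketch below. It rests on a few assumptions: `collect_rows`, `timeline_pages` and `save_user` are hypothetical helper names, `api` is an authenticated `tweepy.API` object built as in the question, `include_rts` is the retweet parameter of `user_timeline`, and 16 pages of 200 tweets are requested to reach about 3200 tweets.

```python
import csv


def collect_rows(pages, limit=3200):
    """Flatten pages of statuses into CSV rows, stopping after `limit` rows."""
    rows = []
    for page in pages:
        for status in page:
            rows.append([status.id_str, status.created_at, status.full_text])
            if len(rows) >= limit:
                return rows
    return rows


def timeline_pages(api, screen_name, num_pages=16):
    """Return an iterator over one user's timeline pages (up to 200 tweets each)."""
    import tweepy  # imported lazily so collect_rows can be tried without Tweepy

    cursor = tweepy.Cursor(
        api.user_timeline,
        screen_name=screen_name,
        tweet_mode="extended",
        include_rts=False,
        count=200,
    )
    return cursor.pages(num_pages)


def save_user(api, screen_name):
    """Write one CSV file for one screen name and return the number of rows."""
    rows = collect_rows(timeline_pages(api, screen_name))
    filename = "{}_tweets.csv".format(screen_name)
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(rows)
    return len(rows)
```

Calling `save_user(api, name)` for each `name` in `usernames` starts a fresh Cursor per account, so the cap of roughly 3200 tweets applies to each screen name rather than to the whole run. Because retweets are filtered out of each page, a user's file can end up with fewer than 3200 rows.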

Second, rate limits do apply, so getting a warning that the limit has been reached is not unusual. You can find the limits here: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq
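
To get a feel for how the rate limit affects a run over 100 screen names, a rough request budget can be estimated as sketched here. The numbers are assumptions to verify against the page linked above: at most 200 tweets per `user_timeline` request, and 900 requests per 15-minute window. `requests_needed` and `windows_needed` are hypothetical helper names.

```python
import math


def requests_needed(num_users, tweets_per_user=3200, page_size=200):
    """Requests required if every user has a full timeline of tweets_per_user."""
    return num_users * math.ceil(tweets_per_user / page_size)


def windows_needed(total_requests, requests_per_window=900):
    """Rate-limit windows the run spans (requests_per_window is an assumed figure)."""
    return math.ceil(total_requests / requests_per_window)


total = requests_needed(100)
print(total, windows_needed(total))  # prints: 1600 2
```

Under these assumptions the run does not fit in a single window. With `wait_on_rate_limit=True`, as in the question, Tweepy sleeps until the window resets instead of failing, so the script pauses and then continues with the remaining screen names.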

Answer 1 posted 2020-09-17 19:32:27
