Collecting tweets by screen name using Tweepy

I have a list of 100 Twitter screen names and want to collect 3200 tweets per screen name. With the code below, however, I can only collect 3200 tweets in total, because I hit the tweet collection limit when I pass in all 100 screen names. Does anyone have a suggestion for collecting 3200 tweets per screen name? Any advice would be much appreciated. Thank you in advance!

import tweepy
import csv

def get_all_tweets(screen_name):

    #Twitter API credentials (redacted here; fill in your own)
    consumer_key = "****"
    consumer_secret = "****"
    access_key = "****"
    access_secret = "****"

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    #initialize a list to hold all the tweepy Tweets & list with no retweets
    alltweets = []
    noRT = []

    #make initial request for most recent tweets with extended mode enabled to get full tweets
    #(the API parameter that excludes retweets is include_rts, not include_retweets)
    new_tweets = api.user_timeline(screen_name = screen_name, tweet_mode = 'extended', count=200, include_rts=False)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one (None if the account returned no tweets)
    oldest = alltweets[-1].id - 1 if alltweets else None

    #keep grabbing tweets until a request comes back empty, i.e. the timeline is
    #exhausted or the API's limit of roughly 3200 tweets is reached; looping on
    #len(alltweets) <= 3200 never ends for accounts with fewer tweets
    while len(new_tweets) > 0:
        print("getting tweets before {}".format(oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name, tweet_mode = 'extended', count=200, max_id=oldest, include_rts=False)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...{} tweets downloaded so far".format(len(alltweets)))

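    #remove retweets: a retweet carries a retweeted_status attribute
    #(testing for 'RT' anywhere in the text would also drop ordinary tweets containing those letters)
    for tweet in alltweets:
        if hasattr(tweet, 'retweeted_status'):
            continue
        noRT.append([tweet.id_str, tweet.created_at, tweet.full_text])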

    #write to csv (newline='' avoids blank rows on Windows; utf-8 keeps emoji and non-ASCII text intact)
    with open('{}_tweets.csv'.format(screen_name), 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(noRT)
    print('{}_tweets.csv was successfully created.'.format(screen_name))



if __name__ == '__main__':
    #pass in the usernames of the accounts you want to download; the real list has one hundred usernames
    usernames = ["JLo", "ABC", "Trump"]
    for x in usernames:
        get_all_tweets(x)

First, to iterate through a timeline you must paginate. I recommend Tweepy's Cursor, because it is much easier than tracking max_id yourself.

for page in tweepy.Cursor(api.user_timeline,
    screen_name = screen_name,
    tweet_mode="extended",
    include_rts=False,
    count=100).pages(32):  # pages() takes the page limit as its argument; 32 pages x 100 = 3200
    for status in page:
        # do your process on status
        pass
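
For reference, the max_id bookkeeping that Cursor takes care of can be sketched roughly as follows. This is only an illustration, not Tweepy's actual implementation: `iter_timeline` and `fetch_page` are hypothetical names, and `fetch_page` stands in for any callable that returns one page of statuses (newest first, each with an integer `.id`) no newer than `max_id`.

```python
def iter_timeline(fetch_page, max_tweets=3200):
    """Yield statuses newest-first, paging backwards with max_id.

    fetch_page is a hypothetical callable: fetch_page(max_id=None) returns a
    list of objects with an integer .id, newest first, or an empty list when
    there is nothing older left.
    """
    max_id = None
    yielded = 0
    while yielded < max_tweets:
        page = fetch_page(max_id=max_id)
        if not page:
            break  # timeline exhausted
        for status in page:
            if yielded >= max_tweets:
                return
            yield status
            yielded += 1
        # next request: only tweets strictly older than the last one seen
        max_id = page[-1].id - 1
```

With Tweepy, `fetch_page` could be a small wrapper around `api.user_timeline` for one screen name; calling `iter_timeline` once per name in the list then gives each account its own budget of up to 3200 tweets.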

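Second, the timeline endpoints are indeed rate limited, so a warning that you reached the limit is nothing unusual. Since your API object is created with wait_on_rate_limit=True, Tweepy should pause until the limit resets rather than fail. The limits are documented here: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq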
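
To get a feel for how much of the rate limit 100 accounts consume, here is a back-of-the-envelope sketch. The function names are made up for illustration, and the figure of 900 requests per 15-minute window is an assumption that should be checked against the documentation linked above; 200 is the maximum `count` per request.

```python
import math

def requests_needed(n_accounts, tweets_per_account=3200, page_size=200):
    """Number of timeline requests needed to page through every account."""
    pages_per_account = math.ceil(tweets_per_account / page_size)
    return n_accounts * pages_per_account

def windows_needed(total_requests, requests_per_window):
    """Number of rate-limit windows those requests span."""
    return math.ceil(total_requests / requests_per_window)

total = requests_needed(100)  # 100 accounts, 16 pages of 200 each
print(total)
# ASSUMPTION: 900 requests per 15-minute window; verify in the linked docs
print(windows_needed(total, requests_per_window=900))
```

Under these assumptions the whole job spans more than one window, so at least one pause is to be expected. Excluding retweets makes pages smaller but does not reduce the number of requests.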
