
Collecting tweets using screen names and using Tweepy

I have a list of Twitter screen names (one hundred) and want to collect 3,200 tweets per screen name. But with the code below I can only collect 3,200 tweets in total, because it hits the tweet-collection limit when I try to input the 100 screen names. Does anyone have a suggestion for collecting 3,200 tweets per screen name? Any advice would be really appreciated! Thank you in advance!

import tweepy
import csv

def get_all_tweets(screen_name):


    consumer_key = ****
    consumer_secret = ****
    access_key = ****
    access_secret = ****

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    #initialize a list to hold all the tweepy Tweets & list with no retweets
    alltweets = []
    noRT = []

    #make initial request for most recent tweets with extended mode enabled to get full tweets
    new_tweets = api.user_timeline(screen_name = screen_name, tweet_mode = 'extended', count=200, include_retweets=False)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until the api limit is reached
    while len(alltweets) <= 3200:
        print("getting tweets before {}".format(oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,tweet_mode = 'extended', count=200,max_id=oldest, include_retweets=False)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...{} tweets downloaded so far".format(len(alltweets)))

    #removes retweets
    for tweet in alltweets:
        if 'RT' in tweet.full_text:
            continue
        else:
            noRT.append([tweet.id_str, tweet.created_at, tweet.full_text, ])

    #write to csv
    with open('{}_tweets.csv'.format(screen_name), 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(noRT)
        print('{}_tweets.csv was successfully created.'.format(screen_name))
    pass



if __name__ == '__main__':
    #pass in the usernames of the accounts you want to download. I have a hundred usernames in the list
    usernames = ["JLo", "ABC", 'Trump']
    for x in usernames:
        get_all_tweets(x)

First of all, in order to iterate through timelines you must use pagination. I recommend using Cursor in tweepy because it's much easier than dealing with max_id and so on.

# Cursor handles max_id pagination internally; 32 pages of 100 tweets = 3200
for page in tweepy.Cursor(api.user_timeline,
                          screen_name=screen_name,
                          tweet_mode="extended",
                          include_rts=False,  # the v1.1 parameter name is include_rts
                          count=100).pages(32):  # pages() takes the page limit as its argument
    for status in page:
        # do your processing on each status here
        pass
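Putting this together with the rest of the question's script, a minimal sketch along these lines (keeping the same credential placeholders and per-user CSV output, and letting Cursor.items() do the pagination) would be:

import csv
import tweepy

def get_all_tweets(api, screen_name):
    #collect up to 3200 tweets (excluding retweets) for one screen name via Cursor pagination
    rows = []
    for status in tweepy.Cursor(api.user_timeline,
                                screen_name=screen_name,
                                tweet_mode="extended",
                                include_rts=False,
                                count=200).items(3200):
        rows.append([status.id_str, status.created_at, status.full_text])

    #write one csv per screen name, same layout as the original script
    with open('{}_tweets.csv'.format(screen_name), 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(rows)
    print('{}_tweets.csv was successfully created.'.format(screen_name))

if __name__ == '__main__':
    consumer_key = '****'
    consumer_secret = '****'
    access_key = '****'
    access_secret = '****'

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    #wait_on_rate_limit makes tweepy sleep until the 15-minute window resets
    api = tweepy.API(auth, wait_on_rate_limit=True)

    usernames = ["JLo", "ABC", "Trump"]  #replace with your list of 100 screen names
    for name in usernames:
        get_all_tweets(api, name)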

Secondly, there is indeed a rate limit, which you can find here, so getting a warning that you reached the limit is not unusual: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq
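At 200 tweets per request, pulling 3,200 tweets for each of 100 screen names is roughly 1,600 user_timeline requests, so the run will spread across more than one 15-minute rate-limit window. One way to ride that out, as a sketch that reuses the auth, usernames and get_all_tweets(api, screen_name) names from the sketch above and assumes tweepy v3.x exception names:

#wait_on_rate_limit=True makes tweepy sleep whenever a window is exhausted,
#so the remaining failure mode is a protected or suspended account aborting the whole run
api = tweepy.API(auth, wait_on_rate_limit=True)

for name in usernames:
    try:
        get_all_tweets(api, name)
    except tweepy.TweepError as e:  #tweepy v3.x; on v4 catch tweepy.TweepyException instead
        print("skipping {}: {}".format(name, e))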
