
Tweepy returns inconsistent and incomplete results for realDonaldTrump

import tweepy
import json
import nltk
import re



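def scrub_text(string):
    # Load the NLTK English word list (fetched on first use).
    nltk.download('words', quiet=True)
    words = set(nltk.corpus.words.words())

    # Replace every run of non-letters with a space, then lowercase.
    string=re.sub(r'[^a-zA-Z]+', ' ', string).lower()
    # Only lowercase alphabetic tokens remain, so keep those in the word list.
    string=" ".join(w for w in nltk.wordpunct_tokenize(string)
                if w in words)
    return string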


def get_all_tweets():
    with open('twitter_credentials.json') as cred_data:
        info=json.load(cred_data)
        consumer_key=info['API_KEY']
        consumer_secret=info['API_SECRET']
        access_key=info['ACCESS_TOKEN']
        access_secret=info['ACCESS_SECRET']

    screen_name = input("Enter twitter Handle: ")

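    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Attach the user access token; otherwise access_key/access_secret go unused.
    auth.set_access_token(access_key, access_secret)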
    api=tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True,
                   timeout=500000, retry_count=10, retry_delay=100)

    all_the_tweets=[]

    # First page: up to 200 of the most recent tweets.
    new_tweets=api.user_timeline(screen_name=screen_name, count=200)

    # Checking the page before indexing avoids an IndexError on an empty first page.
    while len(new_tweets) > 0:
        all_the_tweets.extend(new_tweets)
        # Page backwards: ask only for tweets older than the oldest one seen.
        oldest_tweet=all_the_tweets[-1].id - 1

        print('...%s tweets downloaded' % len(all_the_tweets))

        new_tweets=api.user_timeline(screen_name=screen_name, count=200,
                                     max_id=oldest_tweet)

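    # Join the texts directly; str() on a list of bytes leaves b'...' and \x escapes in the output.
    outtweets=' '.join(tweet.text for tweet in all_the_tweets)
    outtweets=scrub_text(outtweets)

    # The with block closes the file on exit, so no explicit close() is needed.
    with open('tweets.txt', 'w') as f:
        f.write(outtweets)


if __name__ == '__main__':
    get_all_tweets()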

The Python code above should download all the tweets from a given user. It works for most handles, but for @realDonaldTrump I sometimes get 800 tweets and sometimes only 1, and never anywhere close to all of them. I assume the problem is related to how active the account is, but I think there should be a way around it.

The Twitter timelines API only returns a maximum of 3,200 Tweets (source), and the number may also depend on the age of the Tweets and how far back in time you are paging. Unfortunately, you will not be able to use this API to get all of the account's Tweets. To retrieve every Tweet from the account, you would need to use the commercial Full Archive search API.

Regarding the inconsistent number of results, that sounds like a glitch, as it shouldn't vary by that much.
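
To make the 3,200-Tweet cap and the occasional short result concrete, here is a minimal sketch of the max_id paging loop, written against a stand-in fetch function so it runs without credentials. The names page_timeline, fetch_page, empty_retries and make_fake_fetch are hypothetical and not part of Tweepy; retrying an empty page is only an assumption about how one might work around the glitch, not a confirmed fix.

```python
from types import SimpleNamespace

TIMELINE_CAP = 3200  # limit quoted in the answer above


def page_timeline(fetch_page, cap=TIMELINE_CAP, empty_retries=2):
    """Collect tweets by walking max_id backwards until the cap is reached.

    fetch_page is a hypothetical callable: fetch_page(max_id=None) returns a
    list of objects with an integer .id attribute, newest first.
    """
    tweets = []
    max_id = None
    retries_left = empty_retries
    while len(tweets) < cap:
        page = fetch_page(max_id=max_id)
        if not page:
            # Assumption: an empty page may be transient, so retry a few times
            # before treating the timeline as exhausted. A real client would
            # probably pause between attempts.
            if retries_left == 0:
                break
            retries_left -= 1
            continue
        retries_left = empty_retries
        tweets.extend(page)
        # The next request starts just below the oldest id seen so far.
        max_id = page[-1].id - 1
    return tweets[:cap]


def make_fake_fetch(total, page_size=200):
    """Stand-in for the API: serves ids total..1, newest first."""
    def fetch(max_id=None):
        top = total if max_id is None else min(max_id, total)
        bottom = max(top - page_size, 0)
        return [SimpleNamespace(id=i) for i in range(top, bottom, -1)]
    return fetch


if __name__ == '__main__':
    # An account with 5000 tweets still yields only the newest 3200.
    print(len(page_timeline(make_fake_fetch(5000))))  # 3200
```

With the Tweepy setup from the question, fetch_page could be a small wrapper around api.user_timeline(screen_name=screen_name, count=200) that adds max_id only when it is not None. Even then the loop cannot reach past the cap, so older Tweets would still require the Full Archive search API.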

