
Simple function to respect Twitter's V2 API rate limits?

Problem:

Often we'd like to pull much more data at one time than Twitter would like us to. Between queries, it would be wonderful to have a simple function to call that checks whether you need to wait.

Question:

What is a simple function for respecting Twitter's API rate limits, so that any long-running query completes successfully without harassing Twitter and without getting the querying user banned?

Ideal Answer:

The ideal answer would be a portable function that works in all situations. That is, it finishes properly no matter what and respects Twitter's API rate limit rules.

Caveat:

I have submitted a working answer of my own but I am unsure if there is a way to improve it.

I am developing a Python package to use Twitter's new V2 API. I want to make sure that I am respecting Twitter's rate limits as best I can.

Below are the two functions used to wait when needed. They check the API response headers for the number of remaining requests, and fall back on Twitter's documented HTTP status codes as a last resort. As far as I can tell, three codes (429, 500, and 503) are the only time-related errors; any other error should raise an exception to tell the API user what they are doing incorrectly.

from datetime import datetime
from osometweet.utils import pause_until

def manage_rate_limits(response):
    """Manage Twitter V2 Rate Limits

    This function takes in a `requests` response object after querying
    Twitter and uses the "x-rate-limit-remaining" and
    "x-rate-limit-reset" response headers to manage Twitter's
    most common, time-dependent HTTP errors.

    It returns the response once any required wait is over. After a
    429, 500, or 503 the returned response is still the failed one, so
    the caller must send the request again. Any other non-200 status
    raises an exception.
    """
    # Extra seconds to wait past Twitter's reset time, to be safe
    buffer_wait_time = 15

    # Explicitly checking for time-dependent errors.
    # Most of these errors can be solved simply by waiting
    # a little while and pinging Twitter again - so that's what we do.
    if response.status_code != 200:

        # Too many requests error
        if response.status_code == 429:
            resume_time = datetime.fromtimestamp(
                int(response.headers["x-rate-limit-reset"]) + buffer_wait_time
            )
            print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
            pause_until(resume_time)

        # Twitter internal server error (500) or service unavailable (503)
        elif response.status_code in (500, 503):
            # Twitter needs a break, so we wait 30 seconds
            resume_time = datetime.fromtimestamp(datetime.now().timestamp() + 30)
            print(f"Twitter error {response.status_code}. Giving Twitter a break...\n\tResume Time: {resume_time}")
            pause_until(resume_time)

        # Anything else means we've done something wrong and should exit
        else:
            raise Exception(
                "Request returned an error: {} {}".format(
                    response.status_code, response.text
                )
            )

        # The wait is over; hand the failed response back so the caller retries
        return response

    # Get number of requests left with our tokens
    remaining_requests = int(response.headers["x-rate-limit-remaining"])

    # If at most one request is left, we get the reset time
    #   and wait until then, plus the buffer.
    # The regular 429 error is handled above as well,
    #   however, we want to program defensively, where possible.
    if remaining_requests <= 1:
        resume_time = datetime.fromtimestamp(
            int(response.headers["x-rate-limit-reset"]) + buffer_wait_time
        )
        print(f"About to be rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
        pause_until(resume_time)

    # A 200 response: return the response object
    return response
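
For comparison, the wait decision can also be written as a small pure function, with the retry handled by the caller. This is only a sketch under assumptions: `seconds_to_wait` and `request_with_retries` are hypothetical names (not part of osometweet), `send` is any zero-argument callable that returns a `requests`-style response, and the 15- and 30-second waits simply mirror the values used above.

```python
import time

BUFFER_SECONDS = 15      # extra wait past the advertised reset time (assumed value)
SERVER_ERROR_WAIT = 30   # fixed wait after a 500 or 503 (assumed value)


def seconds_to_wait(status_code, headers, now=None):
    """Return how long to pause before the next request (0.0 means go ahead).

    Raises RuntimeError for statuses that waiting cannot fix.
    """
    now = time.time() if now is None else now
    if status_code == 429:
        reset = int(headers["x-rate-limit-reset"])
        return max(0.0, reset + BUFFER_SECONDS - now)
    if status_code in (500, 503):
        return float(SERVER_ERROR_WAIT)
    if status_code != 200:
        raise RuntimeError(f"Request returned an error: {status_code}")
    # Successful response: pause early if the window is nearly used up.
    # The headers are read with .get() in case they are missing.
    remaining = headers.get("x-rate-limit-remaining")
    reset = headers.get("x-rate-limit-reset")
    if remaining is not None and reset is not None and int(remaining) <= 1:
        return max(0.0, int(reset) + BUFFER_SECONDS - now)
    return 0.0


def request_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a 200, waiting between attempts."""
    for _ in range(max_attempts):
        response = send()
        wait = seconds_to_wait(response.status_code, response.headers)
        if wait:
            sleep(wait)
        if response.status_code == 200:
            return response
    raise RuntimeError(f"Gave up after {max_attempts} attempts")
```

Capping the number of attempts is a design choice of this sketch; it trades "finish no matter what" for a guarantee that the loop cannot spin forever if Twitter keeps returning errors.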

Here is the pause_until function.

import sys
import time as pytime
from datetime import datetime, timezone
from time import sleep

def pause_until(time):
    """ Pause your program until a specific end time. 'time' is either
    a valid datetime object or unix timestamp in seconds (i.e. seconds
    since Unix epoch) """
    end = time

    # Convert datetime to unix timestamp and adjust for locality
    if isinstance(time, datetime):
        # If we're on Python 3 and the user specified a timezone,
        # convert to UTC and get the timestamp.
        if sys.version_info[0] >= 3 and time.tzinfo:
            end = time.astimezone(timezone.utc).timestamp()
        else:
            zoneDiff = pytime.time() - (datetime.now() - datetime(1970, 1, 1)).total_seconds()
            end = (time - datetime(1970, 1, 1)).total_seconds() + zoneDiff

    # Type check
    if not isinstance(end, (int, float)):
        raise Exception('The time parameter is not a number or datetime object')

    # Now we wait
    while True:
        now = pytime.time()
        diff = end - now

        #
        # Time is up!
        #
        if diff <= 0:
            break
        else:
            # 'logarithmic' sleeping to minimize loop iterations
            sleep(diff / 2)
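
On Python 3 the same pause can probably be written more compactly, because `datetime.timestamp()` already treats a naive datetime as local time and converts timezone-aware ones. A minimal sketch, where `to_unix_timestamp` and `wait_until` are hypothetical names and the `clock` and `sleep` parameters exist only so the function can be tested without really waiting:

```python
import time
from datetime import datetime


def to_unix_timestamp(when):
    """Convert a datetime, or a number of seconds since the epoch, to a float."""
    if isinstance(when, datetime):
        # Naive datetimes are treated as local time; aware ones are converted.
        return when.timestamp()
    if isinstance(when, (int, float)):
        return float(when)
    raise TypeError("The time parameter is not a number or datetime object")


def wait_until(when, clock=time.time, sleep=time.sleep):
    """Block until the clock reaches `when`."""
    end = to_unix_timestamp(when)
    while True:
        remaining = end - clock()
        if remaining <= 0:
            return
        # Sleep for the whole remainder, then re-check the clock
        # before returning.
        sleep(remaining)
```

Sleeping for the full remainder usually needs a single iteration, whereas halving the remainder each time takes roughly a logarithmic number of iterations to converge.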

This seems to work quite nicely, but I'm not sure whether there are edge cases that will break it, or whether there is simply a more elegant or simpler way to do this.
