Often we'd like to pull much more data than Twitter would like us to at one time. In between each query it would be wonderful if there was a simple function to call that checks if you need to wait.
What is a simple function for respecting Twitter's API limits and ensuring that any long-running-query will complete successfully without harassing Twitter and ensure the querying user does not get banned?
The most ideal answer would be a portable function that should work in all situations. That is, finish (properly) no matter what, and respect Twitter's API rate limit rules.
I have submitted a working answer of my own but I am unsure if there is a way to improve it.
I am developing a Python package to utilize Twitter's new V2 API . I want to make sure that I am respecting Twitter's rate limits as best as I possibly can.
Below are the two functions used to wait when needed. They check the API call response headers for remaining queries and then also rely on Twitter's HTTP codes provided here as an ultimate backup. As far as I can tell, these three HTTP codes are the only time-related errors, and the others should raise issues for an API user to inform them of whatever they are doing incorrectly.
from datetime import datetime
from osometweet.utils import pause_until
def manage_rate_limits(response):
"""Manage Twitter V2 Rate Limits
This method takes in a `requests` response object after querying
Twitter and uses the headers["x-rate-limit-remaining"] and
headers["x-rate-limit-reset"] headers objects to manage Twitter's
most common, time-dependent HTTP errors.
"""
while True:
# Get number of requests left with our tokens
remaining_requests = int(response.headers["x-rate-limit-remaining"])
# If that number is one, we get the reset-time
# and wait until then, plus 15 seconds.
# The regular 429 exception is caught below as well,
# however, we want to program defensively, where possible.
if remaining_requests == 1:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"One request from being rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Explicitly checking for time dependent errors.
# Most of these errors can be solved simply by waiting
# a little while and pinging Twitter again - so that's what we do.
if response.status_code != 200:
# Too many requests error
if response.status_code == 429:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter internal server error
elif response.status_code == 500:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Internal server error @ Twitter. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter service unavailable error
elif response.status_code == 503:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Twitter service unavailable. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# If we get this far, we've done something wrong and should exit
raise Exception(
"Request returned an error: {} {}".format(
response.status_code, response.text
)
)
# Each time we get a 200 response, exit the function and return the response object
if response.ok:
return response
Here is the pause_until
function.
def pause_until(time):
""" Pause your program until a specific end time. 'time' is either
a valid datetime object or unix timestamp in seconds (i.e. seconds
since Unix epoch) """
end = time
# Convert datetime to unix timestamp and adjust for locality
if isinstance(time, datetime):
# If we're on Python 3 and the user specified a timezone,
# convert to UTC and get tje timestamp.
if sys.version_info[0] >= 3 and time.tzinfo:
end = time.astimezone(timezone.utc).timestamp()
else:
zoneDiff = pytime.time() - (datetime.now() - datetime(1970, 1, 1)).total_seconds()
end = (time - datetime(1970, 1, 1)).total_seconds() + zoneDiff
# Type check
if not isinstance(end, (int, float)):
raise Exception('The time parameter is not a number or datetime object')
# Now we wait
while True:
now = pytime.time()
diff = end - now
#
# Time is up!
#
if diff <= 0:
break
else:
# 'logarithmic' sleeping to minimize loop iterations
sleep(diff / 2)
This seems to work quite nicely but I'm not sure if there are edge-cases that will break this or if there is simply a more elegant/simple way to do this.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.