Twitter Scraper Rate Limit

I am trying to scrape the account information (username, website, last tweet date) of every account that a given account follows, for example https://www.twitter.com/verified/following. As you can see, that account follows 365.7K accounts.

I have scraped the usernames, and now I have to visit each profile link and scrape that data. The code works fine and gets all the information needed, but after a certain number of page visits Twitter says I have exceeded the rate limit and stops showing any information about the accounts I visit.

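from time import sleep

import pandas as pd
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is assumed to be an already-initialised Selenium WebDriver (its setup is not shown here).

def get_user_info(user):
    """Gets User Info - Username, Website, Last Tweet Date"""
    driver.get(user[0])
    sleep(1)
    # The username is the last path segment of the profile URL
    username = '@' + user[0].split('/')[-1]
    attempt = 0
    while True:
        try:
            # find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4.3
            website = driver.find_element(By.XPATH, "//div[@data-testid='UserProfileHeader_Items']/a").get_attribute('href')
        except NoSuchElementException:
            website = 'No Website'
            attempt += 1
            sleep(1)
        try:
            last_tweet_date = driver.find_element(By.XPATH, "//time").get_attribute('datetime')
        except NoSuchElementException:
            last_tweet_date = 'No Tweets'
            attempt += 1
            sleep(1)
        # Stop once both fields are found, or after two failed lookups in total
        if website != 'No Website' and last_tweet_date != 'No Tweets':
            break
        if attempt > 1:
            break

    info = (username, website, last_tweet_date)
    return info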

def user_info():
    info_list = []
    users_df = pd.read_csv('UserLinks.csv')
    user_list = users_df.values.tolist()
    for user in user_list:
        info = get_user_info(user)
        info_list.append(info)

    info_df = pd.DataFrame(info_list, columns=['Username', 'Website', 'Last Tweet Date'])
    info_df.to_csv('List2.csv', index=False)

What do you suggest?

Here's my answer to a similar question on rate limits:

How Rate Limit Works in Twitter

Essentially, every API has a rate limit that renews after a certain timeframe, e.g. every 15 minutes. So you need to watch the rate limit headers or keep count yourself. When you reach the rate limit, pause your application and start again in the next rate limit window. Some APIs have a count parameter, and you'll want to set it to the maximum to get the most results per request. Also, application auth typically gets more requests than user auth, if it's available for a given API call.
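
The pause-and-resume approach described above could look roughly like this in Python. It is only a minimal sketch: `WindowRateLimiter`, `seconds_until_reset`, and any `max_calls` value are hypothetical names and numbers chosen for illustration, and nothing here says what limit actually applies to profile pages loaded in a browser. The header names `x-rate-limit-remaining` and `x-rate-limit-reset` are the ones the Twitter API documents, so that helper only applies if you call the API instead of loading pages with Selenium.

```python
import time


class WindowRateLimiter:
    """Keep count of calls and pause until the next window once the limit is hit."""

    def __init__(self, max_calls, window_seconds=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        # max_calls is an assumption: use whatever limit applies to your case.
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._window_start = clock()
        self._calls = 0

    def acquire(self):
        """Call before every request; blocks while the current window is exhausted."""
        now = self._clock()
        if now - self._window_start >= self.window_seconds:
            # The window has already renewed: start counting again.
            self._window_start = now
            self._calls = 0
        if self._calls >= self.max_calls:
            # Limit reached: pause for the rest of the window, then start a new one.
            self._sleep(self.window_seconds - (now - self._window_start))
            self._window_start = self._clock()
            self._calls = 0
        self._calls += 1


def seconds_until_reset(headers, now=None):
    """Return how long to pause based on rate limit headers (0 if calls remain).

    `headers` is any mapping with lowercase keys; the reset value is assumed
    to be a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    if remaining > 0:
        return 0
    return max(0, reset_at - now)
```

In the question's loop, this would mean creating one limiter up front and calling `limiter.acquire()` before each `get_user_info(user)`, with `max_calls` set to a number of visits that proved safe in practice.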
