
Is there any way to speed-up python code for downloading tweets using tweepy?

Here is the code I am using for this purpose. For each user request, it takes too long to download all the tweets. What are some ways to speed up the execution time? The idea is to use tweet analytics in real time as the user visits the website. I am new to Python, so any help would be appreciated.

import tweepy #https://github.com/tweepy/tweepy


#Twitter API credentials
consumer_key = ".."
consumer_secret = ".."
access_key = ".."
access_secret = ".."


def get_all_tweets(screen_name):
    #Twitter only allows access to a users most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []  

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #if the user has no tweets, stop here to avoid an IndexError below
    if not alltweets:
        return []

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before {}".format(oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...{} tweets downloaded so far".format(len(alltweets)))

    #transform the tweepy tweets into a 2D array that will populate the csv 
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    return outtweets

One way to make your solution faster would be to add some caching.

When you've downloaded all the tweets for a screen name, save them locally, for instance as [twitter_screen_name].json.

Then edit your function to check for a cache file first. If it doesn't exist, create it empty. Then load it, refresh only what needs refreshing, and save the JSON cache file back.

This way, when a user visits, you'll download only the diff from Twitter. That will be much faster for regularly consulted screen names.

Then you could add something to auto-clear the cache - for instance, a simple cron job that removes files whose last-accessed metadata is older than n days.
