
Getting tweets by date with tweepy

I pulled the maximum number of tweets allowed from USATODAY, which was 3000.

Now I want to create a script that automatically pulls USATODAY's tweets at 11:59 PM every day.

I was going to use the streaming API, but then I would have to keep it running the whole day.

Can I get some insight on how to create a script that calls the REST API every night at 11:59 PM to pull the day's tweets? If not, does anyone know how to pull tweets based on date?

I was thinking about placing an if/else statement in my for loop, but that seems inefficient because it would have to search through 3000 tweets every night.

This is what I have now:

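import tweepy
from pymongo import MongoClient

# consumer_key, consumer_secret, access_token_key and access_token_secret
# are the app credentials, defined elsewhere.
client = MongoClient('localhost', 27017)
db = client['twitter_db']
collection = db['usa_collection']

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)

api = tweepy.API(auth)

# Walk the whole timeline and store each tweet's raw JSON in MongoDB.
for tweet in tweepy.Cursor(api.user_timeline, screen_name='USATODAY').items():
    collection.insert_one(tweet._json)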

You can retrieve the tweets page by page. For each page, iterate over its tweets and read each tweet's creation time from tweet.created_at, then compute the difference between that time and the current time. If the difference is less than one day, the tweet is one you want; otherwise you exit the loop, because the timeline is returned newest first and every remaining tweet is older still.

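import datetime
import time

import tweepy

def get_tweets(api, username):
    page = 1
    while True:
        tweets = api.user_timeline(screen_name=username, page=page)
        if not tweets:
            # No more pages: the whole timeline has been read.
            return

        for tweet in tweets:
            # Assumes created_at is a naive UTC datetime (tweepy 3.x behaviour).
            if (datetime.datetime.utcnow() - tweet.created_at).days < 1:
                # Do processing here:
                print(tweet.text)
            else:
                # Newest first, so everything after this tweet is older.
                return

        page += 1
        time.sleep(1)  # brief pause between pages to stay under the rate limit

# `api` is the authenticated tweepy.API object from the question.
get_tweets(api, "anmoluppal366")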
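
If "the day's tweets" means a calendar day rather than a rolling 24-hour window, the same early-exit idea can be written as a small pure function. This is only a minimal sketch, not part of the original answer: `tweets_for_day` and the `FakeTweet` stand-in are hypothetical names, and it assumes the tweets arrive newest first and carry a `created_at` datetime, as a user timeline does.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for a tweepy Status; only created_at and text are used.
FakeTweet = namedtuple("FakeTweet", ["created_at", "text"])

def tweets_for_day(tweets, day):
    """Return the tweets created on `day`, given tweets sorted newest first."""
    selected = []
    for tweet in tweets:
        created = tweet.created_at.date()
        if created > day:
            continue  # newer than the target day: keep scanning
        if created < day:
            break     # older than the target day: nothing further can match
        selected.append(tweet)
    return selected

timeline = [
    FakeTweet(datetime.datetime(2016, 5, 3, 8, 0), "next morning"),
    FakeTweet(datetime.datetime(2016, 5, 2, 23, 30), "late evening"),
    FakeTweet(datetime.datetime(2016, 5, 2, 9, 15), "morning"),
    FakeTweet(datetime.datetime(2016, 5, 1, 22, 0), "previous day"),
]
todays = tweets_for_day(timeline, datetime.date(2016, 5, 2))
print([t.text for t in todays])
```

With real data, the list passed in would be the pages returned by api.user_timeline, and the loop still stops at the first tweet older than the target day.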

Note: this does not read all 3000 of that person's tweets; it only iterates over the tweets created within the 24 hours before the script was launched.
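
Because the window is measured from launch time, the script only needs to be started once a night at 11:59 PM. One way to do that without keeping a stream open is to sleep until the next 23:59 and then call the fetch function. The following is a rough sketch under that assumption: `seconds_until` and `run_nightly` are hypothetical helper names, times are naive local times, and daylight-saving shifts are ignored.

```python
import datetime
import time

def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow's.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()

def run_nightly(job, hour=23, minute=59):
    """Call job() once a day at hour:minute; loops until interrupted."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()

# Example (not executed here, since it never returns):
# run_nightly(lambda: get_tweets(api, "USATODAY"))
```

On a Unix machine, a cron entry that starts the script at 23:59 is probably the more usual choice, since it avoids a long-running Python process altogether.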

Another method:

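import tweepy

def search(target, date, maxnum=10):
    # `api` is an authenticated tweepy.API object defined elsewhere.
    cursor = tweepy.Cursor(
        api.search,
        q=target,
        since=date[0],
        until=date[1],
        show_user=True)

    return cursor.items(maxnum)

if __name__ == '__main__':
    list_tweets = search(
        target='서지수',
        date=('2016-05-01', '2016-05-25'),
        maxnum=100)
    # cursor.items() returns an iterator, so loop over it to see the tweets.
    for tweet in list_tweets:
        print(tweet.text)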
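
To apply this search approach to a single day of one account's tweets, the date pair and the query have to be built accordingly. The sketch below assumes that `until` is exclusive (it matches tweets created before the given date) and that the `from:` search operator restricts results to one account, which is how the standard search API is documented as far as I know. `day_bounds` and `build_query` are hypothetical helper names. Keep in mind that the standard search endpoint only covers roughly the last week of tweets.

```python
import datetime

def day_bounds(day):
    """Return (since, until) strings covering exactly one calendar day."""
    next_day = day + datetime.timedelta(days=1)
    # until is treated as exclusive, so it must be the following day.
    return day.isoformat(), next_day.isoformat()

def build_query(screen_name):
    """Search query restricted to tweets sent by one account."""
    return "from:" + screen_name

since, until = day_bounds(datetime.date(2016, 5, 24))
print(build_query("USATODAY"), since, until)
```

The resulting values can be passed as target and date to a function like search above, for example search(target=build_query("USATODAY"), date=(since, until)).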

