Tweepy Streaming - Stop collecting tweets at x amount

I'm looking to have the Tweepy Streaming API stop pulling in tweets after I have stored x number of tweets in MongoDB.

I have tried IF and WHILE statements with counters inside the class definition, but I cannot get it to stop at a given number x. This is a real head-banger for me. I found this thread: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate it have failed. It always tells me that __init__ needs an additional argument. I believe our Tweepy auth is set up differently, so it is not apples to apples.

Any thoughts?

from tweepy.streaming import StreamListener  # Tweepy 3.x API (StreamListener was removed in Tweepy 4.0)
from tweepy import OAuthHandler, Stream
from pymongo import MongoClient

# CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET are your own app credentials
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

# Placeholder database/collection names; connects to MongoDB on localhost:27017
collection = MongoClient()['twitter_db']['tweets']

class StdOutListener(StreamListener):

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        print(record)  # See the Tweepy documentation to learn how to access other fields
        collection.insert_one(record)  # insert() is deprecated in PyMongo 3 and removed in 4

    def on_error(self, status_code):
        print('Error on status', status_code)

    def on_limit(self, track):
        print('Limit threshold exceeded', track)

    def on_timeout(self):
        # Tweepy 3.x calls on_timeout() with no arguments
        print('Stream disconnected; continuing...')


stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])

You need to add a counter to your class in __init__ and increment it in on_status. Store each record and increment the counter; once the counter reaches 20, return False from on_status, which tells Tweepy to disconnect the stream. This could be done as shown below:

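class StdOutListener(StreamListener):

    def __init__(self, api=None):
        # Forward api so StreamListener's own initialisation still runs
        super(StdOutListener, self).__init__(api)
        self.num_tweets = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        print(record)  # See the Tweepy documentation to learn how to access other fields
        collection.insert_one(record)
        self.num_tweets += 1
        # Returning False tells Tweepy to disconnect, so exactly 20 tweets are stored
        return self.num_tweets < 20

    # on_error, on_limit and on_timeout stay as in the question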
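To see the stopping mechanism without a Twitter connection or a MongoDB server, here is a minimal, self-contained simulation. FakeStreamListener, CountingListener, run_fake_stream and the max_tweets parameter are hypothetical names introduced for illustration; a plain list stands in for the MongoDB collection, and run_fake_stream only imitates the assumed behaviour of Tweepy 3.x's read loop (disconnect when on_status returns False). It is not Tweepy's actual code.

```python
class FakeStreamListener:
    """Hypothetical stand-in for Tweepy 3.x's StreamListener (no network, no Tweepy import)."""

    def __init__(self, api=None):
        self.api = api


class CountingListener(FakeStreamListener):
    """Stores statuses until max_tweets have been kept, then asks the stream to stop."""

    def __init__(self, collection, max_tweets=20, api=None):
        super().__init__(api)          # forward api, as the answer's __init__ does
        self.collection = collection   # a plain list standing in for a MongoDB collection
        self.max_tweets = max_tweets
        self.num_tweets = 0

    def on_status(self, status):
        self.collection.append({'Text': status})
        self.num_tweets += 1
        # False means "disconnect"; True keeps the stream running
        return self.num_tweets < self.max_tweets


def run_fake_stream(listener, statuses):
    """Imitates the relevant part of a stream read loop: stop once on_status returns False."""
    delivered = 0
    for status in statuses:
        delivered += 1
        if listener.on_status(status) is False:
            break
    return delivered


stored = []
listener = CountingListener(stored, max_tweets=3)
delivered = run_fake_stream(listener, ['tweet %d' % i for i in range(10)])
print(len(stored), delivered)  # 3 3
```

Making the limit a constructor argument (here the hypothetical max_tweets) lets you choose x per stream instead of hard-coding 20 inside on_status.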
