
How to stop Tweepy once X amount of tweets have been stored in CSV?

I have been learning Python for about a month now and, after watching several tutorials, decided to give Tweepy a go so I could analyze the extracted data. The problem is that Tweepy will not stop streaming tweets, no matter where I place the if statement. I am using Python 3.9 and Tweepy 3.10.

For instance, I would like it to stop once 10 tweets have been stored in the CSV.

Any thoughts?

# Stream Listener Class
class MyListener(tweepy.StreamListener):
    def on_data(self, raw_data):
        self.num_tweets = 0
        self.file_name = 'path/tweet_stream.csv'
        self.process_data(raw_data)
        return True

    def process_data(self, raw_data):
        print(raw_data)
        with open(self.file_name, mode='a') as f:
            writer = csv.writer(f)
            writer.writerow([raw_data, '\n'])
        self.num_tweets += 1
        if self.num_tweets < 10:
            return True
        else:
            return False


# Creating the Stream
class MyStream():
    def __init__(self, auth, listener):
        self.stream = tweepy.Stream(auth=auth, listener=listener)

    def start(self, keywords):
        self.stream.filter(track=keywords)

# Starting
if __name__ == "__main__":
    listener = MyListener()

I think num_tweets is reset to 0 every time on_data is called, so the counter never gets past 1. Moving the initialization out of on_data and into the constructor should resolve your issue.

class MyListener(tweepy.StreamListener):
    def __init__(self):
        super().__init__()  # let StreamListener set up its own state
        # Initialized once here, so the counter survives across on_data calls
        self.num_tweets = 0
        self.file_name = 'path/tweet_stream.csv'

    def on_data(self, raw_data):
        # Pass the result through: in Tweepy 3.x, returning False stops the stream
        return self.process_data(raw_data)

    def process_data(self, raw_data):
        print(raw_data)
        with open(self.file_name, mode='a') as f:
            writer = csv.writer(f)
            writer.writerow([raw_data, '\n'])
        self.num_tweets += 1
        if self.num_tweets < 10:
            return True
        else:
            return False

After quite some time messing with my code, I figured out a workaround. I ended up replacing process_data with the on_data function (it is closer to what I am after anyway).

The workaround keeps the constructor, creates an empty list for the tweets, and writes the status JSON to self.file while appending it to the list. The append call comes right before the self.num_tweets counter is incremented.
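
The workaround was described without code, so here is a minimal sketch of how it might look. It is not the original poster's actual code: the class name LimitedListener, the limit parameter, and the fallback base class are assumptions added for illustration. It relies on Tweepy 3.x behaviour, where returning False from on_data stops the stream.

```python
import csv

try:
    import tweepy
    BaseListener = tweepy.StreamListener  # present in Tweepy 3.x, removed in 4.x
except (ImportError, AttributeError):
    # Assumption: fall back to a plain object so the class can be tried offline
    BaseListener = object


class LimitedListener(BaseListener):
    """Hypothetical listener that stops the stream after `limit` tweets."""

    def __init__(self, file_name, limit=10):
        super().__init__()
        self.file_name = file_name
        self.limit = limit
        self.tweets = []      # in-memory copy of everything received
        self.num_tweets = 0   # set once here, not on every on_data call

    def on_data(self, raw_data):
        # Append the raw JSON string as a single-column CSV row
        with open(self.file_name, mode='a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow([raw_data])
        self.tweets.append(raw_data)
        self.num_tweets += 1
        # False once the limit is reached, which tells Tweepy 3.x to stop
        return self.num_tweets < self.limit
```

With Tweepy 3.10 this would presumably be wired up the same way as in the question, for example tweepy.Stream(auth=auth, listener=LimitedListener('path/tweet_stream.csv')), followed by a call to filter(track=keywords).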

