How to get latest tweet id, using python-twitter search API

I'm trying to find a way to NOT get the same tweets twice when using the search API. Here's what I'm doing:

  1. Make a request to Twitter
  2. Store the tweets
  3. Make another request to Twitter
  4. Store the tweets
  5. Compare the results from steps 2 and 4

Ideally, step 5 would give 0, meaning that no overlapping tweets were received, so I'm not asking the Twitter server for the same information more than once.
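Step 5 amounts to a set intersection on tweet ids. A minimal sketch, assuming each stored tweet exposes a numeric `id` attribute (as the `t.id` used in the code below does); the `Tweet` tuple is a hypothetical stand-in so the example runs without network access:

```python
from collections import namedtuple

# Hypothetical stand-in for a search result; only .id and .text are used.
Tweet = namedtuple("Tweet", ["id", "text"])


def overlapping_ids(batch_1, batch_2):
    """Return the ids of tweets that appear in both batches (step 5)."""
    return {t.id for t in batch_1} & {t.id for t in batch_2}


first_batch = [Tweet(105, "a"), Tweet(104, "b"), Tweet(103, "c")]
second_batch = [Tweet(107, "d"), Tweet(106, "e"), Tweet(105, "a")]

# Ideally this is 0; here tweet 105 was fetched by both requests.
print(len(overlapping_ids(first_batch, second_batch)))  # 1
```

Comparing by id rather than by text or created_at avoids false matches, since different tweets can share the same text (retweets, for example) or the same timestamp.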

But I think I'm stuck at step 3, where I have to make another call. I'm trying to use the 'since_id' argument to get only tweets newer than a certain point, but I'm not sure if the value I'm using is correct.
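For since_id, the value that matters is the highest tweet id already stored: Twitter returns only tweets with an id greater than since_id, and search results are not returned in ascending id order, so the id of whichever tweet a loop visits last is usually not the newest one. A minimal sketch, using a hypothetical `Tweet` stand-in with an `id` attribute; `next_since_id` is an illustrative helper, not part of python-twitter:

```python
from collections import namedtuple

# Hypothetical stand-in for a search result; only .id is used.
Tweet = namedtuple("Tweet", ["id", "text"])


def next_since_id(tweets, previous=None):
    """Return the since_id for the next request.

    since_id is exclusive (only ids greater than it are returned), so it
    should be the highest id seen so far, including any earlier since_id.
    Returns `previous` unchanged when the batch is empty.
    """
    ids = [t.id for t in tweets]
    if previous is not None:
        ids.append(previous)
    return max(ids) if ids else previous


batch = [Tweet(109, "newest"), Tweet(105, "b"), Tweet(101, "oldest")]
print(batch[-1].id)          # 101: the last tweet in the loop
print(next_since_id(batch))  # 109: the value since_id actually needs
```

Passing the result as since_id on the next search call then asks only for tweets newer than everything already stored.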

Code:

import twitter

class Test():

    def __init__(self):
        self.t_auth()
        self.hashtag = 'justinbieber'  # GetSearch takes the query term as a string

        self.tweets_1 = []
        self.ids_1 = []
        self.created_at_1 = []
        self.tweet_text_1 = []
        self.last_id_1 = ''
        self.page_1 = 1

        self.tweets_2 = []
        self.ids_2 = []
        self.created_at_2 = []
        self.tweet_text_2 = []
        self.last_id_2 = ''
        self.page_2 = 1


        for i in range(1,16):
            self.tweets_1.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_1, page=self.page_1))
            self.page_1 += 1
        print len(self.tweets_1)
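        for t in self.tweets_1:
            self.ids_1.insert(0, t.id)
            self.created_at_1.insert(0, t.created_at)
            self.tweet_text_1.insert(0, t.text)
        # since_id must be the highest id seen so far; search results are not
        # in ascending id order, so the last tweet in the loop is not the newest.
        if self.ids_1:
            self.last_id_1 = max(self.ids_1)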

        self.last_id_2 = self.last_id_1

        for i in range(1,16):
            self.tweets_2.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_2, page=self.page_2))
            self.page_2 += 1
        print len(self.tweets_2)
        for t in self.tweets_2:
            self.ids_2.insert(0, t.id)
            self.created_at_2.insert(0, t.created_at)
            self.tweet_text_2.insert(0, t.text)
        # Same as above: keep the highest id, not the last one iterated.
        if self.ids_2:
            self.last_id_2 = max(self.ids_2)

        print 'Total number of tweets in test 1: ', len(self.tweets_1)
        print 'Last id of test 1: ', self.last_id_1

        print 'Total number of tweets in test 2: ', len(self.tweets_2)
        print 'Last id of test 2: ', self.last_id_2

        print '##################################'
        print '#############OVERLAPING###########'

        ids_overlap = set(self.ids_1).intersection(self.ids_2)
        tweets_text_overlap = set(self.tweet_text_1).intersection(self.tweet_text_2)
        created_at_overlap = set(self.created_at_1).intersection(self.created_at_2)

        print 'Ids: ', len(ids_overlap)
        print 'Text: ', len(tweets_text_overlap)
        print 'Created_at: ', len(created_at_overlap)

        print ids_overlap
        print tweets_text_overlap
        print created_at_overlap



    def t_auth(self):
        consumer_key="xxx"
        consumer_secret="xxx"
        access_key = "xxx"
        access_secret = "xxx"

        self.api = twitter.Api(consumer_key, consumer_secret ,access_key, access_secret)
        self.api.VerifyCredentials()

        return self.api

if __name__ == "__main__":
    Test()  

In addition to 'since_id', you can use 'max_id'. From the Twitter API documentation:

Iterating in a result set: parameters such as count, until, since_id and max_id let you control how you iterate through search results, since the result set can contain a large number of tweets.
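To walk through a large result set without receiving the same tweet twice, a common pattern is to page backwards: after each batch, set max_id to one less than the lowest id received, and stop when a request comes back empty. The sketch below assumes the documented semantics (since_id exclusive, max_id inclusive); `fake_search` is a hypothetical offline stand-in for the real search call that works on bare integer ids, and `fetch_all` is an illustrative helper, neither is part of python-twitter:

```python
from functools import partial


def fake_search(corpus, since_id=None, max_id=None, count=3):
    """Hypothetical stand-in for a search call: up to `count` ids,
    newest first, with since_id < id <= max_id."""
    hits = [i for i in sorted(corpus, reverse=True)
            if (since_id is None or i > since_id)
            and (max_id is None or i <= max_id)]
    return hits[:count]


def fetch_all(search, since_id=None, count=3):
    """Page backwards through results using max_id so no id is fetched twice."""
    collected = []
    max_id = None
    while True:
        batch = search(since_id=since_id, max_id=max_id, count=count)
        if not batch:
            break
        collected.extend(batch)
        # max_id is inclusive, so step one below the oldest id in this batch.
        max_id = min(batch) - 1
    return collected


corpus = [101, 102, 103, 104, 105, 106, 107]
search = partial(fake_search, corpus)
print(fetch_all(search, since_id=102))  # [107, 106, 105, 104, 103]
```

Subtracting 1 matters because max_id is inclusive; without it, the oldest tweet of each batch would come back again at the top of the next one.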

By setting these values dynamically, you can keep successive searches from overlapping. For example, if since_id is set to 1000 and max_id to 1100, you will only get tweets with IDs greater than 1000 and less than or equal to 1100.
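The window semantics can be checked offline. A minimal sketch, assuming the documented behaviour that since_id is exclusive and max_id is inclusive; `in_window` is a hypothetical helper, not a Twitter or python-twitter function:

```python
def in_window(tweet_id, since_id=None, max_id=None):
    """True if tweet_id falls in the (since_id, max_id] window."""
    if since_id is not None and tweet_id <= since_id:
        return False
    if max_id is not None and tweet_id > max_id:
        return False
    return True


ids = [999, 1000, 1001, 1050, 1100, 1101, 1150, 1200]
window_a = [i for i in ids if in_window(i, since_id=1000, max_id=1100)]
window_b = [i for i in ids if in_window(i, since_id=1100, max_id=1200)]

print(window_a)                           # [1001, 1050, 1100]
print(window_b)                           # [1101, 1150, 1200]
print(len(set(window_a) & set(window_b)))  # 0
```

Because each window starts exactly where the previous one ended (the max_id of one becomes the since_id of the next), consecutive windows never share a tweet.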
