
Using tweepy to get unique tweets

I am trying to build a corpus of tweets using a number of search terms. One issue I am having is that the results are not unique: they include retweets.

Is there a way to remove these beforehand without doing any text processing?

What I've got now:

api = tweepy.API(auth)
for search in hashtags:
    for tweet in tweepy.Cursor(api.search, q=search, count=1000, lang="en").items():
        text = repr(tweet.text.encode("utf-8"))
        out.write(text + "\n")

You can add " -filter:retweets" to your query so that the search returns only original tweets. It may not be the prettiest solution, but it works.

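import tweepy
# `auth`, `hashtags` and `out` are defined as in the question.
api = tweepy.API(auth)
for search in hashtags:
    # "-filter:retweets" asks the search endpoint to leave out retweets.
    query = search + " -filter:retweets"
    # The standard search endpoint returns at most 100 tweets per request.
    # In Tweepy 4.x, api.search is named api.search_tweets.
    for tweet in tweepy.Cursor(api.search, q=query, count=100, lang="en").items():
        out.write(repr(tweet.text.encode("utf-8")) + "\n")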
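
If you search for several hashtags, the same original tweet can still come back more than once, since it may match more than one term. Below is a minimal sketch of a client-side safety net, assuming each tweet object has an `id` attribute and that retweets carry a `retweeted_status` attribute, as in the v1.1 payload. The helper names `without_retweets` and `unique_originals` are made up for illustration and are not part of Tweepy; the sample objects stand in for real search results.

```python
from types import SimpleNamespace


def without_retweets(query):
    """Append the retweet-excluding search operator unless it is already there."""
    operator = "-filter:retweets"
    return query if operator in query else f"{query} {operator}"


def unique_originals(tweets):
    """Yield each original tweet once, skipping retweets and repeated ids."""
    seen_ids = set()
    for tweet in tweets:
        # Assumption: a retweet exposes a `retweeted_status` attribute.
        if hasattr(tweet, "retweeted_status"):
            continue
        # The same tweet can match more than one search term.
        if tweet.id in seen_ids:
            continue
        seen_ids.add(tweet.id)
        yield tweet


if __name__ == "__main__":
    # Stand-in objects; real ones would come from tweepy.Cursor(...).items().
    sample = [
        SimpleNamespace(id=1, text="original"),
        SimpleNamespace(id=2, text="RT @a: original", retweeted_status=object()),
        SimpleNamespace(id=1, text="original"),
        SimpleNamespace(id=3, text="another one"),
    ]
    print(without_retweets("#python"))
    print([t.id for t in unique_originals(sample)])
```

In the loop above you would collect the results for all hashtags into a single iterable and pass it through `unique_originals` before writing to `out`, so that the set of seen ids is shared across search terms.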

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
