
Tweepy - Capturing only followers that have tweeted a certain keyword

I've looked all over but can't find an answer, or even a discussion, on this topic. I'm trying to sample all the followers of a Twitter user who have tweeted something containing a specific keyword. I've found a script that pulls all the followers of a specific user, but how would one edit it to filter those users by content keyword? Is this possible?

It seems that being able to sort followers by these kinds of content interests would be useful, but I haven't seen anyone discuss it elsewhere. Thank you for any insight you can provide!

"""
http://stackoverflow.com/questions/31000178/how-to-get-large-list-of-followers-tweepy
ask user for account name to harvest follower names from.
print follower names to screen
pause  users to screen
"""
import tweepy
import time
import csv
import sys
import logging
logging.basicConfig()


accountvar = "nytimes"
#todo: upgrade this to read usernames from a file.
print "searching for followers of "+accountvar

consumer_key = "xxxxx"
consumer_secret = "xxxxx"
access_token = "xxxxx"
access_token_secret = "xxxxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
#refer http://docs.tweepy.org/en/v3.2.0/api.html#API
#tells tweepy.API to automatically wait for rate limits to replenish


users = tweepy.Cursor(api.followers, screen_name=accountvar).items()
count = 0
errorCount=0


outputfilecsv = accountvar+"followers.csv"
fc = csv.writer(open(outputfilecsv, 'wb'))
fc.writerow(["screen_name", "ID", "followers_count","statuses_count","location","geo_enabled"])

while True:
    try:
        user = next(users)
        count += 1
        #use count-break during dev to avoid twitter restrictions
        #if (count>10):
        #    break
    except tweepy.TweepError:
        #catches TweepError when rate limiting occurs, sleeps, then retries.
        #nominally 15 minutes, make a bit longer to avoid attention.
        print "sleeping...."
        time.sleep(60*16)
        #retry at the top of the loop so StopIteration is still caught there
        continue
    except StopIteration:
        break
    try:
        print "@" + user.screen_name + " has " + str(user.followers_count) +\
              " followers, has made "+str(user.statuses_count)+" tweets and location=" +\
              user.location+" geo_enabled="+str(user.geo_enabled)+" count="+str(count)

        fc.writerow([user.screen_name, user.id_str, str(user.followers_count), str(user.statuses_count), user.location, str(user.geo_enabled)])
    except UnicodeEncodeError:
        errorCount += 1
        print "UnicodeEncodeError,errorCount ="+str(errorCount)


#apparently don't need to close csv.writer.
print "completed, errorCount ="+str(errorCount)+" total users="+str(count)
#print "@" + user.screen_name
#todo: write users to file, search users for interests, locations etc.

That's an interesting question. Let's say that you want to search for the word "banana".

Your search query will be q=banana - see https://twitter.com/search?q=banana

If you want to see whether specific accounts have tweeted that word, the query is

q=banana from:bbcnews OR from:cnn OR from:itv

See https://twitter.com/search?q=banana%20from:bbcnews%20OR%20from:cnn%20OR%20from:itv
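If you already have the follower screen names from your script, a query of that shape can be put together programmatically. This is only a minimal sketch in Python 3: `build_from_query` and `search_url` are names I made up for illustration, and the only library call is `urllib.parse.quote` from the standard library.

```python
from urllib.parse import quote


def build_from_query(keyword, screen_names):
    """Return a query matching `keyword` tweeted by any of `screen_names`."""
    # One from: operator per account, joined with OR as in the query above.
    authors = " OR ".join("from:" + name for name in screen_names)
    return keyword + " " + authors if authors else keyword


def search_url(query):
    """Return the twitter.com search URL for an already-built query."""
    # safe=":" leaves the colon in from: unescaped; spaces become %20.
    return "https://twitter.com/search?q=" + quote(query, safe=":")


query = build_from_query("banana", ["bbcnews", "cnn", "itv"])
print(query)
print(search_url(query))
```

With the three example accounts this reproduces the query and the URL shown above.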

I don't know whether there's a limit on how many from: operators one query can contain.
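Since that limit is unknown, one way to stay on the safe side is to split the follower list into several shorter queries and merge the results. The sketch below (Python 3) does that under stated assumptions: the `max_length` default of 500 characters is my assumption, not something confirmed here, and `search` is a hypothetical callable that you supply yourself, taking a query string and returning the screen names of the authors of matching tweets (for example a thin wrapper around whatever search call your Tweepy version offers).

```python
def _query(keyword, names):
    # Same shape as above: keyword followed by from: operators joined with OR.
    return keyword + " " + " OR ".join("from:" + n for n in names)


def batch_queries(keyword, screen_names, max_length=500):
    """Yield queries that each stay within max_length characters.

    max_length=500 is an assumed limit; adjust it to whatever the API accepts.
    A single screen name that is too long on its own is still emitted alone.
    """
    batch = []
    for name in screen_names:
        candidate = batch + [name]
        if batch and len(_query(keyword, candidate)) > max_length:
            # Adding this name would overflow: emit the batch, start a new one.
            yield _query(keyword, batch)
            batch = [name]
        else:
            batch = candidate
    if batch:
        yield _query(keyword, batch)


def followers_who_tweeted(keyword, screen_names, search, max_length=500):
    """Return the sorted, lowercased followers whose tweets matched keyword.

    `search` is a hypothetical callable: query string -> iterable of the
    screen names of the authors of matching tweets.
    """
    wanted = {name.lower() for name in screen_names}
    found = set()
    for q in batch_queries(keyword, screen_names, max_length):
        for author in search(q):
            if author.lower() in wanted:
                found.add(author.lower())
    return sorted(found)


# Usage with a stub in place of a real search call:
names = ["bbcnews", "cnn", "itv"]
fake_search = lambda q: ["CNN"] if "from:cnn" in q else []
print(list(batch_queries("banana", names, max_length=35)))
print(followers_who_tweeted("banana", names, fake_search, max_length=35))
```

Passing the search function in as a parameter keeps the batching logic independent of any particular Tweepy version, and lets you try it out with a stub before spending rate-limited requests.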
