
Retrieving a list of tweets using tweet ID in tweepy

I have a file containing a list of tweet IDs, and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, but the Twitter API only allows retrieving 100 at a time.

api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)

Is there a way to retrieve more tweets, say 1,000 or 2,000? I don't want to take a sample of the data, save the results to a file, and change the index of the tweet IDs every time. Is there a way to do that?

Yes. Twitter only lets you look up 100 tweets per request, but you can look up another 100 immediately afterwards. The only remaining concern is rate limits: you are restricted in the number of calls you can make to the API in each 15-minute window. Fortunately, tweepy handles this gracefully if you create the API object with wait_on_rate_limit=True. All we need to do, then, is split the full list of tweet IDs into batches of 100 or fewer (if you have 130, the second batch should contain only the final 30) and look the batches up one at a time. Try the following:

# Written against tweepy 3.x. In tweepy 4.x, statuses_lookup was renamed to
# lookup_statuses, TweepError became TweepyException, and
# wait_on_rate_limit_notify was removed.
import tweepy


def lookup_tweets(tweet_IDs, api):
    """Look up tweets in batches of 100, the maximum allowed per request."""
    full_tweets = []
    tweet_count = len(tweet_IDs)
    try:
        # Step through the IDs 100 at a time; the final slice is simply
        # shorter when fewer than 100 IDs remain.
        for start in range(0, tweet_count, 100):
            full_tweets.extend(
                api.statuses_lookup(list(tweet_IDs[start:start + 100]))
            )
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was fetched so that callers always get a list
    return full_tweets

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# wait_on_rate_limit makes tweepy sleep until the rate-limit window resets
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# do whatever it is to get por.TweetID - the list of all IDs to look up

results = lookup_tweets(por.TweetID, api)

for tweet in results:
    if tweet:
        print(tweet.text)
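
The batching step can be checked on its own, without calling the Twitter API. The following is a minimal sketch: `chunked` and `fake_lookup` are hypothetical names introduced here for illustration, and `fake_lookup` merely stands in for `api.statuses_lookup` by echoing back the IDs it receives.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_lookup(ids):
    # Hypothetical stand-in for api.statuses_lookup: echoes the IDs back
    # and rejects batches larger than the 100-ID limit.
    if len(ids) > 100:
        raise ValueError("at most 100 IDs per request")
    return list(ids)


tweet_ids = list(range(1, 131))  # 130 made-up IDs
batch_sizes = [len(batch) for batch in chunked(tweet_ids)]
print(batch_sizes)  # [100, 30]

fetched = []
for batch in chunked(tweet_ids):
    fetched.extend(fake_lookup(batch))
print(len(fetched))  # 130
```

Stepping through the list with `range(0, len(items), size)` never produces an empty trailing batch, even when the number of IDs is an exact multiple of 100.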

An addition to the code above: each item in the output is a Twitter status object. The following code converts the results into serializable JSON and then merges them with the original data on the tweet ID to get a full DataFrame.

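import json
from io import StringIO

import pandas as pd

df = pd.read_csv('your.csv')
good_tweet_ids = list(df.TweetID)  # tweet IDs to look up
results = lookup_tweets(good_tweet_ids, api)  # fetch in batches of 100

# Wrangle the data into one DataFrame
temp = json.dumps([status._json for status in results])  # serialize to JSON
# StringIO: recent pandas versions deprecate passing a literal JSON string
newdf = pd.read_json(StringIO(temp), orient='records')
# The left join keeps every row of df, even when no tweet came back for its ID
full = pd.merge(df, newdf, left_on='TweetID', right_on='id', how='left').drop('id', axis=1)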
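
One thing to watch for with the left join: the lookup returns only tweets that still exist and are visible to the authenticated user, so deleted or protected tweets are absent from the results and their rows end up with missing values after the merge. Below is a minimal sketch for listing those IDs up front. `missing_ids` is a hypothetical helper, and the dictionaries stand in for the `status._json` payloads.

```python
def missing_ids(requested_ids, statuses_json):
    """Return the requested IDs that have no matching entry in the lookup results."""
    returned = {status["id"] for status in statuses_json}
    return [tweet_id for tweet_id in requested_ids if tweet_id not in returned]


requested = [101, 102, 103, 104]  # made-up IDs
returned_payloads = [{"id": 101, "text": "a"}, {"id": 103, "text": "c"}]
print(missing_ids(requested, returned_payloads))  # [102, 104]
```

With real data, this could be called as `missing_ids(good_tweet_ids, [status._json for status in results])`.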
