Retrieving a list of tweets by tweet ID in tweepy

Question

I have a file containing a list of tweet IDs and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, and the Twitter API only allows retrieving 100 at a time.

api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)

Is there a way to retrieve more tweets, say 1,000 or 2,000? I don't want to take a sample of the data, save the results to a file, and change the index of the tweet IDs every time.

Answer 1

Yes. Twitter only lets you look up 100 tweets per call, but you can look up another 100 immediately after that. The only concern then is rate limits: you are restricted in the number of calls you can make to the API in each 15-minute window. Fortunately, tweepy handles this gracefully when you create the API object with wait_on_rate_limit=True. All we need to do, then, is split the full list of tweet IDs into batches of 100 or fewer (if you have 130 IDs, the second batch should contain only the final 30) and look them up one batch at a time. Try the following:

import tweepy


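def lookup_tweets(tweet_IDs, api):
    """Look up full tweets for a list of IDs, 100 IDs per API call."""
    full_tweets = []
    tweet_IDs = list(tweet_IDs)  # accept a pandas Series as well as a list
    tweet_count = len(tweet_IDs)
    try:
        # Step through the IDs in batches of 100; the last batch may be smaller
        for start in range(0, tweet_count, 100):
            end_loc = min(start + 100, tweet_count)
            # IDs are passed positionally (assumes tweepy 3.x, where the method is statuses_lookup)
            full_tweets.extend(api.statuses_lookup(tweet_IDs[start:end_loc]))
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was collected, so the caller can still iterate after an error
    return full_tweets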

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# do whatever it is to get por.TweetID - the list of all IDs to look up

results = lookup_tweets(por.TweetID, api)

for tweet in results:
    if tweet:
        print(tweet.text)
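
The batching step can also be looked at in isolation, without touching the API. The following is a minimal sketch, not part of the original answer: `chunked` and `calls_needed` are hypothetical helper names introduced here for illustration, and the batch size of 100 comes from the lookup limit described above.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def calls_needed(n_ids, size=100):
    """Number of lookup calls required for n_ids IDs (ceiling division)."""
    return -(-n_ids // size)


# 130 IDs split into one full batch and one partial batch
batches = list(chunked(range(130)))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)           # [100, 30]
print(calls_needed(100000))  # 1000
```

For the file in the question, this means roughly 1,000 lookup calls, which is why letting tweepy wait on the rate limit matters.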

Answer 2

An addition to the code above. The lookup returns a list of Twitter Status objects. The following code converts them to serializable JSON and then merges the result with the original tweet IDs to get a full DataFrame.

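import io
import json

import pandas as pd

df = pd.read_csv('your.csv')
good_tweet_ids = [i for i in df.TweetID]  # tweet IDs to look up
results = lookup_tweets(good_tweet_ids, api)  # apply the function from Answer 1

# Wrangle the data into one DataFrame
temp = json.dumps([status._json for status in results])  # serialize the raw JSON of each status
newdf = pd.read_json(io.StringIO(temp), orient='records')  # StringIO avoids passing a literal JSON string
# The left merge keeps every row of df, including IDs the lookup did not return
full = pd.merge(df, newdf, left_on='TweetID', right_on='id', how='left').drop('id', axis=1)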
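
Because the merge uses how='left', any ID the lookup did not return keeps its row in df but has empty tweet columns. As far as I know this typically happens for deleted or protected tweets, though that is an assumption rather than something stated in the answers. A minimal sketch for listing such IDs, using plain dicts as stand-ins for status._json; `find_missing_ids` is a hypothetical helper, not part of tweepy or pandas:

```python
def find_missing_ids(requested_ids, returned_records):
    """Return the requested IDs that have no matching record, in request order."""
    returned = {record["id"] for record in returned_records}
    return [tweet_id for tweet_id in requested_ids if tweet_id not in returned]


# Toy data: three IDs requested, only two records came back
requested = [101, 102, 103]
records = [{"id": 101, "text": "a"}, {"id": 103, "text": "c"}]

missing = find_missing_ids(requested, records)
print(missing)  # [102]
```

With real results, the second argument would be something like [status._json for status in results].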

Answer 1
7, accepted, 2017-06-16 09:43:09

Answer 2
0, 2018-06-30 14:18:11