Random sampling tweets with tweepy

I'm trying to analyze tweets that have the hashtag #contentmarketing. I first tried grabbing 20,000 tweets with tweepy but ran into the rate limit. So I'd like to take a random sample instead (or a couple of random samples).

I'm not really familiar with random sampling through an API call. If I had an array that already contained the data, I would take random indices from that array without replacement. However, I don't think I can create that array in the first place without the rate limit kicking in.
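To make the in-memory case concrete, here is a minimal sketch of sampling without replacement using the standard library. The `tweets` list is placeholder data standing in for tweets that have already been downloaded; it is not real API output.

```python
import random

# Placeholder data standing in for tweets that were already downloaded.
tweets = [f"tweet {i}" for i in range(1000)]

# random.sample draws without replacement, so no tweet is picked twice.
sample = random.sample(tweets, k=50)

print(len(sample), len(set(sample)))
```

This only works once the full list exists, which is exactly the step the rate limit gets in the way of.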

Can anyone enlighten me on how to access random tweets (or random data from an API, overall)?

For reference, here's the code that got me in rate limit purgatory:

import json
import tweepy
from tweepy import OAuthHandler

consumerKey = 'my-key'
consumerSecret = 'my-key'
accessToken = 'my-key'
accessSecret = 'my-key'

auth = OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessSecret)

api = tweepy.API(auth)

tweets = []

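# Note: the search endpoint returns at most 100 tweets per request,
# so count=20000 is capped by the API.
for tweet in tweepy.Cursor(api.search, q='#contentmarketing', count=20000,
                           lang='en', since='2017-06-20').items():
    # Status objects are not JSON-serializable; keep the raw JSON dict.
    tweets.append(tweet._json)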

with open('content-tweets.json', 'w') as f:
    json.dump(tweets, f, sort_keys=True, indent=4)

This should stop the rate limit from kicking in. Just make the following change to your code:

api = tweepy.API(auth, wait_on_rate_limit=True)

I have never heard of a way to get random tweets. But you can keep collecting tweets indefinitely, and you will not get all of them anyway, so the result is much the same.

With the public search API, you can make 450 requests per 15-minute window (app auth). That works out to one request, for up to 100 tweets, every 2 seconds, and you can keep that pace up indefinitely.
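The arithmetic behind that figure can be sketched as follows. The helper names are hypothetical, and the 180-requests figure for user auth (what `OAuthHandler` with an access token gives you) is an assumption based on Twitter's documented standard search limits at the time, so check the current documentation before relying on it.

```python
import math

def min_interval_seconds(requests_per_window, window_minutes=15):
    """Smallest delay between requests that stays inside the window limit."""
    return window_minutes * 60 / requests_per_window

def requests_needed(total_tweets, per_request=100):
    """Number of search requests needed at per_request tweets each."""
    return math.ceil(total_tweets / per_request)

app_interval = min_interval_seconds(450)   # app auth: 450 requests / 15 min
user_interval = min_interval_seconds(180)  # user auth (assumed limit): 180 / 15 min
n_requests = requests_needed(20000)

print(app_interval, user_interval, n_requests, n_requests * app_interval)
```

Under these assumptions, 20,000 tweets at 100 per request is 200 requests, which is fewer than one 15-minute window allows with app auth.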

So change the "count" parameter to 100 and add a time.sleep(2):

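import time

cursor = tweepy.Cursor(api.search, q='#contentmarketing', count=100,
                       lang='en', since='2017-06-20')

# Iterate page by page so the pause happens once per request
# (up to 100 tweets), not once per tweet.
for page in cursor.pages():
    tweets.extend(page)
    time.sleep(2)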
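If the goal is a fixed-size random sample rather than every tweet, one option is reservoir sampling, which keeps a uniform sample of k items from a stream of unknown length. This is a swapped-in technique, not something tweepy or the search API provides, and `reservoir_sample` is a hypothetical helper. The sample is uniform only over the tweets the search actually returns.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Placeholder stream standing in for tweets arriving from a cursor.
fake_stream = (f"tweet {i}" for i in range(10000))
picked = reservoir_sample(fake_stream, k=20)
print(len(picked))
```

In practice the stream could be the cursor's items, so only k tweets are held in memory at any time. Note that this does not reduce the number of API requests; it only bounds what is kept.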

Reference: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
