How to get every single tweet with a certain hashtag using Tweepy and the Twitter REST API
For a data visualization project I need to gather all tweets (would that be possible at all?) with a certain hashtag. For this purpose I am using the code below. It uses Tweepy and the REST API. However, it only downloads around 2,500 tweets or fewer. I was wondering how I can get past this limitation. Is there a pro subscription or anything else I should purchase, or how should I modify the code?
#!/usr/bin/python
# -*- coding: utf-8 -*-
# this file is configured for rtl language and farsi characters
import sys
import json  # needed for json.dumps below; was missing in the original

import tweepy

from key import *

# imported from the key.py file
API_KEY = KAPI_KEY
API_SECRET = KAPI_SECRET
OAUTH_TOKEN = KOAUTH_TOKEN
OAUTH_TOKEN_SECRET = KOAUTH_TOKEN_SECRET

auth = tweepy.AppAuthHandler(API_KEY, API_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)

if not api:
    print("Can't Authenticate")
    sys.exit(-1)

def write_unicode(text, charset='utf-8'):
    return text.encode(charset)

searchQuery = "#کرونا"      # this is what we're searching for
maxTweets = 100000          # some arbitrary large number
tweetsPerQry = 100          # this is the max the API permits
fName = 'Corona-rest8.txt'  # we'll store the tweets in a text file
sinceId = None
max_id = -1
tweetCount = 0

print("Downloading max {0} tweets".format(maxTweets))
with open(fName, "wb") as f:
    while tweetCount < maxTweets:
        try:
            if max_id <= 0:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            since_id=sinceId)
            else:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)
            if not new_tweets:
                print("No more tweets found")
                break
            for tweet in new_tweets:
                if str(tweet._json["user"]["location"]) != "":
                    print(tweet._json["user"]["location"])
                # ensure_ascii=False keeps the Farsi text readable in the file
                line = json.dumps(tweet._json["text"],
                                  ensure_ascii=False).encode('utf8') + "\n".encode('ascii')
                f.write(line)
            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            # just exit on any error
            print("some error : " + str(e))
            break
print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
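The pagination pattern the loop relies on (repeatedly passing max_id = oldest_seen_id - 1 to walk backwards through results) can be seen in isolation. Below is a minimal, self-contained sketch; fake_search and fetch_all are illustrative names, not part of tweepy, and the stub simply pretends the search index holds tweets with IDs 1 to 1000:

```python
def fake_search(q, count, max_id=None, since_id=None):
    """Stub standing in for api.search: an 'index' of tweets with IDs 1..1000."""
    newest = 1000 if max_id is None else min(1000, int(max_id))
    oldest = 0 if since_id is None else int(since_id)
    ids = range(newest, max(oldest, newest - count), -1)  # newest first
    return [{"id": i} for i in ids]

def fetch_all(search, query, per_page=100, limit=10000):
    """Walk backwards through results by passing max_id = oldest_seen - 1."""
    collected = []
    max_id = None
    while len(collected) < limit:
        page = search(q=query, count=per_page, max_id=max_id)
        if not page:  # empty page means we've exhausted the index
            break
        collected.extend(page)
        max_id = page[-1]["id"] - 1  # next page: strictly older tweets
    return collected

tweets = fetch_all(fake_search, "#کرونا")
print(len(tweets))  # 1000 -- the stub's entire "index"
```

With a real api.search the walk stops in the same way (an empty page), but much earlier than the true tweet count, because the search index itself only covers roughly the last week of tweets and is not exhaustive.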
The tweepy API Reference for api.search() provides a bit of color on this:

    Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.

To answer your question directly: it is not possible to acquire an exhaustive list of tweets from the API (because of several limitations). However, a few scraping-based Python libraries are available to work around these API limitations, such as @taspinar's twitterscraper.