
How can I get tweets older than a week (using tweepy or other python libraries)

I have been trying to figure this out, but it is really frustrating. I'm trying to get tweets with a certain hashtag (a large number of tweets) using Tweepy, but the results don't go back more than one week. I need to go back at least two years and cover a period of a couple of months. Is this even possible, and if so, how?

For reference, here is my code:

import tweepy
import csv

consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)


for tweet in tweepy.Cursor(api.search,q="#ps4",count=100,\
                           lang="en",\
                           since_id=2014-06-12).items():
    print tweet.created_at, tweet.text
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

As you have noticed, the Twitter API has some limitations. I have implemented code that works around them by using the same strategy as Twitter running in a browser. Take a look; with it you can get the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

You cannot use the Twitter search API to collect tweets from two years ago. Per the docs:

Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation

If you need a way to get old tweets, you can get them from individual users, because collecting tweets from a user's timeline is limited by number rather than by time (so in many cases you can go back months or years). A third-party service that collects tweets, like Topsy, may be useful in your case as well (Topsy has been shut down as of July 2016, but other services exist).
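The per-user approach described above can be sketched roughly as follows. This assumes a Tweepy-style `api` object like the one in the question, whose `user_timeline()` method accepts `screen_name`, `count` and `max_id`; the helper name `fetch_user_timeline` and its `max_pages` guard are made up for illustration. Twitter caps this endpoint at roughly the 3,200 most recent tweets per account, so how far back it reaches depends on how often the account posts.

```python
def fetch_user_timeline(api, screen_name, max_pages=20):
    """Collect one account's tweets, newest first, by paging with max_id.

    `api` is assumed to expose a Tweepy-style user_timeline() method.
    """
    tweets = []
    max_id = None
    for _ in range(max_pages):
        kwargs = {"screen_name": screen_name, "count": 200}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # Ask only for tweets strictly older than the oldest one seen so far.
        max_id = page[-1].id - 1
    return tweets
```

The returned list can then be filtered on `tweet.created_at` to keep only the months of interest.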

I found some code that helps retrieve older tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

To get old tweets, run the following command in the directory where the repository was extracted:

python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000

It returned a file, 'output_got.csv', with 1000 tweets matching the keyword from the date range above.

You need to install the 'pyquery' module for this to work.

PS: You can modify the 'Exporter.py' Python file to get more tweet attributes as required.
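For a period of a couple of months, one option is to run the command above once per date window. The following is only a sketch that builds the argument lists. The flags are the ones shown in the command above, while the helper names `date_windows` and `exporter_commands` and the seven-day window size are illustrative assumptions.

```python
import shlex
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (since, until) date pairs that cover [start, end) in steps of `days`."""
    step = timedelta(days=days)
    since = start
    while since < end:
        until = min(since + step, end)
        yield since, until
        since = until


def exporter_commands(keyword, start, end, max_tweets=1000, days=7):
    """Build one Exporter.py argument list per date window."""
    return [
        ["python", "Exporter.py",
         "--querysearch", keyword,
         "--since", since.isoformat(),
         "--until", until.isoformat(),
         "--maxtweets", str(max_tweets)]
        for since, until in date_windows(start, end, days)
    ]


if __name__ == "__main__":
    # Print the commands so they can be inspected before running anything.
    for argv in exporter_commands("#ps4", date(2016, 1, 1), date(2016, 3, 1)):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list could be passed to `subprocess.run`. Since every run reportedly writes 'output_got.csv', the file would presumably need to be renamed or moved between runs. Whether `--until` is inclusive is not stated above, so the window boundaries are worth checking against real output.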

2018 update: Twitter has Premium search APIs that can return results from the beginning of time (2006):

https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages

Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.

Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.

With an example Python client: https://github.com/twitterdev/search-tweets-python
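To give a rough idea of what a full-archive request looks like, here is a sketch that only builds the URL and JSON body. It does not send anything or handle authentication. The URL pattern and the `fromDate`/`toDate`/`maxResults` field names reflect my understanding of the premium documentation linked above, and the environment label `dev` and the function name `build_search_request` are placeholders, so check them against the docs before relying on this.

```python
import json
from datetime import datetime

# Assumed URL pattern for the premium full-archive endpoint; "dev" stands in
# for whatever environment label is configured in the developer dashboard.
FULL_ARCHIVE_URL = "https://api.twitter.com/1.1/tweets/search/fullarchive/{label}.json"


def build_search_request(query, start, end, label="dev", max_results=100):
    """Return (url, JSON body) for one premium search request."""
    body = {
        "query": query,
        # Timestamps are formatted as YYYYMMDDhhmm and interpreted as UTC.
        "fromDate": start.strftime("%Y%m%d%H%M"),
        "toDate": end.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }
    return FULL_ARCHIVE_URL.format(label=label), json.dumps(body)
```

The body would be POSTed with a bearer token. The Python client linked above wraps these steps, including pagination, and is probably the more convenient route.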

I know this is a very old question, but some folks might still be facing the same issue. After some digging, I found out that Tweepy's search only returns data for the past 7 days, which sometimes leads people to pay for a third-party service. I used the Python library GetOldTweets3 and it worked fine for me. The library is really easy to use. Its only limitation is that you can't search for more than one hashtag in one execution, but searching multiple accounts at the same time works fine.

As others have noted, the Twitter API has the date limitation, but the actual advanced search as implemented on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com search pages. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/
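As a rough illustration of what "iterating through twitter.com" means, the sketch below builds one search URL per day using the `since:` and `until:` search operators. The function name `search_url` is made up, and the exact URL parameters used by the linked repository may differ. Loading the page and scrolling it in Selenium is left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def search_url(query, day):
    """Build a twitter.com search URL restricted to a single day."""
    next_day = day + timedelta(days=1)
    # since: is inclusive and until: is exclusive in Twitter's search syntax.
    full_query = f"{query} since:{day.isoformat()} until:{next_day.isoformat()}"
    return "https://twitter.com/search?" + urlencode({"q": full_query})
```

A Selenium script would call `driver.get(search_url("#ps4", some_day))` for each day in the range and collect the tweets rendered on the page.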

Use the args "since" and "until" to adjust your timeframe. You are presently using since_id, which is meant to correspond to Twitter ID values (not dates):

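for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    # Handle each result here, e.g. print it as in the question
    print("%s %s" % (tweet.created_at, tweet.text))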
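If an ID-based bound is still wanted for `since_id`, a date can be turned into an approximate tweet ID, because tweet IDs generated since November 2010 embed a millisecond timestamp (the "Snowflake" scheme). The sketch below relies on the commonly cited Snowflake epoch and 22-bit shift, and the function names are illustrative. It does not lift the one-week limit of the standard search API quoted earlier.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # start of Snowflake IDs (4 Nov 2010, UTC)


def snowflake_floor(moment):
    """Smallest Snowflake-style ID that could belong to a tweet at `moment`."""
    ms = round(moment.timestamp() * 1000)
    if ms < TWITTER_EPOCH_MS:
        raise ValueError("Snowflake IDs only cover tweets from November 2010 onward")
    # The low 22 bits hold worker and sequence numbers; zero them for a lower bound.
    return (ms - TWITTER_EPOCH_MS) << 22


def snowflake_time(tweet_id):
    """Recover the UTC creation time embedded in a Snowflake-style ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

For example, `snowflake_floor(datetime(2014, 6, 12, tzinfo=timezone.utc))` gives a value that could be passed as `since_id` in place of the date expression used in the question.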

I can't believe nobody has mentioned this, but this git repository completely solved my problem. I wasn't able to use other solutions such as GOT or Twitter API Premium.

Try this; it's definitely useful:

https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af

https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python

You can use the REST API to get tweets that were posted more than a week ago. For more details, see the Twitter API reference: https://dev.twitter.com/rest/reference/get/statuses/user_timeline
