
How can I get tweets older than a week (using tweepy or other python libraries)

I have been trying to figure this out, but it is really frustrating. I'm trying to get tweets with a certain hashtag (a large number of tweets) using Tweepy, but the results don't go back more than one week. I need to go back at least two years and cover a period of a couple of months. Is this even possible, and if so, how?

For reference, here is my code:

import tweepy
import csv

consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)


for tweet in tweepy.Cursor(api.search,q="#ps4",count=100,\
                           lang="en",\
                           since_id=2014-06-12).items():
    print tweet.created_at, tweet.text
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

As you have noticed, the Twitter API has some limitations. I have written code that works around them by using the same strategy as Twitter running in a browser. Take a look; it can retrieve the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

You cannot use the twitter search API to collect tweets from two years ago. Per the docs:

"Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week." (Twitter documentation)

If you need a way to get old tweets, you can get them from individual users because collecting tweets from them is limited by number rather than time (so in many cases you can go back months or years). A third-party service that collects tweets like Topsy may be useful in your case as well (shut down as of July 2016, but other services exist).
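A minimal sketch of the per-user approach, assuming Tweepy's `api.user_timeline` endpoint and an authenticated `api` object like the one in the question. The helper names `fetch_timeline` and `in_window` are made up for illustration. Twitter caps a user timeline at roughly the most recent 3,200 tweets, so how far back this reaches depends on how active the account is.

```python
from datetime import datetime


def in_window(tweets, start, end):
    """Yield tweets whose created_at falls in the half-open range [start, end).

    start/end must match created_at in timezone-awareness (naive vs. aware),
    otherwise the comparison raises TypeError.
    """
    for tweet in tweets:
        if start <= tweet.created_at < end:
            yield tweet


def fetch_timeline(api, screen_name, limit=3200):
    """Return an iterator over a user's most recent tweets (hypothetical helper)."""
    import tweepy  # imported lazily so in_window works without Tweepy installed

    cursor = tweepy.Cursor(api.user_timeline, screen_name=screen_name, count=200)
    return cursor.items(limit)


# Example call ("some_account" is a placeholder screen name):
# old = in_window(fetch_timeline(api, "some_account"),
#                 datetime(2014, 6, 1), datetime(2014, 8, 1))
```

Filtering happens on the client because the timeline endpoint pages by count, not by date.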

I found a project that helps retrieve older tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

To get old tweets, run the following command in the directory where you extracted the repository:

python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000

It returned a file, 'output_got.csv', containing 1000 tweets from the days above that match the keyword.

You need to install the 'pyquery' module for this to work.

PS: You can modify 'Exporter.py' python code file to get more tweet attributes as per your requirement.
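For a span of several months, it may be easier to run the exporter once per smaller date window. The sketch below only builds the command lines, using the four flags shown in the command above. The helper names and the one-week window size are assumptions, and it does not check how Exporter.py handles its output file across repeated runs.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Split [start, end) into consecutive windows of at most `days` days."""
    current = start
    while current < end:
        nxt = min(current + timedelta(days=days), end)
        yield current, nxt
        current = nxt


def exporter_command(query, since, until, max_tweets=1000):
    """Build the argument list for one Exporter.py run (hypothetical helper)."""
    return [
        "python", "Exporter.py",
        "--querysearch", query,
        "--since", since.isoformat(),
        "--until", until.isoformat(),
        "--maxtweets", str(max_tweets),
    ]


if __name__ == "__main__":
    # Print one command per week for January and February 2016.
    for since, until in date_windows(date(2016, 1, 1), date(2016, 3, 1)):
        print(" ".join(exporter_command("#ps4", since, until)))
```

Each list can also be passed directly to `subprocess.run` instead of being printed.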

2018 update: Twitter has Premium search APIs that can return results from the beginning of time (2006):

https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages

Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.

Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.

With an example Python client: https://github.com/twitterdev/search-tweets-python
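As a rough illustration of what a full-archive request looked like, the sketch below builds the URL and query parameters without sending anything. The URL pattern, the parameter names (`query`, `fromDate`, `toDate`, `maxResults`) and the `yyyyMMddHHmm` timestamp format are written from memory of the 2018 premium documentation and should be checked against the current docs. The label `dev` and the function name are placeholders.

```python
from datetime import datetime

# Assumed URL pattern for the premium full-archive endpoint. "{label}" is the
# environment label configured in the developer dashboard.
FULL_ARCHIVE_URL = "https://api.twitter.com/1.1/tweets/search/fullarchive/{label}.json"


def build_search_request(query, start, end, label="dev", max_results=100):
    """Return (url, params) for one full-archive search request."""
    fmt = "%Y%m%d%H%M"  # assumed timestamp format: yyyyMMddHHmm, in UTC
    params = {
        "query": query,
        "fromDate": start.strftime(fmt),
        "toDate": end.strftime(fmt),
        "maxResults": max_results,
    }
    return FULL_ARCHIVE_URL.format(label=label), params


url, params = build_search_request(
    "#ps4 lang:en", datetime(2014, 6, 12), datetime(2014, 8, 12)
)
```

The pair could then be sent with an HTTP client of your choice together with a bearer token. The search-tweets-python client linked above wraps these steps, including pagination.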

I know this is a very old question, but some folks might still be facing the same issue. After some digging, I found that Tweepy's search only returns data for the past 7 days, which sometimes pushes people to buy a third-party service. I used the Python library GetOldTweets3 and it worked fine for me. The library is really easy to use. Its only limitation is that you can't search for more than one hashtag in a single execution, but searching multiple accounts at the same time works fine.

As others have noted, the Twitter API has the date limitation, but the advanced search on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com endpoint. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/

Use the "since" and "until" arguments to adjust your time frame. You are currently using since_id, which expects tweet ID values, not dates:

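# since/until take "YYYY-MM-DD" strings; api is the authenticated tweepy.API object
for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    print(tweet.created_at, tweet.text)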
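To see why the question's `since_id=2014-06-12` cannot work: Python 2 parses it as arithmetic (with `06` read as octal 6), so the API receives the ID 1996 rather than a date. If an ID bound is what you want, tweet IDs created since late 2010 follow Twitter's Snowflake layout, where the bits above the lowest 22 hold milliseconds since a fixed epoch. The sketch below converts between the two. The function names are made up for illustration, and an ID bound does not lift the Search API's one-week limit.

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch: 4 Nov 2010, UTC
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_snowflake(dt):
    """Smallest Snowflake-style ID for a timezone-aware datetime.

    Only meaningful for datetimes after the Snowflake epoch.
    """
    ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms - TWITTER_EPOCH_MS) << 22


def snowflake_to_datetime(tweet_id):
    """Creation time (UTC) encoded in a Snowflake-style tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


# What the question's code actually passed as since_id:
accidental_id = 2014 - 6 - 12

# An ID lower bound for 12 June 2014, usable as since_id:
since_id = datetime_to_snowflake(datetime(2014, 6, 12, tzinfo=timezone.utc))
```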

I can't believe nobody said this but this git repository completely solved my problem. I haven't been able to utilize other solutions such as GOT or Twitter API Premium.

Try this, definitely useful:

https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af

https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python
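A minimal sketch of the snscrape route, assuming the `snscrape.modules.twitter.TwitterSearchScraper` class and its `get_items()` method, and Twitter's `since:`, `until:` and `lang:` search operators. The helper names are hypothetical, and scrapers like this depend on the site's current behavior, so they may stop working without notice.

```python
from itertools import islice


def build_query(hashtag, since, until, lang="en"):
    """Compose a search query from a hashtag and YYYY-MM-DD date bounds."""
    return "{} since:{} until:{} lang:{}".format(hashtag, since, until, lang)


def scrape(query, limit=100):
    """Return an iterator over at most `limit` tweets (needs `pip install snscrape`)."""
    import snscrape.modules.twitter as sntwitter  # lazy third-party import

    return islice(sntwitter.TwitterSearchScraper(query).get_items(), limit)


query = build_query("#ps4", "2014-06-12", "2014-08-12")
# for tweet in scrape(query, limit=500):
#     print(tweet.date, tweet.id)
```

Capping the iterator with `islice` keeps a broad hashtag query from running indefinitely.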

You can use the REST API to get tweets older than a week. For more details, see the Twitter API reference: https://dev.twitter.com/rest/reference/get/statuses/user_timeline
