Twitter Advanced Search Results Are Unevenly Distributed by Year

I'm using Twitter's browser search to get around the API's one-week limit and gather historical tweets for research purposes. I'm using the GitHub repository found here: https://github.com/Jefferson-Henrique/GetOldTweets-python
This all worked very nicely, with one rather odd hitch. I was gathering tweets containing certain keywords within a 300-mile radius of Delhi, from 1/1/2013 to 6/15/2017. Although I get tweets for all 4.5 years, there are always significantly more from around Dec 2013 to Apr 2015, no matter what the keyword or location is. I scoured the web to see whether Twitter had changed how it stores tweets, but found no plausible explanation, which led me here. Here's a code snippet (I can provide more, or output files, if needed):

wordsearch("headache", 0, "en", "40.7128,-74.0059", "2015-01-01", "headacheNYC2015", "300mi")

Also, it's not due to third-party sources, as those are included. There is also a notable lack of retweets. I am aware that the search function only provides a random 1% sample, but that is a separate issue: a uniform sample would not explain why there are consistently fewer tweets after April 2015. If anyone knows ANY possible reason for this, please share!

Figured out the answer. In 2015, Twitter changed the way geotagging works: it added a preference for geotagging tweets and set it to off by default. As a result, a search for geotagged tweets returns far fewer results after that point. More details can be found here
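To see where the drop happens in your own data, it can help to count the collected tweets per month. Below is a minimal sketch, assuming each tweet's timestamp is available as a Python `datetime`. The helper name `tweets_per_month` and the sample dates are made up for illustration and are not part of GetOldTweets-python.

```python
from collections import Counter
from datetime import datetime


def tweets_per_month(timestamps):
    """Count tweets per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    # 'YYYY-MM' strings sort chronologically when sorted as text.
    return dict(sorted(counts.items()))


# Hypothetical sample data standing in for the dates of collected tweets.
sample = [
    datetime(2014, 6, 3),
    datetime(2014, 6, 17),
    datetime(2014, 6, 29),
    datetime(2015, 3, 8),
    datetime(2015, 3, 21),
    datetime(2016, 6, 5),
]

monthly_counts = tweets_per_month(sample)
for month, count in monthly_counts.items():
    print(month, count)
```

With real data, a sharp and lasting fall in the monthly counts around spring 2015 would be consistent with the geotagging change described above, although the counts alone cannot confirm the cause.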
