简体   繁体   中英

How can I grab Coordinates and TimeStamp of Twitter tweets in specific bounding box with Python/Tweepy?

I am trying to extract tweet locations from a specific area with python using tweepy + writing it into a csv-file. I am not very much into python but I could manage to put together the following sript which kind of works:

import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

#Enter Twitter API Key information
consumer_key = 'cons_key'
consumer_secret = 'cons_secret'
access_token = 'acc_token'
access_secret = 'acc-secret'

file = open("C:\Python27\Output2.csv", "w")
file.write("X,Y\n")

data_list = []
count = 0

class listener(StreamListener):

    def on_data(self, data):
        global count

        #How many tweets you want to find, could change to time based
        if count <= 100:
            json_data = json.loads(data)

            coords = json_data["coordinates"]
            if coords is not None:
               print coords["coordinates"]
               lon = coords["coordinates"][0]
               lat = coords["coordinates"][1]

               data_list.append(json_data)

               file.write(str(lon) + ",")
               file.write(str(lat) + "\n")

               count += 1
            return True
        else:
            file.close()
            return False

    def on_error(self, status):
        print status

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
twitterStream = Stream(auth, listener())
#What you want to search for here
twitterStream.filter(locations=[11.01,47.85,12.09,48.43])

the problem is, that it extracts the coordinates very slowly (like 10 entries per 30 minutes). Would there be a way to make this faster?

How can I add the timestamps for each tweet? Is there way to make sure to retrieve all tweets possible for the specific region (I guess the max is all tweets of the past week)?

thanks very much in advance!

Twitter's standard streaming API provides a 1% sample of all the Tweets posted. In addition, very few Tweets have location data added to them. So, I'm not surprised that you're only getting a small number of Tweets in a 30 minute timespan for one specific bounding box. The only way to improve the volume would be to pay for the enterprise PowerTrack API.

Tweets all contain a created_at value which is the time stamp you'll want to record.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM