简体   繁体   English

如何使用Python / Tweepy在特定的边界框中获取Twitter推文的坐标和时间戳?

[英]How can I grab Coordinates and TimeStamp of Twitter tweets in specific bounding box with Python/Tweepy?

I am trying to extract tweet locations from a specific area with python using tweepy + writing it into a csv-file. 我正在尝试使用tweepy +将其从python的特定区域中的tweet位置提取出来,并将其写入csv文件中。 I am not very much into python but I could manage to put together the following sript which kind of works: 我不太喜欢python,但我可以设法整理出以下哪些类型的作品:

import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

#Enter Twitter API Key information
consumer_key = 'cons_key'
consumer_secret = 'cons_secret'
access_token = 'acc_token'
access_secret = 'acc-secret'

file = open("C:\Python27\Output2.csv", "w")
file.write("X,Y\n")

data_list = []
count = 0

class listener(StreamListener):

    def on_data(self, data):
        global count

        #How many tweets you want to find, could change to time based
        if count <= 100:
            json_data = json.loads(data)

            coords = json_data["coordinates"]
            if coords is not None:
               print coords["coordinates"]
               lon = coords["coordinates"][0]
               lat = coords["coordinates"][1]

               data_list.append(json_data)

               file.write(str(lon) + ",")
               file.write(str(lat) + "\n")

               count += 1
            return True
        else:
            file.close()
            return False

    def on_error(self, status):
        print status

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
twitterStream = Stream(auth, listener())
#What you want to search for here
twitterStream.filter(locations=[11.01,47.85,12.09,48.43])

the problem is, that it extracts the coordinates very slowly (like 10 entries per 30 minutes). 问题在于,它提取坐标的速度非常慢(例如每30分钟10个条目)。 Would there be a way to make this faster? 有没有办法使它更快?

How can I add the timestamps for each tweet? 如何为每条推文添加时间戳? Is there way to make sure to retrieve all tweets possible for the specific region (I guess the max is all tweets of the past week)? 有没有办法确保检索特定区域所有可能的推文(我猜最大为过去一周的所有推文)?

thanks very much in advance! 首先十分感谢!

Twitter's standard streaming API provides a 1% sample of all the Tweets posted. Twitter的标准流API提供了所有已发布推文的1%示例。 In addition, very few Tweets have location data added to them. 此外,很少有推文中添加了位置数据。 So, I'm not surprised that you're only getting a small number of Tweets in a 30 minute timespan for one specific bounding box. 因此,对于一个特定的边界框,您在30分钟的时间内仅收到少量Tweet,我并不感到惊讶。 The only way to improve the volume would be to pay for the enterprise PowerTrack API. 提高数量的唯一方法是为企业PowerTrack API付费。

Tweets all contain a created_at value which is the time stamp you'll want to record. 所有推文都包含created_at值,这是您要记录的时间戳。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM