This is my first time using Tweepy, and I am a Python novice. After setting up OAuth, I used the following code to collect tweets with Tweepy:
import json
import sys

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
file = open('SOTU1.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_data(self, data):
        json_data = json.loads(data)
        file.write(str(json_data))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream
The resulting text file looks like this, and it continues as one long string:
{u'contributors': None, u'truncated': False, u'text': u'Lost my cool today
\U0001f602\U0001f63e like completely', u'in_reply_to_status_id': None, u'id':
557709279751581696, u'favorite_count': 0, u'source': u'<a
href="http://twitter.com/download/android" rel="nofollow">Twitter for
Android</a>', u'retweeted': False, u'coordinates': {u'type': u'Point',
u'coordinates': [-97.925459, 29.877993]}, u'timestamp_ms': u'1421803228687',
u'entities': {u'user_mentions': [], u'symbols': [], u'trends': [],
u'hashtags': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str':
u'557709279751581696', u'retweet_count': 0, u'in_reply_to_user_id': None,
u'favorited': False, u'user': {u'follow_request_sent': None,
u'profile_use_background_image': True, u'default_profile_image': False, u'id':
1239731318, u'verified': False, u'profile_image_url_https':
I have tried various solutions offered on this site, but none worked because the data is a string, not a list. I also tried to turn it into a dictionary by removing the "u'" prefixes, but some values on the right-hand side of the pairs (such as None and False) are not enclosed in quotes.
My goal is to extract the text and geocode from each tweet, and I am hoping to process the JSON file in bash using jq. As of now, however, I cannot feed this data to jq, and it is hard to tell which batch of lines comes from a single tweet.
Thanks in advance!
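The file cannot be fed to jq because calling str() on a dict writes Python's literal syntax (single quotes, None, False, and u'' prefixes under Python 2) rather than JSON. Here is a minimal sketch of the difference, written for Python 3 with a made-up record; the helper name is_valid_json is only for illustration:

```python
import json

# A made-up record shaped like a tiny piece of a tweet.
record = {"text": "Lost my cool today", "coordinates": None, "retweeted": False}

as_repr = str(record)         # Python literal syntax: single quotes, None, False
as_json = json.dumps(record)  # JSON syntax: double quotes, null, false


def is_valid_json(s):
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(as_repr, is_valid_json(as_repr))
print(as_json, is_valid_json(as_json))
```

Only the json.dumps form is something json.loads or jq will accept, which is why the fix has to happen at write time.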
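Write each tweet with json.dump instead of str(), followed by a newline so that every tweet sits on its own line (my_file is the open file handle, named file in the question):
def on_data(self, data):
    json_data = json.loads(data)
    json.dump(json_data, my_file)  # writes valid JSON, not the dict's repr
    my_file.write('\n')  # one tweet per line
Then, when you want it back, parse the file one line at a time, because json.load on the whole file fails once it holds more than one tweet:
json_data = [json.loads(line) for line in open("file.txt")]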
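Once the tweets are stored as valid JSON, extracting the text and geocode is a short loop. The sketch below is written for Python 3 and assumes a file with one JSON object per line, which requires writing a newline after each json.dump call. The function name extract_text_and_coords and the sample lines are made up for illustration; the field names text and coordinates are taken from the output shown in the question.

```python
import json


def extract_text_and_coords(lines):
    """Return (text, coordinates) pairs from an iterable of JSON lines.

    coordinates is the inner [longitude, latitude] list of the tweet's
    GeoJSON point, or None when the tweet carries no location.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if "text" not in tweet:
            continue  # skip stream messages that are not tweets
        geo = tweet.get("coordinates") or {}
        results.append((tweet["text"], geo.get("coordinates")))
    return results


# Made-up sample lines in the assumed one-object-per-line layout.
sample = [
    '{"text": "Lost my cool today", "coordinates": '
    '{"type": "Point", "coordinates": [-97.925459, 29.877993]}}',
    '{"text": "no location", "coordinates": null}',
]

for text, coords in extract_text_and_coords(sample):
    print(text, coords)
```

For a real file, pass the open file object instead of the sample list, for example extract_text_and_coords(open("file.txt")). A file in this layout can also be given to jq directly, since jq reads a stream of JSON values.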