Parse massive JSON string from Tweepy or convert to dict/JSON format

Question

This is my first time using Tweepy, and I am a Python novice. After the OAuth setup, I used the following code to collect tweets:

import json
import sys
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
file = open('SOTU1.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_data(self, data):
        json_data = json.loads(data)
        file.write(str(json_data))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream

The resulting text file looks like this, and it continues as one long string:

{u'contributors': None, u'truncated': False, u'text': u'Lost my cool today           
\U0001f602\U0001f63e like completely', u'in_reply_to_status_id': None, u'id': 
557709279751581696, u'favorite_count': 0, u'source': u'<a 
href="http://twitter.com/download/android" rel="nofollow">Twitter for 
Android</a>', u'retweeted': False, u'coordinates': {u'type': u'Point', 
u'coordinates': [-97.925459, 29.877993]}, u'timestamp_ms': u'1421803228687', 
u'entities': {u'user_mentions': [], u'symbols': [], u'trends': [], 
u'hashtags': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': 
u'557709279751581696', u'retweet_count': 0, u'in_reply_to_user_id': None, 
u'favorited': False, u'user': {u'follow_request_sent': None, 
u'profile_use_background_image': True, u'default_profile_image': False, u'id': 
1239731318, u'verified': False, u'profile_image_url_https':

I have tried various solutions offered on this site, but none worked because the data is a string, not a list. I also tried to turn it into dictionary form by removing the "u'" prefixes, but some values on the right side of each pair (such as None and False) are not enclosed in quotes.

My goal is to extract the text and geocode from each tweet, and I am hoping to process the JSON file in bash using jq. As of now, though, I cannot feed this data to jq, and it is hard to tell which lines belong to a single tweet.

Thanks in advance!

Answer 1

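Write each tweet with json.dump instead of str(). str() produces the Python repr of the dict (the u'...' strings, None, False), which is not JSON. Put every tweet on its own line so the tweets stay separable (my_file is the open output file, named file in the question):

def on_data(self, data):
    json_data = json.loads(data)
    json.dump(json_data, my_file)
    my_file.write('\n')  # one JSON object per line

Then, when you want it back, parse the file line by line:

json_data = [json.loads(line) for line in open("file.txt")]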
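
To get at the text and geocode the question asks about, one option is to store one JSON object per line and read the file back a line at a time. The following is a minimal sketch, not part of the original answer. The helper names append_tweet and read_text_and_coordinates are hypothetical, and the code is written for Python 3 rather than the Python 2 used in the question. It assumes each stream payload is a single JSON object, and it skips payloads that have no "text" key, since the stream can also deliver non-tweet messages.

```python
import json


def append_tweet(raw, out_file):
    """Write one raw JSON payload from the stream as a single line of JSON."""
    tweet = json.loads(raw)  # validates the payload before it is stored
    out_file.write(json.dumps(tweet) + "\n")


def read_text_and_coordinates(path):
    """Yield (text, coordinates) per stored tweet; coordinates may be None."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            tweet = json.loads(line)
            if "text" not in tweet:
                continue  # not a tweet (assumed: some other stream message)
            geo = tweet.get("coordinates") or {}
            yield tweet["text"], geo.get("coordinates")
```

Inside the listener, on_data would call append_tweet(data, file) in place of file.write(str(json_data)). Because every line of such a file is a complete JSON value, it can also be fed to jq directly, for example with jq -c '[.text, .coordinates.coordinates]' followed by the file name.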

solution1
0 2015-02-04 23:15:55
