
Loading `status` objects from File?

A few months ago, I started grabbing tweets from Twitter for a data analysis project. I used Tweepy and Python 3.3 to get the status objects and dump them to a file, one file per day.

I mostly did this because I only wanted to gather data quickly. However, I'm now facing this problem:

Since the status objects are now stored as strings, I cannot, as far as I can tell, load them from my files and convert them back using Tweepy.

Which sucks, because I now realize I truly only need the status._json part of the object. For whatever reason I thought otherwise 3 months ago.

My question is this:

Is there a known way of converting these status objects back from strings?

I have checked the Tweepy docs and googled about, and I am pretty sure that this is not possible with the tools provided.

The only alternative I can see is to split the string manually, which seems pretty ugly.

Example of a status object saved in my file:
pastebin

These are stored per line, by simply appending them to the file each time a new one is grabbed from twitter.

This is not the answer you're expecting, but it might give you a starting point.

I took one instance of your Status record, put it in a text file, and ran this script:

# coding: utf-8
import re

# Read the single saved Status record
with open('status.txt') as f:
    tco = f.read()

# Match simple key=value pairs whose value is a bare word (no quotes, no nesting)
expre = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")
pairs = dict(expre.findall(tco))

And this gives you something like this:

{'author': 'User',
 'contributors': 'None',
 'contributors_enabled': 'False',
 'coordinates': 'None',
 'created_at': 'datetime',
 'default_profile': 'True',
 'default_profile_image': 'False',
 'favorite_count': '0',
 'favorited': 'False',
 'favourites_count': '46',
 'follow_request_sent': 'None',
 'followers_count': '204',
 'following': 'False',
 'friends_count': '274',
 'geo': 'None',
 'geo_enabled': 'True',
 'id': '652242063048724480',
 'in_reply_to_screen_name': 'None',
 'in_reply_to_status_id': 'None',
 'in_reply_to_status_id_str': 'None',
 'in_reply_to_user_id': 'None',
 'in_reply_to_user_id_str': 'None',
 'is_quote_status': 'False',
 'is_translator': 'False',
 'listed_count': '91',
 'location': 'None',
 'notifications': 'None',
 'place': 'None',
 'possibly_sensitive': 'False',
 'profile_background_tile': 'False',
 'profile_use_background_image': 'True',
 'protected': 'False',
 'retweet_count': '0',
 'retweeted': 'False',
 'statuses_count': '9724',
 'truncated': 'False',
 'user': 'User',
 'utc_offset': '7200',
 'verified': 'False'}

Now obviously, this is missing a lot of information that my simple regex couldn't parse: the User object's attributes, for example, and some of the JSON dicts.
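Every value in the dictionary above is still a string. As a minimal sketch, assuming you want real Python values where possible, `ast.literal_eval` can convert the ones that are valid literals and leave the rest alone. The `convert` helper is a name I made up for illustration, and the sample entries are copied from the output above:

```python
import ast

def convert(value):
    """Turn 'None', 'True', '46' into None, True, 46; leave other text unchanged."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. 'User' or 'datetime'), so keep the string
        return value

# A few entries copied from the regex output above
pairs = {'author': 'User', 'favourites_count': '46', 'geo': 'None', 'geo_enabled': 'True'}
typed = {key: convert(value) for key, value in pairs.items()}
```

Values such as 'User' and 'datetime' stay as strings, because the regex only captured the first word of a longer expression.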

For the more complicated parts of your problem, I'd advise you to look into a parser module. I'll see what I can do in my free time to get around this, though. It seems like a good problem.
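Since you only need the `_json` part, one possible route is to cut that dict out of the string and evaluate it safely. This is only a sketch under an assumption I have not checked against your file: that each saved line contains the text `_json={...}` written as a Python dict literal. The helper names `find_dict_end` and `extract_json_dicts` are hypothetical, and the sample line is made up:

```python
import ast

def find_dict_end(text, start):
    """Return the index just past the '}' that closes the '{' at text[start]."""
    depth = 0
    quote = None  # current string delimiter, or None when outside a string
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")

def extract_json_dicts(line, marker="_json={"):
    """Return every dict literal that follows `marker` in `line`."""
    found = []
    pos = line.find(marker)
    while pos != -1:
        start = pos + len(marker) - 1  # index of the opening '{'
        end = find_dict_end(line, start)
        found.append(ast.literal_eval(line[start:end]))
        pos = line.find(marker, end)
    return found

# Made-up sample line (an assumption about the format, not real data)
sample = "Status(_json={'id': 1, 'text': 'a } brace', 'user': {'name': 'x'}}, id=1)"
dicts = extract_json_dicts(sample)
```

The scanner tracks quotes so that a brace inside tweet text does not end the dict early. If a line holds several `_json` dicts (a nested User object may carry its own), you would still need to pick the one that belongs to the status, for example by checking for a 'text' key.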

