
Obtaining a dictionary representation of JSON that was saved as a string in a file

I am working with a Twitter dataset. I extracted some tweets to a separate file, and while doing that I used fil.write(str(tweet)+"\\n") , where tweet is a proper dictionary obtained by tweet = json.loads(item) . Now I read the extracted file line by line:

amr = open('tweets-1385844523.json')

akw = open("misw","w")


for line in amr:
    flag =0
    print type(line)
    twe = json.loads(line)

But here I get the following error:

Traceback (most recent call last):
  File "check2.py", line 17, in <module>
    twe = json.loads(line)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 1 (char 1)

A sample of the file is:

{u'contributors': None, u'truncated': False, u'text': u'How the heck am I supposed to walk with these on? \U0001f631 http://t.co/ndQlFY6ZaD', u'in_reply_to_status_id': None, u'id': 406887535240298496, u'favorite_count': 0, u'source': u'<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'user_mentions': [], u'hashtags': [], u'urls': [], u'media': [{u'expanded_url': u'http://twitter.com/CamilaZaggy/status/406887535240298496/photo/1', u'display_url': u'pic.twitter.com/ndQlFY6ZaD', u'url': u'http://t.co/ndQlFY6ZaD', u'media_url_https': u'https://pbs.twimg.com/media/BaWODrCIgAAVUzW.jpg', u'id_str': u'406887535089319936', u'sizes': {u'small': {u'h': 455, u'resize': u'fit', u'w': 340}, u'large': {u'h': 1024, u'resize': u'fit', u'w': 764}, u'medium': {u'h': 804, u'resize': u'fit', u'w': 599}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [52, 74], u'type': u'photo', u'id': 406887535089319936, u'media_url': u'http://pbs.twimg.com/media/BaWODrCIgAAVUzW.jpg'}]}, u'in_reply_to_screen_name': None, u'id_str': u'406887535240298496', u'retweet_count': 0, u'in_reply_to_user_id': None, u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'default_profile_image': False, u'id': 1036100580, u'verified': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/378800000661722703/a22f21ef022be63e2f22f64002065e11_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'profile_text_color': u'333333', u'followers_count': 70, u'profile_sidebar_border_color': u'FFFFFF', u'id_str': u'1036100580', u'profile_background_color': u'FFFFFF', u'listed_count': 0, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/872742115/e93bd4da46567ab1d785b8de7e4fe16a.jpeg', u'utc_offset': -14400, u'statuses_count': 1024, u'description': u"I'm the happiest when I'm in concerts 
\u270c", u'friends_count': 264, u'location': u'IG: heyheymila', u'profile_link_color': u'F5A6F5', u'profile_image_url': u'http://pbs.twimg.com/profile_images/378800000661722703/a22f21ef022be63e2f22f64002065e11_normal.jpeg', u'following': None, u'geo_enabled': True, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/1036100580/1385170913', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/872742115/e93bd4da46567ab1d785b8de7e4fe16a.jpeg', u'name': u'\u2744 Camila \u2744', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 182, u'screen_name': u'CamilaZaggy', u'notifications': None, u'url': None, u'created_at': u'Wed Dec 26 02:31:29 +0000 2012', u'contributors_enabled': False, u'time_zone': u'Atlantic Time (Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Sat Nov 30 20:48:42 +0000 2013', u'filter_level': u'medium', u'in_reply_to_status_id_str': None, u'place': None}

You are encoding your data using str(tweet) and then trying to decode it using json.load() (or json.loads() ). The two don't match: they follow different rules and assume different formats.

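For example, str({ "foo": "bar" }) produces {'foo': 'bar'} (note the single quotes), whereas JSON can only parse a string like {"foo": "bar"} (with double quotes). In your sample, the u'...' prefixes and the None / False values are likewise Python syntax, not JSON.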
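To make the mismatch concrete, here is a minimal sketch (written for Python 3; the small tweet dictionary and the is_valid_json helper are made up for illustration) comparing the two encodings of the same dictionary:

```python
import json

# A tiny stand-in for a tweet (hypothetical data, not from the real file).
tweet = {"text": "hi", "retweeted": False, "place": None}

as_repr = str(tweet)         # Python literal syntax: single quotes, False, None
as_json = json.dumps(tweet)  # JSON syntax: double quotes, false, null


def is_valid_json(s):
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except ValueError:  # json's decode errors are ValueErrors
        return False


print(as_repr)
print(as_json)
print(is_valid_json(as_repr), is_valid_json(as_json))
```

Only the json.dumps output should survive a round trip through json.loads; the str() output is rejected at the first single-quoted key.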

The best solution is of course to use JSON for encoding as well, i.e. use

fil.write(json.dumps(tweet)+"\n")

when creating the file.
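As a rough sketch of that approach (Python 3; the function names, the temporary path and the sample records are assumptions for illustration), writing one json.dumps result per line lets you read the file back line by line with json.loads:

```python
import json
import os
import tempfile


def write_tweets(path, tweets):
    """Write one JSON document per line."""
    with open(path, "w") as fil:
        for tweet in tweets:
            # json.dumps escapes newlines inside strings, so each
            # record stays on a single physical line.
            fil.write(json.dumps(tweet) + "\n")


def read_tweets(path):
    """Read the file back, decoding each non-empty line as JSON."""
    with open(path) as amr:
        return [json.loads(line) for line in amr if line.strip()]


# Hypothetical sample records, including a newline and a non-ASCII character.
tweets = [
    {"id": 1, "text": "first\nsecond line", "place": None},
    {"id": 2, "text": "peace \u270c", "retweeted": False},
]

path = os.path.join(tempfile.mkdtemp(), "tweets.json")
write_tweets(path, tweets)
restored = read_tweets(path)
print(restored == tweets)
```

With the default settings json.dumps emits ASCII only, so the file does not depend on the platform's default text encoding.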

If this is no longer possible (it sounds as if you already have a large database), you can use

twe = eval(line)

to parse the old data. You wouldn't be using JSON at all then ;-)

But note that eval can be a security risk if its input isn't your own data but was generated by a possible intruder.
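If the security concern matters, one alternative to consider (not part of the suggestion above) is ast.literal_eval from the standard library, which only accepts Python literals such as dicts, lists, strings, numbers, True, False and None. A minimal sketch, assuming each line holds one complete str(dict) as in the sample; the parse_repr_line helper and the shortened sample line are made up for illustration:

```python
import ast


def parse_repr_line(line):
    """Parse one line written with str(dict) back into a dict.

    Unlike eval, ast.literal_eval refuses anything that is not a
    plain Python literal, so function calls in the input are rejected.
    """
    return ast.literal_eval(line.strip())


# Shortened, hypothetical line in the same style as the sample file.
line = "{u'contributors': None, u'truncated': False, u'id': 406887535240298496, u'text': u'hi \\u270c'}\n"
twe = parse_repr_line(line)
print(twe["id"], twe["truncated"])

# Input that is not a pure literal raises ValueError instead of running.
try:
    parse_repr_line("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Once parsed this way, the old data could be rewritten with json.dumps so that later readers can use json.loads directly.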

You're splitting the input into lines, which is not how to parse JSON. In fact, it's almost impossible. The standard json library provides a function for this. Use json.load(amr) .


 