
Parse special JSON format with Python

I want to get the GPSLatitude and GPSLongitude values, but I can't rely on list position because the order of the entries is fairly random. I need to look the values up by the tag's name. How can I do that?

jsonFlickrApi({ "photo": { "id": "8566959299", "secret": "141af38562", "server": "8233", "farm": 9, "camera": "Apple iPhone 4S", 
    "exif": [
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "JFIFVersion", "label": "JFIFVersion", 
        "raw": { "_content": 1.01 } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "ResolutionUnit", "label": "Resolution Unit", 
        "raw": { "_content": "inches" } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "XResolution", "label": "X-Resolution", 
        "raw": { "_content": 72 }, 
        "clean": { "_content": "72 dpi" } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "YResolution", "label": "Y-Resolution", 
        "raw": { "_content": 72 }, 
        "clean": { "_content": "72 dpi" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitudeRef", "label": "GPS Latitude Ref", 
        "raw": { "_content": "North" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitude", "label": "GPS Latitude", 
        "raw": { "_content": "39 deg 56' 44.40\"" }, 
        "clean": { "_content": "39 deg 56' 44.40\" N" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitudeRef", "label": "GPS Longitude Ref", 
        "raw": { "_content": "East" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitude", "label": "GPS Longitude", 
        "raw": { "_content": "116 deg 16' 10.20\"" }, 
        "clean": { "_content": "116 deg 16' 10.20\" E" } },
    ] }, "stat": "ok" })

You don't say whether you are using one of the Flickr APIs; I assume not, because handling JSON responses is trivial if you are using an API library such as flickrapi.

import flickrapi

api_key = '88341066e8f0a40516599d28d8170627'   # from flickr's API explorer
secret = 'sssshhhh'
flickr = flickrapi.FlickrAPI(api_key, secret, format='parsed-json')
response = flickr.photos.getExif(photo_id='8566959299')
lat_long = {exif['tag']: exif['clean']['_content']
                    for exif in response['photo']['exif']
                        if exif['tag'] in (u'GPSLongitude', u'GPSLatitude')}

>>> from pprint import pprint
>>> pprint(lat_long)
{u'GPSLatitude': u'39 deg 56\' 44.40" N',
 u'GPSLongitude': u'116 deg 16\' 10.20" E'}

But continuing with the assumption that you are not using an API, the response format that you are seeing is actually JSONP, which is better suited to JavaScript than to Python. You can, however, request a JSON response that does not have the enclosing jsonFlickrApi() function wrapper by specifying format=json&nojsoncallback=1 in the query parameters of the request. The requests library makes requesting and parsing the JSON response easy, but urllib2.urlopen() combined with json.loads() works just as well if you can't use requests, e.g.

import requests

params = {'api_key': '88341066e8f0a40516599d28d8170627',
          'api_sig': '7b2dcfb2cd3a747179c2ed0fdc492699',
          'format': 'json',
          'method': 'flickr.photos.getExif',
          'nojsoncallback': '1',
          'photo_id': '8566959299',
          'secret': 'sssshhhh'}    
response = requests.get('https://api.flickr.com/services/rest/', params=params)
data = response.json()
lat_long = {exif['tag']: exif['clean']['_content']
                for exif in data['photo']['exif']
                    if exif['tag'] in (u'GPSLongitude', u'GPSLatitude')}

>>> from pprint import pprint
>>> pprint(lat_long)
{u'GPSLatitude': u'39 deg 56\' 44.40" N',
 u'GPSLongitude': u'116 deg 16\' 10.20" E'}
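
For readers who cannot install requests, the standard-library route mentioned above can be sketched as follows. This is a minimal sketch for Python 3, where urllib2 has become urllib.request; the helper names build_exif_url and fetch_exif are hypothetical, and it assumes the method can be called for a public photo with only an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_exif_url(api_key, photo_id):
    """Build a flickr.photos.getExif URL that asks for plain JSON (hypothetical helper)."""
    params = {
        "method": "flickr.photos.getExif",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": "1",  # drop the jsonFlickrApi(...) wrapper
    }
    return REST_ENDPOINT + "?" + urlencode(params)


def fetch_exif(api_key, photo_id):
    """Fetch and decode the response; needs network access and a valid key."""
    with urlopen(build_exif_url(api_key, photo_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The dictionary returned by fetch_exif can be filtered with the same comprehension used in the requests example.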

If you view the whole string as jsonFlickrApi(XXX), then XXX is a standard JSON string. With the json library, XXX can be converted to a Python dictionary and then accessed easily.
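
A minimal sketch of that idea, assuming the wrapper is a single callback(...) call around the payload. The helper name strip_jsonp is hypothetical, and the sample is a trimmed-down version of the response in the question, written as a raw string so the \" escape reaches the JSON parser unchanged.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload inside a callback(...) wrapper (hypothetical helper)."""
    start = text.index("(") + 1  # the first "(" ends the callback name
    end = text.rindex(")")       # the last ")" closes the call
    return text[start:end]


sample = r'''jsonFlickrApi({"photo": {"exif": [{"tag": "GPSLatitude", "clean": {"_content": "39 deg 56' 44.40\" N"}}]}, "stat": "ok"})'''

data = json.loads(strip_jsonp(sample))
print(data["photo"]["exif"][0]["clean"]["_content"])  # 39 deg 56' 44.40" N
```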

With the exception of the last comma just before the closing bracket ], the entire object returned by the Flickr API is valid JSON.
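
If the trailing comma ever does show up in real data, one naive workaround is to strip it before parsing. The sketch below is only an assumption about how that could be done; the helper name drop_trailing_commas is hypothetical, and the regex would also alter a string value that happens to contain a comma followed by ] or }, so it is not safe for arbitrary input.

```python
import json
import re


def drop_trailing_commas(text):
    """Naively remove a comma that directly precedes ] or } (hypothetical helper)."""
    return re.sub(r",\s*([\]}])", r"\1", text)


broken = '{"exif": [{"tag": "GPSLatitude"}, {"tag": "GPSLongitude"}, ] }'
fixed = drop_trailing_commas(broken)
tags = [item["tag"] for item in json.loads(fixed)["exif"]]
print(tags)  # ['GPSLatitude', 'GPSLongitude']
```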

Assuming that the comma is merely a copy-paste error (example evidence suggests this is the case), the builtin json module still fails if you paste the response into an ordinary Python string literal. A string like "116 deg 16' 10.20\" E" is valid JSON, but Python itself consumes the backslash of \" when it evaluates the literal, so json.loads receives an unescaped double quote and raises a ValueError:

>>> import json
>>> json.loads('{"a": "2"}')
{u'a': u'2'}
>>> json.loads('{"a": "2\""}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 10 (char 9)

One solution is to add another escaping backslash, so that a literal backslash reaches the parser:

>>> json.loads('{"a": "2\\""}')
{u'a': u'2"'}
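
An alternative to doubling the backslash, when the JSON text is typed as a literal in source code, is a raw string. The sketch below (Python 3) compares the two literals; text read from a file or an HTTP response is not affected, because no Python escape processing happens there.

```python
import json

plain = '{"a": "2\""}'  # Python turns \" into ", so the parser sees {"a": "2""}
raw = r'{"a": "2\""}'   # raw literal: the backslash survives

try:
    json.loads(plain)
    plain_parses = True
except ValueError:      # json.JSONDecodeError is a subclass of ValueError
    plain_parses = False

print(plain_parses)     # False
print(json.loads(raw))  # {'a': '2"'}
```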

For your full jsonFlickrApi response, you could add those extra backslashes with the re module:

>>> import re
>>> response = """jsonFlickrApi({ "photo": { "id": "8566959299", "secret": "141af38562", "server": "8233", "farm": 9, "camera": "Apple iPhone 4S", 
...     "exif": [
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "JFIFVersion", "label": "JFIFVersion", 
...         "raw": { "_content": 1.01 } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "ResolutionUnit", "label": "Resolution Unit", 
...         "raw": { "_content": "inches" } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "XResolution", "label": "X-Resolution", 
...         "raw": { "_content": 72 }, 
...         "clean": { "_content": "72 dpi" } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "YResolution", "label": "Y-Resolution", 
...         "raw": { "_content": 72 }, 
...         "clean": { "_content": "72 dpi" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitudeRef", "label": "GPS Latitude Ref", 
...         "raw": { "_content": "North" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitude", "label": "GPS Latitude", 
...         "raw": { "_content": "39 deg 56' 44.40\"" }, 
...         "clean": { "_content": "39 deg 56' 44.40\" N" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitudeRef", "label": "GPS Longitude Ref", 
...         "raw": { "_content": "East" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitude", "label": "GPS Longitude", 
...         "raw": { "_content": "116 deg 16' 10.20\"" }, 
...         "clean": { "_content": "116 deg 16' 10.20\" E" } }
...     ] }, "stat": "ok" })"""
>>> quoted_resp = re.sub('deg ([^"]+)"', r'deg \1\\"', response[14:-1])

The quoted response can then be passed to json.loads, and the required data is easy to access in the resulting dictionary structure:

>>> photodict = json.loads(quoted_resp)
>>> for meta in photodict['photo']['exif']:
...     if meta["tagspace"] == "GPS" and meta["tag"] == "GPSLongitude":
...         print(meta["clean"]["_content"])
... 
116 deg 16' 10.20" E
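
Since the question asks for lookups by tag name rather than by position, the loop above can be generalised into a small index. This is a sketch only: the helper name exif_by_tag is hypothetical, and it assumes every entry has a "raw" value, which holds for the sample in the question. It falls back to "raw" when an entry such as GPSLatitudeRef has no "clean" value.

```python
def exif_by_tag(response):
    """Map each EXIF tag name to its 'clean' value, or 'raw' if 'clean' is absent."""
    result = {}
    for entry in response["photo"]["exif"]:
        value = entry.get("clean", entry["raw"])
        result[entry["tag"]] = value["_content"]
    return result


# Trimmed-down copy of the parsed response from the question.
parsed = {
    "photo": {"exif": [
        {"tag": "GPSLatitudeRef", "raw": {"_content": "North"}},
        {"tag": "GPSLatitude",
         "raw": {"_content": "39 deg 56' 44.40\""},
         "clean": {"_content": "39 deg 56' 44.40\" N"}},
        {"tag": "GPSLongitude",
         "raw": {"_content": "116 deg 16' 10.20\""},
         "clean": {"_content": "116 deg 16' 10.20\" E"}},
    ]},
    "stat": "ok",
}

tags = exif_by_tag(parsed)
print(tags["GPSLatitude"])   # 39 deg 56' 44.40" N
print(tags["GPSLongitude"])  # 116 deg 16' 10.20" E
```

If two entries in different tagspaces share a tag name, the later one overwrites the earlier one in this sketch; keying on (tagspace, tag) would avoid that.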
