
Extract values from JSON nested list and string array with Python

I am trying to pull the coordinates of multiple neighborhoods in Boston, MA from a JSON dataset, but I am stuck trying to get just the first coordinate pair for each neighborhood. Below is a shortened version of the Roslindale coordinates.

"features": [{
    "type": "Feature",
    "properties": {
      "Name": "Roslindale",
      "Acres": 1605.5682375,
      "SqMiles": 2.51
    },
    "geometry": {
      "type": "MultiPolygon",
      "coordinates": [
        [
          [
            [
              -71.125927174853857,
              42.272013107957406
            ],
            [
              -71.125927174853857,
              42.272013107957406
            ]
          ]
        ],
        [
          [
            [
              -71.125830766767592,
              42.272212845889705
            ],
            [
              -71.125830766767592,
              42.272212845889705
            ]
          ]
        ],
        [
          [
            [
              -71.125767203228904,
              42.272315958536389
            ],
            [
              -71.125767203228904,
              42.272315958536389
            ]
          ]
        ]
      ]
    }
  },

Right now I pull the data I want using:

for data in boston_neighborhoods:
    neighborhood_name = data['properties']['Name']
    neighborhood_id = data['properties']['Neighborhood_ID']
    neighborhood_size = data['properties']['SqMiles']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon
    neighborhood_lon = neighborhood_latlon

    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood_name,
                                          'Neighborhood_ID': neighborhood_id,
                                          'SqMiles': neighborhood_size,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

This returns every coordinate pair, but I only want the first one. Below is an example of the output I currently get:

Latitude                   |           Longitude     
--------------------------------------------------------
[[[[-71.12592717485386,    |    [[[[-71.12592717485386, 
42.272013107957406], [...  |    42.272013107957406], [...    

Might be overkill, but JMESPath makes it really easy to query nested JSON structures like that one.

Traversing down the document, you first select every element in the array ( [*] ), then for each element you build an object (a Python dictionary) from the items you pick. The neighborhood name sits under properties and then Name ( properties.Name ); the other nested properties are selected the same way.

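Coordinates live under geometry.coordinates, which for a MultiPolygon is an array of polygons, each an array of rings, each an array of coordinate pairs. GeoJSON orders each pair as [longitude, latitude].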
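
To make the nesting concrete, here is a minimal sketch in plain Python on a trimmed copy of the Roslindale geometry. The variable names are illustrative only; the last level is unpacked as longitude first, following the GeoJSON position order.

```python
# A trimmed copy of the "geometry" object from the question (illustrative only).
geometry = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[-71.125927174853857, 42.272013107957406],
          [-71.125927174853857, 42.272013107957406]]],
        [[[-71.125830766767592, 42.272212845889705],
          [-71.125830766767592, 42.272212845889705]]],
    ],
}

coords = geometry["coordinates"]
first_polygon = coords[0]        # a list of linear rings
first_ring = first_polygon[0]    # a list of positions
first_position = first_ring[0]   # a single [longitude, latitude] pair

longitude, latitude = first_position
print(latitude, longitude)
```

The JMESPath query walks the same four levels with its chained [0] indices.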

import jmespath
import pandas as pd

query = """
[*].{ 
    Neighborhood: properties.Name,
    Neighborhood_ID: properties.Neighborhood_ID, 
    SqMiles: properties.SqMiles, 
    Latitude: geometry.coordinates[0][0][0][1],
    Longitude: geometry.coordinates[0][0][0][0]
}
"""
# GeoJSON positions are [longitude, latitude], so index 0 is the longitude.

compiled = jmespath.compile(query)
result = compiled.search(boston_neighborhoods)

df = pd.DataFrame.from_records(result)
#   Neighborhood Neighborhood_ID  SqMiles   Latitude  Longitude
# 0   Roslindale            None     2.51  42.272013 -71.125927
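
If adding a dependency is not desirable, roughly the same result can be sketched with a plain loop. This is only a sketch under assumptions: first_position is a hypothetical helper (not part of any library), it assumes every geometry is a Polygon or MultiPolygon with at least one position, and the sample list merely mirrors the fields shown in the question. Rows are collected in a list because DataFrame.append was removed in pandas 2.0.

```python
def first_position(geometry):
    """Return the first [longitude, latitude] position of a geometry.

    Hypothetical helper: descends into the first element of each nested
    list until it reaches a list whose first item is not a list.
    """
    node = geometry["coordinates"]
    while isinstance(node[0], list):
        node = node[0]
    return node


# Sample input mirroring the structure in the question (illustrative only).
boston_neighborhoods = [
    {
        "type": "Feature",
        "properties": {"Name": "Roslindale", "SqMiles": 2.51},
        "geometry": {
            "type": "MultiPolygon",
            "coordinates": [
                [[[-71.125927174853857, 42.272013107957406],
                  [-71.125927174853857, 42.272013107957406]]],
            ],
        },
    },
]

rows = []
for feature in boston_neighborhoods:
    props = feature["properties"]
    lon, lat = first_position(feature["geometry"])  # GeoJSON order: lon, lat
    rows.append({
        "Neighborhood": props["Name"],
        # .get() returns None when the key is absent, as in the sample above
        "Neighborhood_ID": props.get("Neighborhood_ID"),
        "SqMiles": props["SqMiles"],
        "Latitude": lat,
        "Longitude": lon,
    })

print(rows)
```

Passing rows to pd.DataFrame(rows) should then give a frame with the same columns as above. Because the helper descends until it hits a bare pair, it also copes with features whose geometry is a Polygon, which has one level of nesting less.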
