
Extract values from JSON nested list and string array with Python

I am trying to pull the coordinates for multiple neighborhoods in Boston, MA from a JSON dataset, but I am stuck trying to get just the first coordinate pair for each neighborhood. Below is a trimmed-down version of the Roslindale entry.

"features": [{
    "type": "Feature",
    "properties": {
      "Name": "Roslindale",
      "Acres": 1605.5682375,
      "SqMiles": 2.51
    },
    "geometry": {
      "type": "MultiPolygon",
      "coordinates": [
        [
          [
            [
              -71.125927174853857,
              42.272013107957406
            ],
            [
              -71.125927174853857,
              42.272013107957406
            ]
          ]
        ],
        [
          [
            [
              -71.125830766767592,
              42.272212845889705
            ],
            [
              -71.125830766767592,
              42.272212845889705
            ]
          ]
        ],
        [
          [
            [
              -71.125767203228904,
              42.272315958536389
            ],
            [
              -71.125767203228904,
              42.272315958536389
            ]
          ]
        ]
      ]
    }
  },

Right now I pull the data I want using:

for data in boston_neighborhoods:
    neighborhood_name = data['properties']['Name']
    neighborhood_id = data['properties']['Neighborhood_ID']
    neighborhood_size = data['properties']['SqMiles']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon
    neighborhood_lon = neighborhood_latlon

    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood_name,
                                          'Neighborhood_ID': neighborhood_id,
                                          'SqMiles': neighborhood_size,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

This returns every coordinate pair, but I only want the first one. Below is an example of the output I currently get:

Latitude                   |           Longitude     
--------------------------------------------------------
[[[[-71.12592717485386,    |    [[[[-71.12592717485386, 
42.272013107957406], [...  |    42.272013107957406], [...    

Might be overkill, but JMESPath makes it really easy to query nested JSON structures like this one.

Traversing down the document, you first need to get every element in the array ( [*] ); then, for each element, you select items into an object (a Python dictionary). You select the neighborhood name under properties and then Name ( properties.Name ), and do the same for the other similarly nested properties.

Coordinates live under geometry.coordinates, which is an array of arrays of arrays of coordinate pairs, so the first pair is at geometry.coordinates[0][0][0].
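To make the nesting concrete, here is a minimal plain-Python sketch that walks down one level at a time. The `feature` dictionary is a hypothetical, shortened copy of the Roslindale sample from the question, not the full dataset, and the intermediate variable names are only for illustration.

```python
# Hypothetical, trimmed-down feature mirroring the Roslindale sample above.
feature = {
    "properties": {"Name": "Roslindale", "SqMiles": 2.51},
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[-71.125927174853857, 42.272013107957406],
              [-71.125927174853857, 42.272013107957406]]],
            [[[-71.125830766767592, 42.272212845889705],
              [-71.125830766767592, 42.272212845889705]]],
        ],
    },
}

coords = feature["geometry"]["coordinates"]  # outermost array
first_polygon = coords[0]                    # array of rings
first_ring = first_polygon[0]                # array of coordinate pairs
first_pair = first_ring[0]                   # one two-element pair

print(first_pair)
```

Each `[0]` peels off one array level, which is why three of them are needed before you reach a pair and a fourth to pick a single number out of it.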

import jmespath
import pandas as pd

# GeoJSON positions are [longitude, latitude], so index 1 of the first pair
# is the latitude and index 0 is the longitude.
query = """
[*].{
    Neighborhood: properties.Name,
    Neighborhood_ID: properties.Neighborhood_ID,
    SqMiles: properties.SqMiles,
    Latitude: geometry.coordinates[0][0][0][1],
    Longitude: geometry.coordinates[0][0][0][0]
}
"""

# boston_neighborhoods is assumed to be the list stored under "features".
compiled = jmespath.compile(query)
result = compiled.search(boston_neighborhoods)

df = pd.DataFrame.from_records(result)
#   Neighborhood Neighborhood_ID  SqMiles   Latitude  Longitude
# 0   Roslindale            None     2.51  42.272013 -71.125927
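If you would rather not add a dependency, the same extraction can be sketched in plain Python. This is only an illustration under a few assumptions: `first_coordinate_rows` and `sample` are hypothetical names, every feature is taken to be a MultiPolygon with at least one pair, and each pair is assumed to follow the GeoJSON convention of [longitude, latitude].

```python
def first_coordinate_rows(features):
    """Build one flat record per feature using only its first coordinate pair."""
    rows = []
    for feature in features:
        props = feature["properties"]
        # MultiPolygon nesting: polygons -> rings -> pairs.
        # Assumption: each pair is ordered [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][0][0][0]
        rows.append({
            "Neighborhood": props["Name"],
            # .get() returns None when the key is missing, as in the sample.
            "Neighborhood_ID": props.get("Neighborhood_ID"),
            "SqMiles": props["SqMiles"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows


# Hypothetical one-feature input shaped like the Roslindale sample.
sample = [{
    "properties": {"Name": "Roslindale", "SqMiles": 2.51},
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[-71.125927174853857, 42.272013107957406],
              [-71.125927174853857, 42.272013107957406]]],
        ],
    },
}]

rows = first_coordinate_rows(sample)
print(rows[0]["Neighborhood"], rows[0]["Latitude"], rows[0]["Longitude"])
```

Passing the resulting list to pd.DataFrame(rows) builds the frame in one step, which also avoids the row-by-row DataFrame.append used in the question (that method was removed in pandas 2.0).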


 