Flattening nested JSON API dictionaries in Python

I am receiving the following JSON response for a distance matrix, which was gathered using the following code:

import requests
import json

payload = {
    "origins": [{"latitude": 54.6565153, "longitude": -1.6802816}, {"latitude": 54.6365153, "longitude": -1.6202816}], #surgery
    "destinations": [{"latitude": 54.6856522, "longitude": -1.2183634}, {"latitude": 54.5393295, "longitude": -1.2623914}, {"latitude": 54.5393295, "longitude": -1.2623914}], #oa - up to 625 entries
    "travelMode": "driving",
    "startTime": "2014-04-01T11:59:59+01:00",
    "timeUnit": "second"
}
# Content-Length is computed by requests, so it is not set by hand
headers = {"Content-Type": "application/json"}
paramtr = {"key": "INSERT_KEY_HERE"}
r = requests.post('https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix', data=json.dumps(payload), params=paramtr, headers=headers)
data = r.json()["resourceSets"][0]["resources"][0]

and am attempting to flatten it into these columns:

destinations.latitude, destinations.longitude, origins.latitude, origins.longitude, departureTime, destinationIndex, originIndex, totalWalkDuration, travelDistance, travelDuration

from:

{'__type': 'DistanceMatrix:http://schemas.microsoft.com/search/local/ws/rest/v1',
 'destinations': [{'latitude': 54.6856522, 'longitude': -1.2183634},
  {'latitude': 54.5393295, 'longitude': -1.2623914},
  {'latitude': 54.5393295, 'longitude': -1.2623914}],
 'errorMessage': 'Request completed.',
 'origins': [{'latitude': 54.6565153, 'longitude': -1.6802816},
  {'latitude': 54.6365153, 'longitude': -1.6202816}],
 'results': [{'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 0,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 38.209,
   'travelDuration': 3082},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 1,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 40.247,
   'travelDuration': 2708},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 2,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 40.247,
   'travelDuration': 2708},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 0,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 34.857,
   'travelDuration': 2745},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 1,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 36.895,
   'travelDuration': 2377},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 2,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 36.895,
   'travelDuration': 2377}]}

The best I have achieved so far is:

json_normalize(data, record_path="results", meta="origins")  # json_normalize imported from pandas

However, the result still contains a nested origins column, and the destinations refuse to append. I also tried dropping __type to see if it made a difference, and explored max_level= and record_prefix='_', but to no avail.

  • I don't think this is an appropriate question for flatten_json; however, it can be useful for JSON objects that are less thoughtfully constructed.
  • The list in destinations corresponds to the list in results, which means that when they are normalized, they will have the same index.
  • The dataframes can be concatenated correctly because they have corresponding indices. This relies on results and destinations having the same number of rows, as in the three-row output below.

import pandas as pd

# create a dataframe for results and origins
res_or = pd.json_normalize(data, record_path=['results'], meta=[['origins']])

# create a dataframe for destinations
dest = pd.json_normalize(data, record_path=['destinations'], record_prefix='dest_')

# normalize the origins column in res_or
orig = pd.json_normalize(res_or.origins).rename(columns={'latitude': 'origin_lat', 'longitude': 'origin_long'})

# concat the dataframes
df = pd.concat([res_or, orig, dest], axis=1).drop(columns=['origins'])

# display(df)
                departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  origin_lat  origin_long  dest_latitude  dest_longitude
0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082   54.656515    -1.680282      54.685652       -1.218363
1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
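
The positional concat only lines up when every frame has the same number of rows. A minimal sketch of what happens otherwise, using two small made-up frames (the names results, dest, and combined are illustrative, not taken from the API response):

```python
import pandas as pd

# Hypothetical frames: six result rows but only three destination rows.
results = pd.DataFrame({"destinationIndex": [0, 1, 2, 0, 1, 2]})
dest = pd.DataFrame({"dest_latitude": [54.6856522, 54.5393295, 54.5393295]})

# axis=1 aligns on the row index, so rows 3-5 have no matching destination
# and are filled with NaN instead of the coordinates they should carry.
combined = pd.concat([results, dest], axis=1)
missing = int(combined["dest_latitude"].isna().sum())
print(combined)
print("rows without a destination:", missing)
```

With more than one origin, the lookup has to go through destinationIndex and originIndex rather than through row position.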

Update for new example data

  • The records in results contain the indices for destinations and origins, so it's easy to create a separate dataframe for each key and then .merge the dataframes.
    • The index of dest and orig corresponds to destinationIndex and originIndex in results.

# create three separate dataframes
results = pd.json_normalize(data, record_path=['results'])
dest = pd.json_normalize(data, record_path=['destinations'], record_prefix='dest_')
orig = pd.json_normalize(data, record_path=['origins'], record_prefix='orig_')

# merge them at the appropriate location
df = pd.merge(results, dest, left_on='destinationIndex', right_index=True)
df = pd.merge(df, orig, left_on='originIndex', right_index=True)

# display(df)
                departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  dest_latitude  dest_longitude  orig_latitude  orig_longitude
0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082      54.685652       -1.218363      54.656515       -1.680282
1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
3  /Date(1396349159000-0700)/                 0            1                  0          34.857            2745      54.685652       -1.218363      54.636515       -1.620282
4  /Date(1396349159000-0700)/                 1            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
5  /Date(1396349159000-0700)/                 2            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
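
For readers without an API key, the merge can be tried on a trimmed copy of the response. This is a minimal sketch under some assumptions: the inline data dict keeps only a few fields from the output shown in the question, and the validate="many_to_one" argument is an addition for illustration (it makes pandas raise if a destination or origin index were duplicated), not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the response from the question, so no API call is needed.
data = {
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ],
    "origins": [
        {"latitude": 54.6565153, "longitude": -1.6802816},
        {"latitude": 54.6365153, "longitude": -1.6202816},
    ],
    "results": [
        {"originIndex": o, "destinationIndex": d, "travelDuration": t}
        for o, d, t in [(0, 0, 3082), (0, 1, 2708), (0, 2, 2708),
                        (1, 0, 2745), (1, 1, 2377), (1, 2, 2377)]
    ],
}

results = pd.json_normalize(data, record_path=["results"])
dest = pd.json_normalize(data, record_path=["destinations"], record_prefix="dest_")
orig = pd.json_normalize(data, record_path=["origins"], record_prefix="orig_")

# Look up each row's destination and origin by index value, not by row position.
df = (
    results
    .merge(dest, left_on="destinationIndex", right_index=True, validate="many_to_one")
    .merge(orig, left_on="originIndex", right_index=True, validate="many_to_one")
)
print(df)
```

Because the join keys are the index columns themselves, the result stays correct no matter how many origins and destinations the request contains.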

I encountered something like this before. The best I came up with is a recursive function that builds an OrderedDict, which I then loop through. Here it is:

def flatten(data, sep="_"):
    import collections

    obj = collections.OrderedDict()

    def recurse(temp, parent_key=""):
        if isinstance(temp, list):
            for i in range(len(temp)):
                recurse(temp[i], parent_key + sep + str(i) if parent_key else str(i))
        elif isinstance(temp, dict):
            for key, value in temp.items():
                recurse(value, parent_key + sep + key if parent_key else key)
        else:
            obj[parent_key] = temp

    recurse(data)
    return obj

When you loop through it, your data will look something like this:

for key, value in flatten(a).items():  # a: the parsed JSON dict (or part of it)
    print(key, value)

destinations_0_latitude 54.6856522
destinations_0_longitude -1.2183634
destinations_1_latitude 54.5393295
destinations_1_longitude -1.2623914
destinations_2_latitude 54.5393295
destinations_2_longitude -1.2623914

The reason I use a separator is that it gives you extensibility, so you can use

key.split("_")

['destinations', '0', 'latitude'] 54.6856522
['destinations', '0', 'longitude'] -1.2183634

After that, you can adapt statements easily, like

if key.split("_")[2] == "latitude":
    ...  # do something

if key.endswith("latitude"):
    ...  # do something
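
One way the split keys could be put to use is to regroup the flattened entries into one record per list index. The following is a rough sketch, not the answer's own code: flatten is re-stated compactly so the block runs on its own, sample and records are illustrative names, and the three-way split assumes that no original key contains the separator.

```python
from collections import defaultdict


def flatten(data, sep="_"):
    """Compact re-statement of the recursive flattener shown above."""
    out = {}

    def recurse(node, parent_key=""):
        if isinstance(node, list):
            for i, item in enumerate(node):
                recurse(item, f"{parent_key}{sep}{i}" if parent_key else str(i))
        elif isinstance(node, dict):
            for key, value in node.items():
                recurse(value, f"{parent_key}{sep}{key}" if parent_key else key)
        else:
            out[parent_key] = node

    recurse(data)
    return out


# Illustrative sample shaped like the "destinations" part of the response.
sample = {
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ]
}

# Keys look like "destinations_0_latitude": list name, list index, field name.
records = defaultdict(dict)
for key, value in flatten(sample).items():
    name, index, field = key.split("_")
    records[(name, int(index))][field] = value

for record_key, fields in records.items():
    print(record_key, fields)
```

If the real keys may contain underscores, a separator that cannot occur in them (for example "|") would be a safer choice for sep.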
