Converting deeply nested JSON response from an API call to pandas dataframe

I am having trouble parsing a deeply nested JSON response from an HTTP API call.

My JSON response looks like this:

{'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]},

After I receive the response, I store it in a data object and run json_normalize on it:

from pandas import json_normalize

data = response.json()
data = data['response']['accounts']['hits']
data = json_normalize(data)

However, after I normalize it, my dataframe looks like this:

My curl command looks like this:

curl --data 'query= {"terms":[{"type":"string_attribute","attribute":"Account Type","query_term_id":"account_type","in_list":["Contract"]},{"type":"string","term":"status_group","in_list":["paying"]},{"type":"string_attribute","attribute":"Region","in_list":["DDEU"]},{"type":"string_attribute","attribute":"Country","in_list":["Belgium"]},{"type":"string_attribute","attribute":"CSM Tag","in_list":["EU CSM"]},{"type":"date_attribute","attribute":"Contract Renewal Date","gte":1554057000000,"lte":1561833000000}],"count":1000,"offset":0,"fields":[{"type":"string_attribute","attribute":"DomainName","field_display_name":"Client Name"},{"type":"string_attribute","attribute":"Region","field_display_name":"Region"},{"type":"string_attribute","attribute":"Country","field_display_name":"Country"},{"type":"string_attribute","attribute":"Success Manager","field_display_name":"Client Success Manager"},{"type":"string","term":"identifier","field_display_name":"Account id"},{"type":"string_attribute","attribute":"DeviceSLA","field_display_name":"[FIN] Material Part Number"},{"type":"string_attribute","attribute":"SFDCAccountId","field_display_name":"SFDCAccountId"},{"type":"string_attribute","attribute":"Client","field_display_name":"[FIN] Client Sold-To Name"},{"type":"string_attribute","attribute":"Sold To Code","field_display_name":"[FIN] Client Sold To Code"},{"type":"string_attribute","attribute":"BU","field_display_name":"[FIN] Active BUs"},{"type":"string_attribute","attribute":"Service Type","field_display_name":"[FIN] Service Type"},{"type":"string_attribute","attribute":"Contract Header ID","field_display_name":"[FIN] SAP Contract Header ID"},{"type":"number_attribute","attribute":"Contract Value","field_display_name":"[FIN] ACV - Annual Contract Value","desc":true},{"type":"date_attribute","attribute":"Contract Start Date","field_display_name":"[FIN] Contract Start Date"},{"type":"date_attribute","attribute":"Contract Renewal Date","field_display_name":"[FIN] Contract Renewal Date"}],"scope":"all"}' --header 'app-token:YOUR-TOKEN-HERE' 'https://app.totango.com/api/v1/search/accounts'

So ultimately, I want to store the response in a dataframe along with the field names.

I've had to do this sort of thing (flattening nested JSON) a few times in the past. I'll explain my process, and you can see whether it works for you or adapt the code to fit your needs.

1) Take the response data and flatten it completely using a function. This blog was very helpful when I first had to do this.

2) Iterate through the resulting flat dictionary and work out which row and column each value belongs to, using the numbers embedded in the new key names for the nested parts. Some keys are unique/distinct and have no number to mark a "new" row, so I collect those in what I called special_cols.

3) For each remaining key, pull out the row number embedded in the flat key and write the value into the dataframe at that row and column.

It sounds complicated, but if you debug it and run it line by line, you can see how it works. Nonetheless, I believe it should get you what you need.
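To make steps 1 and 2 concrete, here is a small sketch of what the flattened keys look like and how a row number and column name can be pulled out of each one. The flat_sample dictionary is hand-written for illustration and split_key is a hypothetical helper name, not part of the code that follows.

```python
import re

# Hand-written sample of the flattened dictionary that step 1 produces
flat_sample = {
    'took': 476,
    'response_accounts_hits_0_name': '4002238760',
    'response_accounts_hits_0_selected_fields_0': 'Googleglass',
    'response_accounts_hits_1_name': '4003893720',
}


def split_key(key):
    """Return (row_index, column_name), or None if the key has no row number."""
    # The first '_<digits>_' segment marks the row; the rest names the column
    match = re.search(r'_(\d+)_(.*)', key)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).replace('_', '')


for key in flat_sample:
    print(key, '->', split_key(key))
```

Keys such as 'took' come back as None, which is why they need separate handling as special_cols.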

data = {'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]}]}}}


import pandas as pd
import re


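def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # List positions become numeric key segments, e.g. hits_0_name
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data)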


results = pd.DataFrame()
special_cols = []  # keys with no row number, e.g. 'took' and '_revision'

for item in flat:
    # The first '_<digits>_' segment is the row index; the rest is the column
    match = re.search(r'_(\d+)_(.*)', item)
    if match is None:
        special_cols.append(item)
        continue

    row_idx = int(match.group(1))
    column = match.group(2).replace('_', '')

    results.loc[row_idx, column] = flat[item]

# Keys without a row number hold one value for the whole response,
# so broadcast them to every row
for item in special_cols:
    results[item] = flat[item]

Output:

print(results.to_string())
         name             displayname selectedfields0 selectedfields1  selectedfields2       selectedfields3 selectedfields4              selectedfields5  selectedfields6  selectedfields7 selectedfields8 selectedfields9   selectedfields10 selectedfields11  selectedfields12  selectedfields13  selectedfields14  took _revision
0  4002238760  Googleglass-4002238760     Googleglass        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4002238760            DDMadarchod-INSTE              NaN      Googleglass      0001012556              CC  Setu Non Standard         40022387          320142.0      4.651321e+12      1.324651e+12   476  r08badf3
1  4003893720        Swift-4003893720           Swift        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4003893720  DDMadarchod-UPTM-RemotexNBD              NaN  S.W.I.F.T. SCRL      0001000110              SE  Setu Non Standard         40038937          189508.0      1.464739e+12      1.559261e+12   476  r08badf3
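Since the question also asks for the field names, here is a possible alternative sketch that builds the dataframe directly from selected_fields and labels the columns with the field_display_name values from the curl query. It assumes that selected_fields comes back in the same order as the "fields" list in the query (both have 15 entries here) and that the two date fields are epoch milliseconds; neither is confirmed by the API documentation in this thread. The names field_names and hits are illustrative.

```python
import pandas as pd

# Display names copied from the "fields" list in the curl query, in order.
# Assumption: selected_fields is returned in this same order.
field_names = [
    'Client Name', 'Region', 'Country', 'Client Success Manager',
    'Account id', '[FIN] Material Part Number', 'SFDCAccountId',
    '[FIN] Client Sold-To Name', '[FIN] Client Sold To Code',
    '[FIN] Active BUs', '[FIN] Service Type',
    '[FIN] SAP Contract Header ID', '[FIN] ACV - Annual Contract Value',
    '[FIN] Contract Start Date', '[FIN] Contract Renewal Date',
]

# Stand-in for data['response']['accounts']['hits'] from the sample response
hits = [
    {'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass', 'DDMonkey', 'Papu New Guinea',
                         'Jonathan Vardharajan', '4002238760',
                         'DDMadarchod-INSTE', None, 'Googleglass',
                         '0001012556', 'CC', 'Setu Non Standard', '40022387',
                         320142, 4651321321333, 1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift', 'DDMonkey', 'Papu New Guinea',
                         'Jonathan Vardharajan', '4003893720',
                         'DDMadarchod-UPTM-RemotexNBD', None,
                         'S.W.I.F.T. SCRL', '0001000110', 'SE',
                         'Setu Non Standard', '40038937',
                         189508, 1464739200000, 1559260800000]},
]

# One row per hit, one column per requested field
df = pd.DataFrame([hit['selected_fields'] for hit in hits],
                  columns=field_names)
df.insert(0, 'name', [hit['name'] for hit in hits])
df.insert(1, 'display_name', [hit['display_name'] for hit in hits])

# Assumption: the date fields are epoch milliseconds
for col in ['[FIN] Contract Start Date', '[FIN] Contract Renewal Date']:
    df[col] = pd.to_datetime(df[col], unit='ms')

print(df.to_string())
```

Because every row is built in one go, the integer contract values keep their integer dtype instead of being converted to floats.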
