
Creating an easy way to read a JSON object saved in a complex, nested Python dictionary

I have a JSON object pulled from an API, which I loaded into a Python dictionary. I'm now finding it difficult to extract data from the dictionary because it is nested and contains sub-lists and sub-dictionaries. To better understand the structure of the data, I tried the following:

import requests

url = "https://api.xyz.com/v11/api.json?KEY=abc&LOOKUP=bbb"
response = requests.get(url)
data = response.json()
# Walk down the nesting one level at a time, printing the type and keys/length.
print(type(data))
print(data.keys())
print(type(data['Results']))
print(len(data['Results']))
print(type(data['Results'][0]))
print(data['Results'][0].keys())
print(type(data['Results'][0]['Result']))
print(data['Results'][0]['Result'].keys())
print(type(data['Results'][0]['Result']['Paths']))
print(len(data['Results'][0]['Result']['Paths']))
print(type(data['Results'][0]['Result']['Paths'][0]))
print(data['Results'][0]['Result']['Paths'][0].keys())
print(type(data['Results'][0]['Result']['Paths'][0]['Technologies']))
print(len(data['Results'][0]['Result']['Paths'][0]['Technologies']))
print(data['Results'][0]['Result']['Paths'][0]['Technologies'][8].keys())
print(data['Results'][0]['Result']['Paths'][0]['Technologies'][8]['Tag'])

From the above, I got the following output:

<class 'dict'>
dict_keys(['Results', 'Errors'])
<class 'list'>
1
<class 'dict'>
dict_keys(['Lookup', 'LastIndexed', 'FirstIndexed', 'Meta', 'Result'])
<class 'dict'>
dict_keys(['Paths', 'IsDB', 'Spend'])
<class 'list'>
9
<class 'dict'>
dict_keys(['LastIndexed', 'Technologies', 'Domain', 'SubDomain', 'FirstIndexed', 'Url'])
<class 'list'>
77
dict_keys(['FirstDetected', 'Name', 'LastDetected', 'Categories', 'Description', 'IsPremium', 'Tag', 'Link'])
cdn
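
The manual probing above could be automated with a small recursive helper. The sketch below is only an illustration: `summarize` is a hypothetical name, the sample dictionary uses made-up values that mimic the nesting shown in the output, and it assumes that all items in a list share the shape of the first one.

```python
def summarize(obj, name="data", depth=0, max_depth=10):
    """Return lines describing the shape of nested dicts and lists."""
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{indent}{name}: dict with keys {list(obj)}")
        if depth < max_depth:
            for key, value in obj.items():
                # Only descend into containers; scalars are covered by the key list.
                if isinstance(value, (dict, list)):
                    lines.extend(summarize(value, repr(key), depth + 1, max_depth))
    elif isinstance(obj, list):
        lines.append(f"{indent}{name}: list of {len(obj)} item(s)")
        # Assumption: items in a list share a shape, so only the first is inspected.
        if obj and depth < max_depth and isinstance(obj[0], (dict, list)):
            lines.extend(summarize(obj[0], f"{name}[0]", depth + 1, max_depth))
    else:
        lines.append(f"{indent}{name}: {type(obj).__name__}")
    return lines


# Made-up stand-in for the API response, with the same nesting as above.
sample = {
    "Results": [{
        "Lookup": "example.com",
        "Result": {"Paths": [{
            "Domain": "example.com",
            "Technologies": [{"Name": "Tech A", "Tag": "cdn"}],
        }]},
    }],
    "Errors": [],
}

print("\n".join(summarize(sample)))
```

Calling `summarize(data)` on the real response would give the same kind of overview in one step, without indexing each level by hand.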

From other iterations of this, I know that depending on which list item I choose after 'Paths', the 'Technologies' list can vary in length from 5 to 100. I'm specifically interested in getting a list of all technologies for which 'Tag' == A. I want to be able to create a table with all the upper-level information for every entry that has 'Tag' == A. Ideally, I want to get this info into a CSV file. I've looked at "Pandas dataframe from nested dictionary", "Create a dictionary with list comprehension in Python", and "Construct pandas DataFrame from items in nested dictionary", but I get confused when it comes to accessing the lists (especially after 'Paths').
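
One way to reach the lists under 'Paths' is to loop over every level that is a list and carry the upper-level values down into each matching row. The following is a sketch under assumptions: `rows_with_tag` is a hypothetical helper, the sample data is made up, and the choice of fields copied into each row is only an example.

```python
def rows_with_tag(data, tag):
    """Yield one flat dict per technology whose 'Tag' equals `tag`."""
    for result in data.get("Results", []):          # 'Results' is a list
        lookup = result.get("Lookup")
        for path in result["Result"]["Paths"]:      # 'Paths' is a list
            for tech in path["Technologies"]:       # 'Technologies' is a list
                if tech.get("Tag") == tag:
                    # Copy the upper-level values next to the technology fields.
                    yield {
                        "Lookup": lookup,
                        "Domain": path.get("Domain"),
                        "SubDomain": path.get("SubDomain"),
                        "Url": path.get("Url"),
                        "Name": tech.get("Name"),
                        "Tag": tech.get("Tag"),
                    }


# Made-up stand-in for the API response.
sample = {
    "Results": [{
        "Lookup": "example.com",
        "Result": {"Paths": [
            {"Domain": "example.com", "SubDomain": "", "Url": "",
             "Technologies": [{"Name": "Tech A", "Tag": "cdn"},
                              {"Name": "Tech B", "Tag": "mobile"}]},
            {"Domain": "example.com", "SubDomain": "blog", "Url": "",
             "Technologies": [{"Name": "Tech C", "Tag": "cdn"}]},
        ]},
    }],
    "Errors": [],
}

rows = list(rows_with_tag(sample, "cdn"))
for row in rows:
    print(row)
```

Each yielded dictionary is already one table row, so the result can be passed to a CSV writer or to `pandas.DataFrame(rows)`.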

So far, the code I have is a simple data dump into a CSV, which is not useful because all of the data ends up in a single cell.
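
The dump code is not shown, so the cause of the single-cell output is a guess: it often happens when the whole nested dictionary is written as one value. Once the data has been flattened into a list of dictionaries with the same keys, the standard library's `csv.DictWriter` writes one line per dictionary. The rows and the `write_rows` helper below are hypothetical.

```python
import csv
import io

# Made-up flat rows: one dict per technology, all with the same keys.
rows = [
    {"Domain": "example.com", "SubDomain": "", "Name": "Tech A", "Tag": "cdn"},
    {"Domain": "example.com", "SubDomain": "blog", "Name": "Tech C", "Tag": "cdn"},
]


def write_rows(rows, handle):
    """Write flat dicts as CSV: a header line, then one line per dict."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())

# To write a real file instead:
# with open("technologies.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(rows, f)
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, since the writer emits its own line endings.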

I figured out a way to do this; the code is below.

import json
import requests
import pandas

domaindict = {}
tech_info = {}
tag = 'mobile'

def get_info(d):
    '''For a domain, pulls the API response, stores it in domaindict, and gets a
    dataframe with technology info by calling the get_paths and set_df functions.'''
    url = "https://api.xyz.com/v11/api.json?KEY=abc&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    domaindict[d] = data
    paths = get_paths(d)
    final_info = set_df(paths)
    return final_info


def get_paths(d):
    '''Gets the list of paths for a domain.'''
    paths = domaindict[d]['Results'][0]['Result']['Paths']
    return paths

def set_df(paths):
    '''Builds a dataframe with one row per technology by looping through
    every path and every technology listed under it.'''
    columns = ['Domain', 'Url', 'Subdomain', 'Technology Name',
               'Technology Description', 'Technology Tag']
    rows = []
    for path in paths:
        for tech in path['Technologies']:
            rows.append({'Domain': path['Domain'],
                         'Url': path['Url'],
                         'Subdomain': path['SubDomain'],
                         'Technology Name': tech['Name'],
                         'Technology Description': tech['Description'],
                         'Technology Tag': tech['Tag']})
    # DataFrame.append was removed in pandas 2.0, so rows are collected in a
    # list and converted once, after all paths have been processed.
    return pandas.DataFrame(rows, columns=columns)


# Loops through the spreadsheet with the list of domain names, calls get_info for
# each one, and saves the technology info for every domain in one file.
read_domains = pandas.read_excel('domain.xlsx', header=None)
frames = []
for d in read_domains.values:
    print(d[0])
    frames.append(get_info(d[0]))

# pandas.concat replaces the removed DataFrame.append; it needs at least one frame.
df_f = pandas.concat(frames, ignore_index=True) if frames else pandas.DataFrame()

with pandas.ExcelWriter('mobile.xlsx') as w:
    df_f.to_excel(w, sheet_name='mobile')
