Creating an easy way to read a JSON object saved in a complex, nested Python dictionary
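I have a JSON object that I pulled from an API and loaded into a Python dictionary. I am now finding it hard to extract data from the dictionary because it is nested and contains sub-lists and sub-dictionaries. To get a better sense of the structure of the data, I tried the following: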
import requests

url = "https://api.xyz.com/v11/api.json?KEY=abc&LOOKUP=bbb"
response = requests.get(url)
data = response.json()
print(type(data))
print(data.keys())
print(type(data['Results']))
print(len(data['Results']))
print(type(data['Results'][0]))
print(data['Results'][0].keys())
print(type(data['Results'][0]['Result']))
print(data['Results'][0]['Result'].keys())
print(type(data['Results'][0]['Result']['Paths']))
print(len(data['Results'][0]['Result']['Paths']))
print(type(data['Results'][0]['Result']['Paths'][0]))
print(data['Results'][0]['Result']['Paths'][0].keys())
print(type(data['Results'][0]['Result']['Paths'][0]['Technologies']))
print(len(data['Results'][0]['Result']['Paths'][0]['Technologies']))
print(data['Results'][0]['Result']['Paths'][0]['Technologies'][8].keys())
print(data['Results'][0]['Result']['Paths'][0]['Technologies'][8]['Tag'])
From the above, I get the following output:
<class 'dict'>
dict_keys(['Results', 'Errors'])
<class 'list'>
1
<class 'dict'>
dict_keys(['Lookup', 'LastIndexed', 'FirstIndexed', 'Meta', 'Result'])
<class 'dict'>
dict_keys(['Paths', 'IsDB', 'Spend'])
<class 'list'>
9
<class 'dict'>
dict_keys(['LastIndexed', 'Technologies', 'Domain', 'SubDomain', 'FirstIndexed', 'Url'])
<class 'list'>
77
dict_keys(['FirstDetected', 'Name', 'LastDetected', 'Categories', 'Description', 'IsPremium', 'Tag', 'Link'])
cdn
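The manual probing above can also be done with a small recursive helper. What follows is only a sketch: `describe` and `sample` are hypothetical names, the sample dictionary is a made-up stand-in for the real API response, and the helper assumes every item in a list has the same shape as the first one.

```python
def describe(obj, name="data", depth=0, max_depth=10):
    """Return lines summarising the type, size and keys of a nested JSON-like object."""
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{indent}{name}: dict with keys {sorted(obj)}")
        if depth < max_depth:
            for key, value in obj.items():
                # Only containers are expanded; scalar values are covered by the key list
                if isinstance(value, (dict, list)):
                    lines.extend(describe(value, repr(key), depth + 1, max_depth))
    elif isinstance(obj, list):
        lines.append(f"{indent}{name}: list of {len(obj)} item(s)")
        # Assumption: list items share a shape, so only the first one is inspected
        if obj and depth < max_depth:
            lines.extend(describe(obj[0], f"{name}[0]", depth + 1, max_depth))
    else:
        lines.append(f"{indent}{name}: {type(obj).__name__}")
    return lines


# Made-up sample that mimics the nesting reported in the output above
sample = {
    "Results": [
        {
            "Lookup": "example.com",
            "Result": {
                "Paths": [
                    {
                        "Domain": "example.com",
                        "Technologies": [{"Name": "X", "Tag": "cdn"}],
                    }
                ]
            },
        }
    ],
    "Errors": [],
}

lines = describe(sample)
print("\n".join(lines))
```

Calling `describe(data)` on the real response should give a similar indented outline, as long as the list items really are homogeneous.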
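From other iterations, I know that the length of the 'Technologies' list can vary between 5 and 100, depending on which list item is selected after 'Paths'. I am particularly interested in getting a list of all technologies where 'Tag' == A. I would like to be able to create a table that contains all the upper-level information for every entry where 'Tag' == A. Ideally, I would like to get this information in a CSV file. I have looked at "Pandas dataframe from nested dictionary", "Creating dictionary with list comprehension in Python", and "Construct pandas DataFrame from items in nested dictionary", but I get confused when accessing the lists (particularly after 'Paths').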
The code I have so far simply dumps the data into a CSV, which is of no use because all the data ends up in a single cell.
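One way to avoid the single-cell dump is to emit one flat row per technology and copy the upper-level fields onto each row. The following is a minimal sketch using only the standard library; `iter_tech_rows`, `write_csv` and `sample` are hypothetical names, the sample data is made up, and the key names are taken from the output shown earlier.

```python
import csv
import io

FIELDNAMES = ["Lookup", "Domain", "SubDomain", "Url", "Name", "Tag", "Description"]


def iter_tech_rows(data, tag=None):
    """Yield one flat dict per technology, carrying the upper-level fields along."""
    for result in data.get("Results", []):
        for path in result["Result"]["Paths"]:
            for tech in path["Technologies"]:
                # Skip technologies that do not carry the requested tag
                if tag is not None and tech.get("Tag") != tag:
                    continue
                yield {
                    "Lookup": result.get("Lookup"),
                    "Domain": path.get("Domain"),
                    "SubDomain": path.get("SubDomain"),
                    "Url": path.get("Url"),
                    "Name": tech.get("Name"),
                    "Tag": tech.get("Tag"),
                    "Description": tech.get("Description"),
                }


def write_csv(rows, handle):
    """Write the flat rows to an open text handle, one column per field."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample with the same nesting as the API response
sample = {
    "Results": [
        {
            "Lookup": "example.com",
            "Result": {
                "Paths": [
                    {
                        "Domain": "example.com",
                        "SubDomain": "",
                        "Url": "",
                        "Technologies": [
                            {"Name": "X", "Tag": "cdn", "Description": "a CDN"},
                            {"Name": "Y", "Tag": "mobile", "Description": "a viewport tag"},
                        ],
                    }
                ]
            },
        }
    ]
}

buffer = io.StringIO()
write_csv(iter_tech_rows(sample, tag="cdn"), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the `StringIO` buffer could be swapped for `open("out.csv", "w", newline="")`, where the file name is again just a placeholder.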
So I figured out a way to do it. The code is below.
import requests
import pandas

domaindict = {}  # raw API response per domain
tag = 'mobile'   # tag of interest (the rows below are not filtered by it)


def get_info(d):
    '''For a domain, pulls the API response, stores it in domaindict, and returns a dataframe
    with technology info by calling the get_paths and set_df functions.'''
    url = "https://api.xyz.com/v11/api.json?KEY=abc&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    domaindict[d] = data
    paths = get_paths(d)
    return set_df(paths)


def get_paths(d):
    '''Gets the list of paths for a domain.'''
    return domaindict[d]['Results'][0]['Result']['Paths']


def set_df(paths):
    '''Builds a dataframe at the technology level by looping through every path
    and its technology list.'''
    columns = ['Domain', 'Url', 'Subdomain', 'Technology Name',
               'Technology Description', 'Technology Tag']
    rows = []
    for path in paths:
        for tech in path['Technologies']:
            rows.append({'Domain': path['Domain'],
                         'Url': path['Url'],
                         'Subdomain': path['SubDomain'],
                         'Technology Name': tech['Name'],
                         'Technology Description': tech['Description'],
                         'Technology Tag': tech['Tag']})
    # DataFrame.append was removed in pandas 2.0, so the rows are collected first
    return pandas.DataFrame(rows, columns=columns)


# Loops through the Excel file with the list of domain names, calls get_info for each one,
# and saves the technology info for every domain in one file.
read_domains = pandas.read_excel('domain.xlsx', header=None)
frames = []
for d in read_domains.values:
    print(d[0])
    frames.append(get_info(d[0]))
df_f = pandas.concat(frames, ignore_index=True) if frames else pandas.DataFrame()
with pandas.ExcelWriter('mobile.xlsx') as w:
    df_f.to_excel(w, sheet_name='mobile')
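As a possible alternative to the nested loops, `pandas.json_normalize` can expand the 'Technologies' list into rows and repeat the path-level fields on each one, after which filtering on the tag is a single boolean mask. This is a sketch under assumptions, not the original approach: `paths` below is made-up sample data standing in for `data['Results'][0]['Result']['Paths']`, and `tech_df`, `mobile_df` and `csv_text` are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for data['Results'][0]['Result']['Paths']
paths = [
    {
        "Domain": "example.com",
        "Url": "",
        "SubDomain": "",
        "Technologies": [
            {"Name": "A", "Description": "a CDN", "Tag": "cdn"},
            {"Name": "B", "Description": "a viewport tag", "Tag": "mobile"},
        ],
    },
    {
        "Domain": "example.com",
        "Url": "/shop",
        "SubDomain": "www",
        "Technologies": [
            {"Name": "C", "Description": "an app link", "Tag": "mobile"},
        ],
    },
]

# One row per technology; the path-level fields are repeated on each row
tech_df = pd.json_normalize(
    paths,
    record_path="Technologies",
    meta=["Domain", "Url", "SubDomain"],
)

# Keep only the technologies carrying the tag of interest
tag = "mobile"
mobile_df = tech_df[tech_df["Tag"] == tag].reset_index(drop=True)

# Without a path argument, to_csv returns the CSV text instead of writing a file
csv_text = mobile_df.to_csv(index=False)
print(csv_text)
```

Passing a file name such as `mobile_df.to_csv("mobile.csv", index=False)` would write the file instead. Note that `json_normalize` raises an error if a technology key has the same name as one of the `meta` fields, which does not happen with the keys listed in the question.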