How to convert nested JSON like this to a DataFrame? I tried pandas json_normalize but still don't get a proper DataFrame
I am trying to build a DataFrame from this JSON. It contains three keys: Header, Columns, and Rows. The problem is that Rows is deeply nested, and even pandas' json_normalize cannot create a meaningful DataFrame from it.
Here is the JSON:
{'Header': {'Time': '2021-10-08T05:08:48-07:00',
'ReportName': 'ProfitAndLoss',
'DateMacro': 'this calendar year-to-date',
'ReportBasis': 'Accrual',
'StartPeriod': '2021-01-01',
'EndPeriod': '2021-10-08',
'SummarizeColumnsBy': 'Total',
'Currency': 'USD',
'Option': [{'Name': 'AccountingStandard', 'Value': 'GAAP'},
{'Name': 'NoReportData', 'Value': 'false'}]},
'Columns': {'Column': [{'ColTitle': '',
'ColType': 'Account',
'MetaData': [{'Name': 'ColKey', 'Value': 'account'}]},
{'ColTitle': 'Total',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': 'total'}]}]},
'Rows': {'Row': [{'Header': {'ColData': [{'value': 'Income'}, {'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Design income', 'id': '82'},
{'value': '2250.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Discounts given', 'id': '86'},
{'value': '-89.50'}],
'type': 'Data'},
{'Header': {'ColData': [{'value': 'Landscaping Services', 'id': '45'},
{'value': '1477.50'}]},
'Rows': {'Row': [{'Header': {'ColData': [{'value': 'Job Materials',
'id': '46'},
{'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Fountains and Garden Lighting',
'id': '48'},
{'value': '2246.50'}],
'type': 'Data'},
{'ColData': [{'value': 'Plants and Soil', 'id': '49'},
{'value': '2351.97'}],
'type': 'Data'},
{'ColData': [{'value': 'Sprinklers and Drip Systems', 'id': '50'},
{'value': '138.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Job Materials'},
{'value': '4736.47'}]},
'type': 'Section'},
{'Header': {'ColData': [{'value': 'Labor', 'id': '51'},
{'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Installation', 'id': '52'},
{'value': '250.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Maintenance and Repair', 'id': '53'},
{'value': '50.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Labor'},
{'value': '300.00'}]},
'type': 'Section'}]},
'Summary': {'ColData': [{'value': 'Total Landscaping Services'},
{'value': '6513.97'}]},
'type': 'Section'},
{'ColData': [{'value': 'Pest Control Services', 'id': '54'},
{'value': '110.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Sales of Product Income', 'id': '79'},
{'value': '912.75'}],
'type': 'Data'},
{'ColData': [{'value': 'Services', 'id': '1'}, {'value': '503.55'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Income'}, {'value': '10200.77'}]},
'type': 'Section',
'group': 'Income'},
{'Header': {'ColData': [{'value': 'Cost of Goods Sold'}, {'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Cost of Goods Sold', 'id': '80'},
{'value': '405.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Cost of Goods Sold'},
{'value': '405.00'}]},
'type': 'Section',
'group': 'COGS'},
{'Summary': {'ColData': [{'value': 'Gross Profit'}, {'value': '9795.77'}]},
'type': 'Section',
'group': 'GrossProfit'},
{'Header': {'ColData': [{'value': 'Expenses'}, {'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Advertising', 'id': '7'},
{'value': '74.86'}],
'type': 'Data'},
{'Header': {'ColData': [{'value': 'Automobile', 'id': '55'},
{'value': '113.96'}]},
'Rows': {'Row': [{'ColData': [{'value': 'Fuel', 'id': '56'},
{'value': '349.41'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Automobile'},
{'value': '463.37'}]},
'type': 'Section'},
{'ColData': [{'value': 'Equipment Rental', 'id': '29'},
{'value': '112.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Insurance', 'id': '11'}, {'value': '241.23'}],
'type': 'Data'},
{'Header': {'ColData': [{'value': 'Job Expenses', 'id': '58'},
{'value': '155.07'}]},
'Rows': {'Row': [{'Header': {'ColData': [{'value': 'Job Materials',
'id': '63'},
{'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Decks and Patios',
'id': '64'},
{'value': '234.04'}],
'type': 'Data'},
{'ColData': [{'value': 'Plants and Soil', 'id': '66'},
{'value': '353.12'}],
'type': 'Data'},
{'ColData': [{'value': 'Sprinklers and Drip Systems', 'id': '67'},
{'value': '215.66'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Job Materials'},
{'value': '802.82'}]},
'type': 'Section'}]},
'Summary': {'ColData': [{'value': 'Total Job Expenses'},
{'value': '957.89'}]},
'type': 'Section'},
{'Header': {'ColData': [{'value': 'Legal & Professional Fees',
'id': '12'},
{'value': '75.00'}]},
'Rows': {'Row': [{'ColData': [{'value': 'Accounting', 'id': '69'},
{'value': '640.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Bookkeeper', 'id': '70'}, {'value': '55.00'}],
'type': 'Data'},
{'ColData': [{'value': 'Lawyer', 'id': '71'}, {'value': '400.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Legal & Professional Fees'},
{'value': '1170.00'}]},
'type': 'Section'},
{'Header': {'ColData': [{'value': 'Maintenance and Repair', 'id': '72'},
{'value': '185.00'}]},
'Rows': {'Row': [{'ColData': [{'value': 'Equipment Repairs',
'id': '75'},
{'value': '755.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Maintenance and Repair'},
{'value': '940.00'}]},
'type': 'Section'},
{'ColData': [{'value': 'Meals and Entertainment', 'id': '13'},
{'value': '28.49'}],
'type': 'Data'},
{'ColData': [{'value': 'Office Expenses', 'id': '15'},
{'value': '18.08'}],
'type': 'Data'},
{'ColData': [{'value': 'Rent or Lease', 'id': '17'},
{'value': '900.00'}],
'type': 'Data'},
{'Header': {'ColData': [{'value': 'Utilities', 'id': '24'},
{'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Gas and Electric', 'id': '76'},
{'value': '200.53'}],
'type': 'Data'},
{'ColData': [{'value': 'Telephone', 'id': '77'}, {'value': '130.86'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Utilities'},
{'value': '331.39'}]},
'type': 'Section'}]},
'Summary': {'ColData': [{'value': 'Total Expenses'},
{'value': '5237.31'}]},
'type': 'Section',
'group': 'Expenses'},
{'Summary': {'ColData': [{'value': 'Net Operating Income'},
{'value': '4558.46'}]},
'type': 'Section',
'group': 'NetOperatingIncome'},
{'Header': {'ColData': [{'value': 'Other Expenses'}, {'value': ''}]},
'Rows': {'Row': [{'ColData': [{'value': 'Miscellaneous', 'id': '14'},
{'value': '2916.00'}],
'type': 'Data'}]},
'Summary': {'ColData': [{'value': 'Total Other Expenses'},
{'value': '2916.00'}]},
'type': 'Section',
'group': 'OtherExpenses'},
{'Summary': {'ColData': [{'value': 'Net Other Income'},
{'value': '-2916.00'}]},
'type': 'Section',
'group': 'NetOtherIncome'},
{'Summary': {'ColData': [{'value': 'Net Income'}, {'value': '1642.46'}]},
'type': 'Section',
'group': 'NetIncome'}]}}
I fetched this data from the QuickBooks 'Profit and Loss' report API. 'Rows' contains a key 'Row' that in turn holds all the data for the DataFrame's rows. Each branch contains a Header whose value represents the title of a new column. Any help would be really appreciated.
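To make the problem concrete, here is a small sketch of what json_normalize does with this shape. The sample_rows literal is a trimmed, hand-written stand-in for data['Rows']['Row'], not the full report. Nested dicts are expanded into dotted column names, but lists are left untouched, so the child rows stay buried inside a single cell.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for data['Rows']['Row'] (one section only).
sample_rows = [
    {
        "Header": {"ColData": [{"value": "Income"}, {"value": ""}]},
        "Rows": {"Row": [
            {"ColData": [{"value": "Design income", "id": "82"},
                         {"value": "2250.00"}],
             "type": "Data"},
        ]},
        "Summary": {"ColData": [{"value": "Total Income"},
                                {"value": "10200.77"}]},
        "type": "Section",
        "group": "Income",
    },
]

flat = pd.json_normalize(sample_rows)

# Dicts were flattened into dotted names; the lists under them were not.
print(list(flat.columns))
print(type(flat.loc[0, "Rows.Row"]))
```

Every further level of Section rows sits inside that 'Rows.Row' list, which is why a single json_normalize call does not produce one row per account.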
I am answering my own question. This JSON is highly nested and can't be flattened into a meaningful DataFrame using flatten_json or json_normalize, so I have written a script specifically for the QuickBooks report APIs. It takes the nested JSON as an argument and creates a DataFrame from it. Any highly nested QuickBooks report API response will work with this.
import pandas as pd

def master(data):
    """
    Creates a DataFrame from the JSON received from the API
    Args:
        data (dict): JSON response from the API
    Return:
        DataFrame built from the report rows
    Example:
        dataframe = master(data_dict_or_json)
    """
    level = 0
    headingdict = {}
    df = {}
    maxlvl = []
    crlist = []      # stack of section headers leading to the current row
    valuelist = []   # ColData of every 'Data' row, in document order
    colHeaders = []
    outdict = []
    headingdict['Headers'] = []
    current = data['Header']['ReportName']

    def printer(l, r=data['Header']['ReportName']):
        # Record the '$'-joined header path for one data row
        for i in l:
            r += '$' + i
        headingdict['Headers'].append(r)

    def supreme(Json, valuelist, current, crlist, maxlvl, level):
        # Walk the rows recursively, descending into every section
        for i in range(len(Json)):
            if 'Header' in Json[i]:
                current = Json[i]['Header']['ColData'][0]['value']
                new = Json[i]['Rows']['Row']
                crlist.append(current)
                supreme(new, valuelist, current, crlist, maxlvl, level+1)
                current = Json[i]['Header']['ColData'][0]['value']
            if 'ColData' in Json[i]:
                printer(crlist)
                valuelist.append(Json[i]['ColData'])
                maxlvl.append(level)
            if i == len(Json)-1:
                # Leaving this level: drop its header from the stack
                try:
                    crlist.pop()
                except IndexError:
                    pass
        return valuelist, level-1

    try:
        raw_data = data['Rows']['Row']
    except (KeyError, TypeError):
        print('No data Found in {} API'.format(current))
        return
    supreme(raw_data, valuelist, current, crlist, maxlvl, level)
    for i in data["Columns"]["Column"]:
        colHeaders.append(i["ColTitle"])
    for i in range(len(valuelist)):
        for j in range(len(valuelist[i])):
            if colHeaders[j] not in df:
                df[colHeaders[j]] = []
            df[colHeaders[j]].append(valuelist[i][j]['value'])
    responseDf = pd.DataFrame(df)
    maxlvl = max(maxlvl)

    def seperator(indict, lvl, outdict):
        # Split each header path and pad it to the same number of levels
        for i in indict:
            i = i.split('$')
            if len(i) <= lvl:
                for j in range((lvl)-len(i)):
                    i.append(' ')
            outdict.append(i)

    seperator(headingdict['Headers'], maxlvl+1, outdict)

    def heading_lvls(maxlvl):
        columns = ['Form']
        for i in range(maxlvl):
            head = 'Header'+'-'+'{}'.format(i+1)
            columns.append(head)
        return columns

    newdf = pd.DataFrame(outdict, columns=heading_lvls(maxlvl))
    result = pd.concat([newdf, responseDf], axis=1)
    return result
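For comparison, here is a shorter sketch of the same recursive idea written as a generator. It is not the script above rewritten line for line: walk_rows and _record are hypothetical names, the sample literal is hand-made, and the sketch assumes every ColData list has exactly two entries (account, total) as in this report. Unlike the script above it also keeps the Summary rows, tagged in a 'kind' column; like the script above it ignores any amount carried on a section header itself.

```python
import pandas as pd

def _record(col_data, path, kind):
    # Assumes two columns per row: account name (with optional id) and total.
    return {
        "path": " / ".join(path),
        "account": col_data[0]["value"],
        "id": col_data[0].get("id"),
        "total": col_data[1]["value"],
        "kind": kind,
    }

def walk_rows(rows, path=()):
    """Yield one flat dict per Data or Summary row, with its section path."""
    for row in rows:
        if "ColData" in row:                      # leaf 'Data' row
            yield _record(row["ColData"], path, "Data")
            continue
        header = row.get("Header", {}).get("ColData")
        child_path = path + (header[0]["value"],) if header else path
        yield from walk_rows(row.get("Rows", {}).get("Row", []), child_path)
        if "Summary" in row:                      # section total
            yield _record(row["Summary"]["ColData"], child_path, "Summary")

# Hand-made sample in the same shape as data['Rows']['Row'].
sample = [
    {"Header": {"ColData": [{"value": "Income"}, {"value": ""}]},
     "Rows": {"Row": [
         {"ColData": [{"value": "Design income", "id": "82"},
                      {"value": "2250.00"}], "type": "Data"},
         {"Header": {"ColData": [{"value": "Labor", "id": "51"}, {"value": ""}]},
          "Rows": {"Row": [
              {"ColData": [{"value": "Installation", "id": "52"},
                           {"value": "250.00"}], "type": "Data"}]},
          "Summary": {"ColData": [{"value": "Total Labor"}, {"value": "250.00"}]},
          "type": "Section"}]},
     "Summary": {"ColData": [{"value": "Total Income"}, {"value": "2500.00"}]},
     "type": "Section"},
    {"Summary": {"ColData": [{"value": "Gross Profit"}, {"value": "2500.00"}]},
     "type": "Section"},
]

records = list(walk_rows(sample))
df = pd.DataFrame(records)
# Amounts arrive as strings; blanks become NaN.
df["total"] = pd.to_numeric(df["total"], errors="coerce")
print(df)
```

With the real response, the call would be walk_rows(data['Rows']['Row']). Keeping the path as a single string avoids having to know the maximum nesting depth up front; it can be split into separate columns later if needed.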
Try flatten_json. It works well with nested JSON. However, your JSON is heavily nested and not really suited to a DataFrame. Assign your JSON to data and run the code below. The .T transposes the DataFrame. Maybe you can make sense of the data this way. Otherwise you're going to have to process the JSON object first, then create the DataFrame.
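import pandas as pd
from flatten_json import flatten

# Flatten each top-level row into a single dict with dotted keys
dic_flattened = (flatten(d, '.') for d in data['Rows']['Row'])
df = pd.DataFrame(dic_flattened)
df.fillna('')  # or df.fillna('').T for the transposed view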
   Header.ColData.0.value  Header.ColData.1.value  Rows.Row.0.ColData.0.value  Rows.Row.0.ColData.0.id  ...  Rows.Row.10.Rows.Row.1.type  Rows.Row.10.Summary.ColData.0.value  Rows.Row.10.Summary.ColData.1.value  Rows.Row.10.type
0  Income                                          Design income               82                       ...  NaN                          NaN                                  NaN                                  NaN
1  Cost of Goods Sold                              Cost of Goods Sold          80                       ...  NaN                          NaN                                  NaN                                  NaN
2  NaN                     NaN                     NaN                         NaN                      ...  NaN                          NaN                                  NaN                                  NaN
3  Expenses                                        Advertising                 7                        ...  Data                         Total Utilities                      331.39                               Section
4  NaN                     NaN                     NaN                         NaN                      ...  NaN                          NaN                                  NaN                                  NaN
5  Other Expenses                                  Miscellaneous               14                       ...  NaN                          NaN                                  NaN                                  NaN
6  NaN                     NaN                     NaN                         NaN                      ...  NaN                          NaN                                  NaN                                  NaN
7  NaN                     NaN                     NaN                         NaN                      ...  NaN                          NaN                                  NaN                                  NaN
[8 rows x 152 columns]
and transposed:
                                        0              1                   2    3                4    5               6    7
Header.ColData.0.value                  Income         Cost of Goods Sold       Expenses              Other Expenses
Header.ColData.1.value
Rows.Row.0.ColData.0.value              Design income  Cost of Goods Sold       Advertising           Miscellaneous
Rows.Row.0.ColData.0.id                 82             80                       7                     14
Rows.Row.0.ColData.1.value              2250.0         405.0                    74.86                 2916.0
...                                     ...            ...                 ...  ...              ...  ...             ...  ...
Rows.Row.10.Rows.Row.1.ColData.1.value                                          130.86
Rows.Row.10.Rows.Row.1.type                                                     Data
Rows.Row.10.Summary.ColData.0.value                                             Total Utilities
Rows.Row.10.Summary.ColData.1.value                                             331.39
Rows.Row.10.type                                                                Section
[152 rows x 8 columns]
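The dotted names in the output (for example Rows.Row.0.ColData.0.value) are simply the path of keys and list indices leading to each leaf value. The sketch below is a minimal pure-Python stand-in that shows the idea; flatten_sketch is a hypothetical helper, not flatten_json's actual implementation, and the row literal is a hand-trimmed piece of the report.

```python
def flatten_sketch(obj, sep=".", prefix=""):
    """Collapse nested dicts and lists into one flat dict with path-like keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)      # list positions become part of the key
    else:
        return {prefix: obj}        # leaf value: the path so far is its key
    flat = {}
    for key, value in items:
        child = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, sep, child))
    return flat

# Hand-trimmed piece of one top-level row from the report.
row = {
    "Header": {"ColData": [{"value": "Income"}, {"value": ""}]},
    "Rows": {"Row": [
        {"ColData": [{"value": "Design income", "id": "82"},
                     {"value": "2250.00"}],
         "type": "Data"},
    ]},
}

for key, value in flatten_sketch(row).items():
    print(key, "=", repr(value))
```

Because each top-level row becomes one DataFrame row and every distinct path becomes its own column, a deeply nested section such as Expenses produces many columns that are empty for all the other rows, which is where the 8 x 152 shape comes from.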