Export Nested JSON to CSV using Python
I have the following JSON, which I got from Xero. It is nested, and I'm trying to create a flat table from it and then export that table to CSV.
I have written this Python code, but I'm struggling to flatten the nested JSON. Initially I get the data from Xero and use json.dumps to serialise the datetime values. The JSON displayed here was exported from Postman. When I fetch the JSON using Python, the date format is 'UpdatedDateUTC': datetime.datetime(2018, 10, 24, 12, 53, 55, 930000), so I use json.dumps to serialise it.
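As an illustration of that serialisation step, here is a minimal sketch. The invoice dict is a made-up stand-in for what the client library returns, not real output; only the datetime value is taken from the example above.

```python
import json
from datetime import datetime

# Hypothetical record, shaped like one invoice from the client library.
invoice = {
    "InvoiceNumber": "10522",
    "UpdatedDateUTC": datetime(2018, 10, 24, 12, 53, 55, 930000),
}

# json.dumps cannot encode datetime objects on its own. The default=
# callable is invoked for any value it cannot encode, so default=str
# stores str(datetime) in the output instead of raising TypeError.
serialised = json.dumps([invoice], default=str)

# After a round trip the date is a plain string, no longer a datetime.
round_tripped = json.loads(serialised)
print(round_tripped[0]["UpdatedDateUTC"])  # 2018-10-24 12:53:55.930000
```

Note that the dates come back as strings, so any date arithmetic later on needs them parsed again.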
When I produce the first export:
df = pd.read_json(b_str)
df.to_csv(path+'invoices.csv')
The CSV file looks like this:
The next step is to flatten the Contact and CreditNotes columns and make them part of the main table. So instead of the single Contact column, the table will have 8 new columns: ContactID, ContactNumber, Name, Addresses, Phones, ContactGroups, ContactPersons, HasValidationErrors. The same applies to the CreditNotes column.
I'm trying to replicate the methodology on this link, but with no luck. I get an export which looks like this. The contacts_with_id dataframe is spread over multiple rows instead of multiple columns. I can't figure out what I am doing wrong.
I have also tried the flatten_json function, but with no luck either.
I don't really need to make this particular methodology work. I just want to find a way to export the nested JSON to a readable CSV file.
Python code:
from xero import Xero
from xero.auth import PrivateCredentials
with open("E:\\privatekey.pem") as keyfile:
    rsa_key = keyfile.read()
credentials = PrivateCredentials('BHK1ZBEKIL4WM0BLKLIOT65PSIA43N', rsa_key)
xero = Xero(credentials)
import json
import pandas as pd
from pandas.io.json import json_normalize #package for flattening json in pandas df
# The following is a list
a_list = xero.invoices.all()
# The following is a string. Serialised Datetime
b_str = json.dumps(a_list, default=str)
path='E:\\MyDrive\\Python Workspaces\\'
df = pd.read_json(b_str)
df.to_csv(path+'invoices.csv')
# ********************* FLATTEN JSON *****************
dd = json.loads(b_str)
contacts_with_id = pd.io.json.json_normalize(dd, record_path='Contact', meta='InvoiceID',
                                             record_prefix='Contact.')
df_final = pd.merge(contacts_with_id, df, how='inner', on='InvoiceID')
df_final.to_csv(path+'invoices_final.csv')
JSON below:
{
"Id": "568d1686-7c53-4f22-a93f-754589a246a7",
"Status": "OK",
"ProviderName": "Rest API",
"DateTimeUTC": "/Date(1552234854959)/",
"Invoices": [
{
"Type": "ACCPAY",
"InvoiceID": "8289ab9d-2134-4601-8622-e7fdae4b6d89",
"InvoiceNumber": "10522",
"Reference": "10522",
"Payments": [],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 102,
"AmountPaid": 0,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": false,
"Contact": {
"ContactID": "d1dba397-0f0b-4819-a6ce-2839b7be5008",
"ContactNumber": "c03bbcb5-fb0b-4f46-83f0-8687f754488b",
"Name": "Micro",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2017-02-06T00:00:00",
"Date": "/Date(1486339200000+0000)/",
"DueDateString": "2017-03-08T00:00:00",
"DueDate": "/Date(1488931200000+0000)/",
"Status": "AUTHORISED",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 85,
"TotalTax": 17,
"Total": 102,
"UpdatedDateUTC": "/Date(1529940362110+0000)/",
"CurrencyCode": "GBP"
},
{
"Type": "ACCREC",
"InvoiceID": "9e37150f-88a5-4213-a085-b30c5e01c2bf",
"InvoiceNumber": "(13)",
"Reference": "",
"Payments": [],
"CreditNotes": [
{
"CreditNoteID": "3c5c7dec-534a-46e0-ad1b-f0f69822cfd5",
"CreditNoteNumber": "(12)",
"ID": "3c5c7dec-534a-46e0-ad1b-f0f69822cfd5",
"AppliedAmount": 1200,
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"LineItems": [],
"Total": 7800
},
{
"CreditNoteID": "af38e37f-4ba3-4208-a193-a32b418c2bbc",
"CreditNoteNumber": "(14)",
"ID": "af38e37f-4ba3-4208-a193-a32b418c2bbc",
"AppliedAmount": 2600,
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"LineItems": [],
"Total": 2600
}
],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 0,
"AmountPaid": 0,
"AmountCredited": 3800,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": false,
"Contact": {
"ContactID": "58164bd6-5225-4f30-ad89-35140db5b624",
"ContactNumber": "d0b420b8-4a58-40d1-9717-8525edda7658",
"Name": "FSales (1)",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"DueDateString": "2011-06-03T00:00:00",
"DueDate": "/Date(1307059200000+0000)/",
"Status": "PAID",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 3166.67,
"TotalTax": 633.33,
"Total": 3800,
"UpdatedDateUTC": "/Date(1529943661150+0000)/",
"CurrencyCode": "GBP",
"FullyPaidOnDate": "/Date(1304467200000+0000)/"
},
{
"Type": "ACCPAY",
"InvoiceID": "1ddea7ec-a0d5-457a-a8fd-cfcdc2099d51",
"InvoiceNumber": "01596057543",
"Reference": "",
"Payments": [
{
"PaymentID": "fd639da3-c009-47df-a4bf-98ccd5c68e43",
"Date": "/Date(1551657600000+0000)/",
"Amount": 173.86,
"Reference": "",
"CurrencyRate": 1,
"HasAccount": false,
"HasValidationErrors": false
}
],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 0,
"AmountPaid": 173.86,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": true,
"Contact": {
"ContactID": "309afb74-0a3b-4d68-85e8-2259ca5acd13",
"ContactNumber": "91eef1f0-5fe6-45d7-b739-1ab5352a5523",
"Name": "Company AAA",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2019-02-23T00:00:00",
"Date": "/Date(1550880000000+0000)/",
"DueDateString": "2019-03-21T00:00:00",
"DueDate": "/Date(1553126400000+0000)/",
"Status": "PAID",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 144.88,
"TotalTax": 28.98,
"Total": 173.86,
"UpdatedDateUTC": "/Date(1551777481907+0000)/",
"CurrencyCode": "GBP",
"FullyPaidOnDate": "/Date(1551657600000+0000)/"
},
{
"Type": "ACCPAY",
"InvoiceID": "ba5ff3b1-1058-4645-80da-5475c23da949",
"InvoiceNumber": "Q0603",
"Reference": "",
"Payments": [],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 213.24,
"AmountPaid": 0,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": true,
"Contact": {
"ContactID": "f0473b41-da92-4397-9d2c-741812f2475c",
"ContactNumber": "1f124969-de8d-40b8-8140-d4997511b0dc",
"Name": "BTelcom",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2019-03-05T00:00:00",
"Date": "/Date(1551744000000+0000)/",
"DueDateString": "2019-03-21T00:00:00",
"DueDate": "/Date(1553126400000+0000)/",
"Status": "SUBMITTED",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 177.7,
"TotalTax": 35.54,
"Total": 213.24,
"UpdatedDateUTC": "/Date(1552068778417+0000)/",
"CurrencyCode": "GBP"
}
]
}
I've had to do something like this before.
Basically, I flatten the entire nested JSON, then iterate through the resulting keys (each key follows a pattern that encodes which row it belongs to) and use them to build the rows of a table.
There are 4 invoices, so this creates 4 rows, one per invoice. Hopefully this is what you are looking for.
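To make the key pattern concrete, here is a minimal sketch of how a flattened key can be split into a row index and a column name. The helper name split_key is hypothetical and only used for illustration. The example key is what the flattening step yields for the first credit note of the second invoice in the sample data.

```python
import re


def split_key(key):
    """Split a flattened key into (row index, column name).

    Returns None when the key has no _<number>_ part, which is the
    case for top-level fields such as ProviderName.
    """
    match = re.search(r"_(\d+)_(.*)", key)
    if match is None:
        return None
    # The first number in the key is the position in the Invoices list.
    row_idx = int(match.group(1))
    # Everything after it becomes the column name, underscores removed.
    column = match.group(2).replace("_", "")
    return row_idx, column


print(split_key("Invoices_1_CreditNotes_0_CreditNoteID"))
print(split_key("Invoices_2_Contact_Name"))
print(split_key("ProviderName"))
```

Only the first number is treated as the row, so indexes of inner lists (the 0 in CreditNotes_0) end up inside the column name.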
NOTE: where you might run into some issues:
One thing to consider when flattening a JSON file that contains nested lists of different lengths: as soon as a single row has ONE value for a given column, that column has to be created, even if it is null for all the other rows. In the Payments key, each list entry has 7 elements. So if some IDs have 8 payments (as opposed to all the others having only 1 payment), it will have to create 56 additional columns (8 × 7) to store all of them in separate columns of the flat file.
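The arithmetic behind that note can be sketched as follows. The function name payment_columns is hypothetical, and the figure of 7 fields per payment is taken from the sample data above.

```python
def payment_columns(payments_per_invoice, fields_per_payment=7):
    """Number of Payments columns a fully flattened table needs.

    The invoice with the most payments decides the width, because
    every payment slot gets its own set of columns.
    """
    return max(payments_per_invoice, default=0) * fields_per_payment


# The sample data: only the third invoice has a payment.
print(payment_columns([0, 0, 1, 0]))  # 7

# One invoice with 8 payments widens the table for every row.
print(payment_columns([1, 1, 8, 1]))  # 56
```

If lists can grow this long, a separate table with one row per payment is usually easier to work with than one very wide table.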
jsonStr = '''{
"Id": "568d1686-7c53-4f22-a93f-754589a246a7",
"Status": "OK",
"ProviderName": "Rest API",
"DateTimeUTC": "/Date(1552234854959)/",
"Invoices": [
{
"Type": "ACCPAY",
"InvoiceID": "8289ab9d-2134-4601-8622-e7fdae4b6d89",
"InvoiceNumber": "10522",
"Reference": "10522",
"Payments": [],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 102,
"AmountPaid": 0,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": false,
"Contact": {
"ContactID": "d1dba397-0f0b-4819-a6ce-2839b7be5008",
"ContactNumber": "c03bbcb5-fb0b-4f46-83f0-8687f754488b",
"Name": "Micro",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2017-02-06T00:00:00",
"Date": "/Date(1486339200000+0000)/",
"DueDateString": "2017-03-08T00:00:00",
"DueDate": "/Date(1488931200000+0000)/",
"Status": "AUTHORISED",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 85,
"TotalTax": 17,
"Total": 102,
"UpdatedDateUTC": "/Date(1529940362110+0000)/",
"CurrencyCode": "GBP"
},
{
"Type": "ACCREC",
"InvoiceID": "9e37150f-88a5-4213-a085-b30c5e01c2bf",
"InvoiceNumber": "(13)",
"Reference": "",
"Payments": [],
"CreditNotes": [
{
"CreditNoteID": "3c5c7dec-534a-46e0-ad1b-f0f69822cfd5",
"CreditNoteNumber": "(12)",
"ID": "3c5c7dec-534a-46e0-ad1b-f0f69822cfd5",
"AppliedAmount": 1200,
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"LineItems": [],
"Total": 7800
},
{
"CreditNoteID": "af38e37f-4ba3-4208-a193-a32b418c2bbc",
"CreditNoteNumber": "(14)",
"ID": "af38e37f-4ba3-4208-a193-a32b418c2bbc",
"AppliedAmount": 2600,
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"LineItems": [],
"Total": 2600
}
],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 0,
"AmountPaid": 0,
"AmountCredited": 3800,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": false,
"Contact": {
"ContactID": "58164bd6-5225-4f30-ad89-35140db5b624",
"ContactNumber": "d0b420b8-4a58-40d1-9717-8525edda7658",
"Name": "FSales (1)",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2011-05-04T00:00:00",
"Date": "/Date(1304467200000+0000)/",
"DueDateString": "2011-06-03T00:00:00",
"DueDate": "/Date(1307059200000+0000)/",
"Status": "PAID",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 3166.67,
"TotalTax": 633.33,
"Total": 3800,
"UpdatedDateUTC": "/Date(1529943661150+0000)/",
"CurrencyCode": "GBP",
"FullyPaidOnDate": "/Date(1304467200000+0000)/"
},
{
"Type": "ACCPAY",
"InvoiceID": "1ddea7ec-a0d5-457a-a8fd-cfcdc2099d51",
"InvoiceNumber": "01596057543",
"Reference": "",
"Payments": [
{
"PaymentID": "fd639da3-c009-47df-a4bf-98ccd5c68e43",
"Date": "/Date(1551657600000+0000)/",
"Amount": 173.86,
"Reference": "",
"CurrencyRate": 1,
"HasAccount": false,
"HasValidationErrors": false
}
],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 0,
"AmountPaid": 173.86,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": true,
"Contact": {
"ContactID": "309afb74-0a3b-4d68-85e8-2259ca5acd13",
"ContactNumber": "91eef1f0-5fe6-45d7-b739-1ab5352a5523",
"Name": "Company AAA",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2019-02-23T00:00:00",
"Date": "/Date(1550880000000+0000)/",
"DueDateString": "2019-03-21T00:00:00",
"DueDate": "/Date(1553126400000+0000)/",
"Status": "PAID",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 144.88,
"TotalTax": 28.98,
"Total": 173.86,
"UpdatedDateUTC": "/Date(1551777481907+0000)/",
"CurrencyCode": "GBP",
"FullyPaidOnDate": "/Date(1551657600000+0000)/"
},
{
"Type": "ACCPAY",
"InvoiceID": "ba5ff3b1-1058-4645-80da-5475c23da949",
"InvoiceNumber": "Q0603",
"Reference": "",
"Payments": [],
"CreditNotes": [],
"Prepayments": [],
"Overpayments": [],
"AmountDue": 213.24,
"AmountPaid": 0,
"AmountCredited": 0,
"CurrencyRate": 1,
"HasErrors": false,
"IsDiscounted": false,
"HasAttachments": true,
"Contact": {
"ContactID": "f0473b41-da92-4397-9d2c-741812f2475c",
"ContactNumber": "1f124969-de8d-40b8-8140-d4997511b0dc",
"Name": "BTelcom",
"Addresses": [],
"Phones": [],
"ContactGroups": [],
"ContactPersons": [],
"HasValidationErrors": false
},
"DateString": "2019-03-05T00:00:00",
"Date": "/Date(1551744000000+0000)/",
"DueDateString": "2019-03-21T00:00:00",
"DueDate": "/Date(1553126400000+0000)/",
"Status": "SUBMITTED",
"LineAmountTypes": "Exclusive",
"LineItems": [],
"SubTotal": 177.7,
"TotalTax": 35.54,
"Total": 213.24,
"UpdatedDateUTC": "/Date(1552068778417+0000)/",
"CurrencyCode": "GBP"
}
]
}'''
import json
import re

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        # The first _<number>_ in the key is the invoice (row) index.
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:
        # No index in the key: a top-level field, handled after the loop.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = column.replace('_', '')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

# Top-level fields are copied to every row. A name that already exists as
# an invoice column (here: Status) is overwritten by the top-level value.
for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
Type InvoiceID InvoiceNumber Reference AmountDue AmountPaid AmountCredited CurrencyRate HasErrors IsDiscounted HasAttachments ContactContactID ContactContactNumber ContactName ContactHasValidationErrors DateString Date DueDateString DueDate Status LineAmountTypes SubTotal TotalTax Total UpdatedDateUTC CurrencyCode CreditNotes0CreditNoteID CreditNotes0CreditNoteNumber CreditNotes0ID CreditNotes0AppliedAmount CreditNotes0DateString CreditNotes0Date CreditNotes0Total CreditNotes1CreditNoteID CreditNotes1CreditNoteNumber CreditNotes1ID CreditNotes1AppliedAmount CreditNotes1DateString CreditNotes1Date CreditNotes1Total FullyPaidOnDate Payments0PaymentID Payments0Date Payments0Amount Payments0Reference Payments0CurrencyRate Payments0HasAccount Payments0HasValidationErrors Id ProviderName DateTimeUTC
0 ACCPAY 8289ab9d-2134-4601-8622-e7fdae4b6d89 10522 10522 102.00 0.00 0.0 1.0 False False False d1dba397-0f0b-4819-a6ce-2839b7be5008 c03bbcb5-fb0b-4f46-83f0-8687f754488b Micro False 2017-02-06T00:00:00 /Date(1486339200000+0000)/ 2017-03-08T00:00:00 /Date(1488931200000+0000)/ OK Exclusive 85.00 17.00 102.00 /Date(1529940362110+0000)/ GBP NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 568d1686-7c53-4f22-a93f-754589a246a7 Rest API /Date(1552234854959)/
1 ACCREC 9e37150f-88a5-4213-a085-b30c5e01c2bf (13) 0.00 0.00 3800.0 1.0 False False False 58164bd6-5225-4f30-ad89-35140db5b624 d0b420b8-4a58-40d1-9717-8525edda7658 FSales (1) False 2011-05-04T00:00:00 /Date(1304467200000+0000)/ 2011-06-03T00:00:00 /Date(1307059200000+0000)/ OK Exclusive 3166.67 633.33 3800.00 /Date(1529943661150+0000)/ GBP 3c5c7dec-534a-46e0-ad1b-f0f69822cfd5 (12) 3c5c7dec-534a-46e0-ad1b-f0f69822cfd5 1200.0 2011-05-04T00:00:00 /Date(1304467200000+0000)/ 7800.0 af38e37f-4ba3-4208-a193-a32b418c2bbc (14) af38e37f-4ba3-4208-a193-a32b418c2bbc 2600.0 2011-05-04T00:00:00 /Date(1304467200000+0000)/ 2600.0 /Date(1304467200000+0000)/ NaN NaN NaN NaN NaN NaN NaN 568d1686-7c53-4f22-a93f-754589a246a7 Rest API /Date(1552234854959)/
2 ACCPAY 1ddea7ec-a0d5-457a-a8fd-cfcdc2099d51 01596057543 0.00 173.86 0.0 1.0 False False True 309afb74-0a3b-4d68-85e8-2259ca5acd13 91eef1f0-5fe6-45d7-b739-1ab5352a5523 Company AAA False 2019-02-23T00:00:00 /Date(1550880000000+0000)/ 2019-03-21T00:00:00 /Date(1553126400000+0000)/ OK Exclusive 144.88 28.98 173.86 /Date(1551777481907+0000)/ GBP NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN /Date(1551657600000+0000)/ fd639da3-c009-47df-a4bf-98ccd5c68e43 /Date(1551657600000+0000)/ 173.86 1.0 False False 568d1686-7c53-4f22-a93f-754589a246a7 Rest API /Date(1552234854959)/
3 ACCPAY ba5ff3b1-1058-4645-80da-5475c23da949 Q0603 213.24 0.00 0.0 1.0 False False True f0473b41-da92-4397-9d2c-741812f2475c 1f124969-de8d-40b8-8140-d4997511b0dc BTelcom False 2019-03-05T00:00:00 /Date(1551744000000+0000)/ 2019-03-21T00:00:00 /Date(1553126400000+0000)/ OK Exclusive 177.70 35.54 213.24 /Date(1552068778417+0000)/ GBP NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 568d1686-7c53-4f22-a93f-754589a246a7 Rest API /Date(1552234854959)/
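In the output above, the Status column reads OK on every row, because the top-level Status key is written over the per-invoice one. As an alternative, here is a minimal sketch using pd.json_normalize (available under that name since pandas 1.0). It is not the method used above. The data dict is a trimmed stand-in for the full response, and splitting the result into two tables is an assumption about what is convenient, not something the question asks for.

```python
import pandas as pd

# Trimmed stand-in for the response above: two invoices, fewer fields.
data = {
    "Status": "OK",
    "Invoices": [
        {
            "InvoiceID": "8289ab9d-2134-4601-8622-e7fdae4b6d89",
            "Status": "AUTHORISED",
            "CreditNotes": [],
            "Contact": {"Name": "Micro", "HasValidationErrors": False},
        },
        {
            "InvoiceID": "9e37150f-88a5-4213-a085-b30c5e01c2bf",
            "Status": "PAID",
            "CreditNotes": [
                {"CreditNoteNumber": "(12)", "AppliedAmount": 1200},
                {"CreditNoteNumber": "(14)", "AppliedAmount": 2600},
            ],
            "Contact": {"Name": "FSales (1)", "HasValidationErrors": False},
        },
    ],
}

# One row per invoice. Nested dicts such as Contact become Contact_*
# columns; list-valued columns are left as they are, so drop CreditNotes.
invoices = pd.json_normalize(data["Invoices"], sep="_")
invoices = invoices.drop(columns=["CreditNotes"])

# One row per credit note, each tagged with the invoice it belongs to.
credit_notes = pd.json_normalize(
    data["Invoices"], record_path="CreditNotes", meta=["InvoiceID"]
)

print(invoices.to_string())
print(credit_notes.to_string())
```

Each frame can then be written out with to_csv, and the two can be joined on InvoiceID when a single table is needed.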