Python write XML data from API to SQL Server (using parse to .csv as intermediate hop)
Python newbie here.
I'm trying to pull data from an API and insert it into an existing SQL Server table for use in a BI tool.
The raw result from the API is XML in a very newbie-unfriendly format.
I've managed to parse this (probably not in the most Pythonic way, and I'm open to suggestions) into .csv format (3 separate files, due to the nested nature of the XML). Now that I have these .csv files, I'm trying to write them to my SQL Server tables, one table per .csv, but I've hit a snag. I'm using the code from this answer, and everything seems fine except for a leading comma that gets created in the column-name portion of the query. Can anyone help me get rid of that leading comma?
Here is the code I have written so far:
import json
import requests
import pandas as pd
import csv
from pandas.io.json import json_normalize
from datetime import date, timedelta

url = "https://**myAPI_URL.com/Transaction"
paramHeader = '{"Version": "1.0"'
paramHeader += ', "FromDate": "2020-05-01 00:00"'
paramHeader += ', "ToDate": "2020-05-31 00:00"'
paramHeader += ', "MerchantOrgID": null'
paramHeader += ', "CardholderOrgID": null'
paramHeader += ', "CardNumber": null'
paramHeader += ', "DriverID": null'
paramHeader += ', "VehicleID": null'
paramHeader += ', "BillingGroupID": null'
paramHeader += ', "BillingCycleID": null'
paramHeader += ', "EntryMethodID": null'
paramHeader += ', "CardTypeID": null'
paramHeader += ', "TranTypeID": null'
paramHeader += ', "TaxExemptOnly": null}'

headers = {'APIKey': '**myAPI_KEY**',
           'content-type': 'application/json',
           'Accept': 'application/json',
           'parameters': paramHeader}

response = requests.get(url, data='', headers=headers)
if response.status_code == 200:
    r = json.loads(response.content.decode('utf-8'))
    cleanData = pd.json_normalize(r)
    transactionDetails = pd.json_normalize(data=r, record_path='Details', meta=['ID'])
    taxes = pd.json_normalize(data=r, record_path=['Details', 'Taxes'],
                              meta=['ID'])
    cleanData.to_csv('**filePath**/mainTransactions.csv')
    transactionDetails.to_csv('**filePath**/transactionsDetails.csv')
    taxes.to_csv('**filePath**/transactionsTaxes.csv')

import pyodbc

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=**serverIP**;PORT=1433;DATABASE=myDBName;UID=myUserID;PWD=myPWord;')
with open('**filePath**/transactionsDetails.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'insert into transactionDetails({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * (int(len(columns)))))
    print(query)  # for debug purposes
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
This code results in the following error:
insert into transactionDetails(,RowNumber,RawProductCode,RawUnitPrice,RawAmount,Quantity,ResolvedProductID,ProductCategoryID,ProductCategory,IsFuel,ProductName,ProductCode,IsTaxProduct,ResolvedUnitPrice,ResolvedAmount,Taxes,ID) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
Traceback (most recent call last):
  File "**workingDirectory**/myProject.py", line 85, in <module>
    cursor.execute(query, data)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near ','. (102) (SQLExecDirectW)")
Process finished with exit code 1
Manually executing the same query after removing the leading comma (with all 1s as test data) results in a successful write to the DB:
myTableName> insert into transactionDetails(RowNumber,RawProductCode,RawUnitPrice,RawAmount,Quantity,ResolvedProductID,ProductCategoryID,ProductCategory,IsFuel,ProductName,ProductCode,IsTaxProduct,ResolvedUnitPrice,ResolvedAmount,Taxes,ID) values (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
[2020-05-28 20:08:55] 1 row affected in 188 ms
Thanks!
The leading comma is inserted because the first element of columns is somehow being parsed as an empty string. If this is consistent, you can work around it by slicing the columns:
# Just take the slice starting from the 1st element
# Also, no need to use int(len()), len() already returns an integer.
query = query.format(','.join(columns[1:]), ','.join('?' * len(columns[1:])))
An easier way to do the above is to perform the slice when you first grab the columns:
columns = next(reader)[1:]
query = 'insert into transactionDetails({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
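As an aside (my addition, not part of the answer above): the empty first column name most likely comes from pandas writing the DataFrame index as an unnamed first column when `to_csv` is called. Passing `index=False` avoids the problem at the source. A minimal sketch with a made-up DataFrame:

```python
import io
import pandas as pd

# Hypothetical two-column frame standing in for transactionDetails.
df = pd.DataFrame({"RowNumber": [1, 2], "Quantity": [3, 4]})

# Default: the index is written as an unnamed first column, which
# later shows up as an empty string in the CSV header row.
buf = io.StringIO()
df.to_csv(buf)
print(buf.getvalue().splitlines()[0])  # ",RowNumber,Quantity"

# index=False drops it, so csv.reader sees only real columns.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # "RowNumber,Quantity"
```

With `index=False` in place, the slicing workaround becomes unnecessary.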
Don't construct your paramHeader by concatenating a bunch of strings. It is messy and error-prone (easy to introduce typos). The content is supposed to be a JSON object, so simply build a dictionary and use the json module to output a properly formatted JSON object:
>>> import json
>>> my_json = {
... "this": 123,
... "that": "string"
... }
>>> json.dumps(my_json)
'{"this": 123, "that": "string"}'
dumps stands for "dump string".
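Applied to the paramHeader from the question (a sketch; only a few of the keys are shown, the rest follow the same pattern), that would look like:

```python
import json

# The same request parameters as a dict; Python None serializes
# to JSON null, matching the hand-built string in the question.
params = {
    "Version": "1.0",
    "FromDate": "2020-05-01 00:00",
    "ToDate": "2020-05-31 00:00",
    "MerchantOrgID": None,
    "CardholderOrgID": None,
    # ... remaining ID fields omitted for brevity; all None ...
    "TaxExemptOnly": None,
}
paramHeader = json.dumps(params)
print(paramHeader)
```

The resulting string can be placed in the headers dict exactly as before, with no risk of a misplaced quote or comma.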
This was solved by removing the unnamed column from the .csv created from my pandas df, using a snippet found here:
with open('**filePath**/transactionsDetails.csv', 'r') as source:
    rdr = csv.reader(source)
    with open('**filePath**/transactionsDetails2.csv', 'w') as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[1], r[3], r[4], r[5], r[6], r[7], r[8], r[9], r[10], r[11], r[12], r[13], r[14], r[15], r[16]))
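A side note (my addition, not from the thread): if the goal is only to drop the unnamed index column, a slice like `row[1:]` is simpler than enumerating every index (note the tuple above also skips `r[2]`, so use a plain slice only if skipping that column was unintentional). A minimal sketch using in-memory buffers in place of the real files:

```python
import csv
import io

# In-memory stand-ins for the real CSV files (hypothetical data).
src = io.StringIO(",RowNumber,Quantity\n0,1,3\n")
out = io.StringIO()

wtr = csv.writer(out)
for row in csv.reader(src):
    wtr.writerow(row[1:])  # drop only the unnamed index column

print(out.getvalue().splitlines())  # ['RowNumber,Quantity', '1,3']
```

When working with real files, the csv docs also recommend opening them with `newline=''` to avoid spurious blank rows on Windows.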
The full working code is below:
import json
import requests
import pandas as pd
import csv
from pandas.io.json import json_normalize
from datetime import date, timedelta

url = "https://**myAPI**.com/Transaction"
paramHeader = '{"Version": "1.0"'
paramHeader += ', "FromDate": "2020-05-01 00:00"'
paramHeader += ', "ToDate": "2020-05-31 00:00"'
paramHeader += ', "MerchantOrgID": null'
paramHeader += ', "CardholderOrgID": null'
paramHeader += ', "CardNumber": null'
paramHeader += ', "DriverID": null'
paramHeader += ', "VehicleID": null'
paramHeader += ', "BillingGroupID": null'
paramHeader += ', "BillingCycleID": null'
paramHeader += ', "EntryMethodID": null'
paramHeader += ', "CardTypeID": null'
paramHeader += ', "TranTypeID": null'
paramHeader += ', "TaxExemptOnly": null}'

headers = {'APIKey': '**myAPIKey**',
           'content-type': 'application/json',
           'Accept': 'application/json',
           'parameters': paramHeader}

response = requests.get(url, data='', headers=headers)
if response.status_code == 200:
    r = json.loads(response.content.decode('utf-8'))
    cleanData = pd.json_normalize(r)
    transactionDetails = pd.json_normalize(data=r, record_path='Details', meta=['ID'])
    #print(transactionDetails)
    taxes = pd.json_normalize(data=r, record_path=['Details', 'Taxes'],
                              meta=['ID'])
    cleanData.to_csv('**filePath**/mainTransactions.csv')
    transactionDetails.to_csv('**filePath**/transactionsDetails.csv')
    taxes.to_csv('**filePath**/transactionsTaxes.csv')

import pyodbc

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=**serverIP**;PORT=1433;DATABASE=myDBName;UID=myUsername;PWD=myPword;')

with open('**filePath**/transactionsDetails.csv', 'r') as source:
    rdr = csv.reader(source)
    with open('**filePath**/transactionsDetails2.csv', 'w') as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[1], r[3], r[4], r[5], r[6], r[7], r[8], r[9], r[10], r[11], r[12], r[13], r[14], r[15], r[16]))

with open('**filePath**/transactionsDetails2.csv', 'r') as f:
    reader = csv.reader(f)
    print(reader)
    columns = next(reader)
    query = 'insert into transactionDetails({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * (int(len(columns)))))
    print(query)
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
connection.close()
I'm curious whether anyone has a cleaner way to do this?
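One possible cleanup (a sketch of my own, not from the thread, and it assumes the CSVs are written with `to_csv(..., index=False)` so there is no empty first column to strip): factor the query construction out into a small helper and send the rows in one batch with pyodbc's `executemany` instead of a Python-level loop.

```python
import csv
import io

def build_insert_query(columns, table):
    # Build a parameterized INSERT statement from a CSV header row.
    placeholders = ','.join('?' * len(columns))
    return f'insert into {table}({",".join(columns)}) values ({placeholders})'

# In-memory stand-in for a CSV produced with to_csv(..., index=False).
sample = io.StringIO("RowNumber,Quantity\n1,3\n2,4\n")
reader = csv.reader(sample)
columns = next(reader)
query = build_insert_query(columns, 'transactionDetails')
print(query)  # insert into transactionDetails(RowNumber,Quantity) values (?,?)

# Against a live connection, the rows can then be sent in one batch:
# cursor = connection.cursor()
# cursor.fast_executemany = True  # pyodbc bulk-insert optimization
# cursor.executemany(query, list(reader))
# connection.commit()
```

This removes the intermediate `transactionsDetails2.csv` rewrite step entirely and typically speeds up the inserts considerably.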