Issue with inserting data into SQL Server using Python Pandas DataFrame

I am trying to pull data from a REST API and insert it into SQL Server. If the script inserts only PhotoBinary and FileType together, it works, but as soon as I add the ID, which is an integer, I get the error below. Also, if I pull only the ID on its own from the API, it works.

I am trying to pull 3 pieces of information (a sample of the response shape is shown after the list):

  1. The EmployeeID, which is an int.
  2. The binary string representation of the image.
  3. The file type of the original file, e.g. .jpg
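
For reference, based on the parsing code further down, the API response is assumed to look roughly like this after json.loads (the values are made up for illustration):

# Hypothetical shape of the parsed response; only the keys matter here.
data = {
    "PersonPhoto": [
        {"ID": 1001, "Photo": "/9j/4AAQSkZJRg...", "PhotoType": ".jpg"},
        {"ID": 1002, "Photo": "iVBORw0KGgo...", "PhotoType": ".png"}
    ]
}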

The target table is set up as:

Create table Employee_Photo
( 
    EmployeeID  int,
    PhotoBinary varchar(max),
    FileType varchar(10)
)

The error I get is:

Traceback (most recent call last):
  File "apiphotopullwithid.py", line 64, in <module>
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType'])
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 5 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')
My code is the following:

import json
import pandas as pd
import sqlalchemy
import pyodbc
import requests

url = "https://someurl.com/api/PersonPhoto"

headers = {
    'Accept': "application/json",
    'Authorization': "apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    'Content-Type': "application/json",
    'cache-control': "no-cache"
}

response = requests.request("GET", url, headers=headers)
data = json.loads(response.text)


ID,Photo,PhotoType = [],[],[]

for device in data['PersonPhoto']:
    ID.append(device[u'ID'])

    Photo.append(device[u'Photo'])

    PhotoType.append(device[u'PhotoType'])


df = pd.DataFrame([ID,Photo,PhotoType]).T
df.columns = ['EMPID','Photo','PhotoType']
df = df.astype({'EMPID':'Int64'})



connStr = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
    #"UID=ConnectME;"
    #"PWD={Password1}"
)
cursor = connStr.cursor()

for index,row in df.iterrows():
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType'])
    connStr.commit()

cursor.close()
connStr.close()

You're using the old Windows built-in SQL Server driver. Try the newer one, which you can get from here for multiple platforms.
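
As a minimal sketch, the trusted connection with the newer driver would look like this (assuming "ODBC Driver 17 for SQL Server" is the version you install; check pyodbc.drivers() for the exact name on your machine):

import pyodbc

# Assumption: "ODBC Driver 17 for SQL Server" is installed; confirm the exact
# driver name with pyodbc.drivers().
connStr = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
)
cursor = connStr.cursor()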

Don't read too much into the error message. Something is malformed in the network protocol layer.

Can you dump the types and values of the parameters causing the issue? My guess is that the driver is setting the parameter types incorrectly.

E.g.:

for index,row in df.iterrows():
  empid =  row['EMPID']
  photo = row['Photo']
  photoType = row['PhotoType']

  print("empid is ",type(empid), " photo is ", type(photo), " photoType is ", type(photoType))
  print("empid: ",empid, " photo: ", photo, " photoType: ", photoType)

  cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", empid,photo,photoType) 

connStr.commit()
cursor.close()
connStr.close()
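
If the dump shows NumPy scalar types (for example numpy.int64) rather than built-in Python types, one possible workaround, only a guess and not confirmed here, is to cast to plain Python types before binding inside the loop above:

  # Sketch: cast to built-in Python types before binding, in case the driver
  # guesses the parameter type wrongly for NumPy scalars (an assumption).
  empid = int(row['EMPID'])
  photo = str(row['Photo'])
  photoType = str(row['PhotoType'])

  cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", empid, photo, photoType)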

In most Python database APIs, including pyodbc, which adheres to the PEP 249 specs, the parameters argument in cursor.execute() is usually a sequence (i.e., a tuple or list). Therefore, bind all values into one iterable rather than passing them as three separate argument values:

sql = "INSERT INTO dbo.Employee_Photo ([EmployeeID],[PhotoBinary],[FileType]) VALUES (?,?,?)"

# TUPLE
cursor.execute(sql, (row['EMPID'], row['Photo'], row['PhotoType']))

# LIST
cursor.execute(sql, [row['EMPID'], row['Photo'], row['PhotoType']])

By the way, avoid the explicit iterrows loop and use an implicit loop with executemany via Pandas' DataFrame.values:

# EXECUTE PARAMETERIZED QUERY
sql_cols = ['EMPID', 'Photo', 'PhotoType']
cursor.executemany(sql, df[sql_cols].values.tolist())   
conn.commit()
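
As a side note beyond the original answer: pyodbc cursors expose a fast_executemany flag that can speed up this bulk insert, though it generally requires one of the newer Microsoft ODBC drivers rather than the legacy {SQL Server} driver mentioned above; a minimal sketch:

# Assumption: a modern driver such as "ODBC Driver 17 for SQL Server" is in use;
# fast_executemany is known to misbehave with the legacy {SQL Server} driver.
cursor.fast_executemany = True
cursor.executemany(sql, df[sql_cols].values.tolist())
conn.commit()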

Actually, you do not even need Pandas as a middle layer (use the library just for data science) and can interact directly with the originally returned JSON:

# NESTED LIST OF TUPLES
vals = [(int(device[u'ID']), device[u'Photo'], device[u'PhotoType']) \
           for device in data['PersonPhoto']]

cursor.executemany(sql, vals)   
conn.commit()
