Inserting Data to SQL Server from a Python Dataframe Quickly
I have been trying to insert data from a Python dataframe into a table already created in SQL Server. The dataframe has 90K rows and I wanted the best possible way to quickly insert the data into the table. I only have read, write, and delete permissions on the server and I cannot create any tables on it.
Below is the code that inserts the data, but it is very slow. Please advise.
import pandas as pd
import pyodbc

df = pd.read_excel(r"Url path\abc.xlsx")

conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;'
                      'Database=Database Name;'
                      'UID=User ID;'
                      'PWD=Password;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()

# Deleting existing data in SQL Table:-
cursor.execute("DELETE FROM database.schema.TableName")
conn.commit()

# Inserting data in SQL Table:-
for index, row in df.iterrows():
    cursor.execute("INSERT INTO TableName([A],[B],[C]) values (?,?,?)",
                   row['A'], row['B'], row['C'])
conn.commit()
cursor.close()
conn.close()
To insert data much faster, try using sqlalchemy and df.to_sql. This requires you to create an engine with sqlalchemy, and to make things faster, use the option fast_executemany=True:
import urllib.parse
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}',
                                  fast_executemany=True)

with engine.connect() as connection:
    df.to_sql(<table name>, connection, index=False)
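The snippet above needs a live SQL Server to run. Here is a self-contained sketch of the same `to_sql` pattern, with an in-memory SQLite engine standing in for the mssql+pyodbc one (the table name `TableName` and the sample dataframe are made up for illustration; `chunksize` batches the inserts, which also helps avoid driver parameter limits on large frames):

```python
import pandas as pd
import sqlalchemy

# For SQL Server you would build the engine as shown above, with
# fast_executemany=True. An in-memory SQLite engine stands in here
# so the sketch runs anywhere.
engine = sqlalchemy.create_engine('sqlite://')

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

# if_exists='append' keeps an existing table; chunksize limits rows per batch.
df.to_sql('TableName', engine, index=False, if_exists='append', chunksize=1000)

# Read the count back to confirm the rows landed.
out = pd.read_sql('SELECT COUNT(*) AS n FROM TableName', engine)
print(out['n'][0])
```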
This should do what you want... very generic example...
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# create timer
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=Excel-PC\SQLEXPRESS;'
    r'DATABASE=NORTHWND;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Try that and post back if you have additional questions/issues/concerns.
For one thing, replace df.iterrows() with df.apply(). Better still, remove the loop entirely for something much more efficient.
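A minimal sketch of the df.apply() idea, with SQLite standing in for the pyodbc connection so it runs anywhere (the table and column names are made up for illustration):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

conn = sqlite3.connect(':memory:')  # stand-in for the pyodbc connection
cur = conn.cursor()
cur.execute('CREATE TABLE TableName (A INTEGER, B TEXT)')

# df.apply with axis=1 calls the function once per row; the lambda issues
# the parameterized INSERT for that row.
df.apply(lambda row: cur.execute(
    'INSERT INTO TableName (A, B) VALUES (?, ?)',
    (row['A'], row['B'])), axis=1)
conn.commit()

n = cur.execute('SELECT COUNT(*) FROM TableName').fetchone()[0]
print(n)
```

Note that df.apply still runs one INSERT per row; removing the loop altogether (executemany over df.values.tolist(), as in the next answer) is the bigger win.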
Here is the script; hope this works for you.
import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()

df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))

cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
Try populating a temp table with one index or none, then insert it into your target table all at once. This might speed things up, since the indexes don't have to be updated after each insert.
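A sketch of that staging idea, with SQLite standing in for SQL Server so it runs anywhere (on SQL Server the staging table would be a #temp table, which you can usually create even without CREATE TABLE rights in the database; all table and column names here are made up):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

conn = sqlite3.connect(':memory:')  # stand-in for the pyodbc SQL Server connection
cur = conn.cursor()

# The target table, imagined as heavily indexed.
cur.execute('CREATE TABLE GoodTable (A INTEGER, B TEXT)')

# 1. Bulk-load into an index-free temp table (#staging on SQL Server).
cur.execute('CREATE TEMP TABLE staging (A INTEGER, B TEXT)')
cur.executemany('INSERT INTO staging (A, B) VALUES (?, ?)',
                df.values.tolist())

# 2. One set-based insert into the real table, so indexes are
#    maintained once rather than once per row.
cur.execute('INSERT INTO GoodTable SELECT A, B FROM staging')
conn.commit()

n = cur.execute('SELECT COUNT(*) FROM GoodTable').fetchone()[0]
print(n)
```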