Is there a faster way to insert a dataframe into SQL using Python?

We have two parts to get the final data frame into SQL:

  1. Downloading the datasets from Azure and transforming them using Python.
  2. Uploading the transformed data to Azure and then inserting the final dataframe into SQL.

Downloading, transforming, and uploading take 5 minutes, but the insertion into SQL takes quite a long time. I used the code below for faster insertion.

import urllib.parse

import sqlalchemy as sa
from sqlalchemy.exc import SQLAlchemyError

server = 'XXXX.database.windows.net'
database = 'XXX'
username = 'XXX'
password = 'XXXX'
driver = '{ODBC Driver 17 for SQL Server}'

# Build the ODBC connection string and URL-encode it for the SQLAlchemy URL.
params = urllib.parse.quote_plus('DRIVER=' + driver +
                                 ';SERVER=' + server +
                                 ';PORT=1433;DATABASE=' + database +
                                 ';UID=' + username +
                                 ';PWD=' + password)

# fast_executemany=True lets pyodbc send each chunk of parameters
# to SQL Server in a single round trip instead of row by row.
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params),
                          fast_executemany=True)

with engine.connect() as connection:
    try:
        df_copy.to_sql('XXXX', connection,
                       if_exists='append', index=False, chunksize=500)
    except SQLAlchemyError as e:
        error = str(e.__dict__['orig'])
        print(error)

The final data frame contains 97,000 rows and 127 columns.

SQL Server configuration: Azure SQL with 10 DTUs and 250 GB of storage.

The error is:

Exception has occurred: OperationalError (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: An existing connection was forcibly closed by the remote host.\r\n (10054) (SQLExecute); [08S01] [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure (10054)')

I have also used connect_args={'connect_timeout': 2400} inside create_engine, but after 40-50 minutes we receive the same error message. I think 50 minutes for 97k records is quite a long time. Is there any way I could improve the process? Also, I'm currently running on my local machine, which has 16 GB of RAM and a 12th Gen Intel(R) Core(TM) i7-1265U 1.80 GHz processor. We use Jenkins for deployment; would performance be any faster if we ran the job on Jenkins?
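
One way to keep the job from failing wholesale on these 10054 disconnects is to commit in smaller, independent batches and retry a failed batch on a fresh connection. Below is a minimal sketch of that pattern, reusing the engine and df_copy from the code above; insert_in_batches, the 10,000-row batch size, and the retry count are illustrative assumptions, not part of the original post.

import time
from sqlalchemy.exc import OperationalError

def insert_in_batches(df, table, engine, rows_per_batch=10_000, retries=3):
    # Commit each batch separately so a dropped connection only
    # costs the batch in flight, not the whole multi-hour insert.
    for start in range(0, len(df), rows_per_batch):
        batch = df.iloc[start:start + rows_per_batch]
        for attempt in range(retries):
            try:
                # engine.begin() checks out a fresh connection and
                # commits (or rolls back) when the block exits.
                with engine.begin() as conn:
                    batch.to_sql(table, conn, if_exists='append', index=False)
                break
            except OperationalError:
                if attempt == retries - 1:
                    raise
                engine.dispose()  # discard possibly-dead pooled connections
                time.sleep(5)     # brief pause before retrying this batch

insert_in_batches(df_copy, 'XXXX', engine)

One caveat: if the link drops between the server committing a batch and the client seeing the result, a retry can insert that batch twice, so this pattern suits staging tables better than targets with strict uniqueness requirements.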

Hello there, you should try to specify the chunksize in your call: df_copy.to_sql('XXXX', engine.connect(), index=False, if_exists='append', method=None, chunksize=50000)
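
Spelled out against the engine defined in the question, the suggestion looks roughly like the sketch below ('XXXX' is the placeholder table name from the post). One thing to watch: fast_executemany buffers each chunk's parameters in client memory, so with 127 columns a 50,000-row chunk is fairly heavy, and a more modest value may be a safer starting point.

# A sketch of the answer's suggestion, not a verified benchmark.
with engine.begin() as connection:    # begin() commits on success
    df_copy.to_sql('XXXX', connection,
                   index=False, if_exists='append',
                   method=None,       # plain executemany; fast_executemany batches it
                   chunksize=50000)   # rows per round trip, per the answer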
