How to speed up Pandas .to_sql function?

import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# credentials
username = "user"
password = "password"
connectStr = "ip:port/service_name"

df = pd.read_csv("data.csv")

# connection
# makedsn requires a port argument; 1521 (Oracle's default listener port) is a placeholder
dsn = cx_Oracle.makedsn('my_ip', 1521, service_name='my_service_name')

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (username, password, dsn))

# upload dataframe to ORCLDB
df.to_sql(name="test",con=engine, if_exists='append', index=False)

How can I speed up the .to_sql function in Pandas? It's taking me 20 minutes to write a 120 KB file with 1,000 rows as a dataframe into the DB. The column types are all VARCHAR2(256).

Database columns: https://imgur.com/a/9EVwL5d

What is happening here is that for every row you insert, it has to wait for the transaction to be completed before the next one can start. The workaround is to do a "bulk insert" using a CSV file that is loaded into memory. I know how this is done using Postgres (which is what I am using), but for Oracle I am not sure. Here is the code I am using for Postgres; perhaps it will be of some help.

import io

def bulk_insert_sql_replace(engine, df, table, if_exists='replace', sep='\t', encoding='utf8'):

    # Create the (empty) table from the dataframe's columns
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Prepare data: dump the dataframe into an in-memory CSV buffer
    output = io.StringIO()
    df.to_csv(output, sep=sep, index=False, header=False, encoding=encoding)
    output.seek(0)

    # Insert data with PostgreSQL COPY (copy_from is a psycopg2 cursor method)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
    connection.close()
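
For Oracle there is no copy_from, but the same idea of sending many rows per round trip might be approximated with the DB-API executemany call, which cx_Oracle supports with positional binds (:1, :2, ...). The following is only a sketch under assumptions, not a tested Oracle solution: the helper names build_insert, batches and bulk_insert and the default batch_size are made up for illustration, and the table and column names are interpolated into the SQL, so they must come from a trusted source.

```python
from itertools import islice


def build_insert(table, columns):
    """Build an INSERT statement with Oracle-style positional binds (:1, :2, ...)."""
    cols = ", ".join(columns)
    binds = ", ".join(f":{i}" for i in range(1, len(columns) + 1))
    return f"INSERT INTO {table} ({cols}) VALUES ({binds})"


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable of row tuples."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_insert(connection, table, columns, rows, batch_size=10_000):
    """Insert rows in batches: one executemany call per batch, one commit at the end.

    `connection` is assumed to be a DB-API connection, for example the
    object returned by engine.raw_connection() on a cx_Oracle engine.
    """
    sql = build_insert(table, columns)
    cursor = connection.cursor()
    try:
        for batch in batches(rows, batch_size):
            cursor.executemany(sql, batch)
        connection.commit()
    finally:
        cursor.close()
```

With the objects from the question, this could presumably be called as bulk_insert(engine.raw_connection(), "test", list(df.columns), df.itertuples(index=False, name=None)). Missing values (NaN) would likely need to be converted to None first so that they are bound as NULL.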

Here is a link to another thread that has tons of great information regarding this issue: Bulk Insert A Pandas DataFrame Using SQLAlchemy
