
How to speed up Pandas .to_sql function?

import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# credentials
username = "user"
password = "password"
connectStr = "ip:port/service_name"

df = pd.read_csv("data.csv")

# connection
dsn = cx_Oracle.makedsn('my_ip', 'my_port', service_name='my_service_name')  # makedsn also needs the listener port

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (username, 
password, dsn))

# upload dataframe to ORCLDB
df.to_sql(name="test",con=engine, if_exists='append', index=False)

How can I speed up the .to_sql function in Pandas? It's taking me 20 minutes to write a 120 KB file with 1,000 rows from a DataFrame into the DB. The column types are all VARCHAR2(256).

Database columns: https://imgur.com/a/9EVwL5d

What is happening here is that every row you insert has to wait for its transaction to complete before the next one can start. The workaround is to do a "bulk insert" from a CSV file that is loaded into memory. I know how to do this with postgres (which is what I am using), but I am not sure about Oracle. Here is the code I am using for postgres; perhaps it will be of some help.

import io

def bulk_insert_sql_replace(engine, df, table, if_exists='replace', sep='\t', encoding='utf8'):

    # Create the (empty) target table from the DataFrame schema
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)
    print(df)

    # Prepare data: dump the DataFrame to an in-memory CSV buffer
    output = io.StringIO()
    df.to_csv(output, sep=sep, index=False, header=False, encoding=encoding)
    output.seek(0)

    # Insert data with COPY (copy_from is a psycopg2 cursor method, PostgreSQL only)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
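
Since the question targets Oracle, which has no COPY equivalent exposed through cx_Oracle, a rough counterpart would be to batch all rows through the driver's executemany and commit once, instead of paying one round trip per row. The sketch below is only an assumption of how that could look (it reuses the engine and table name from the question and assumes plain, unquoted column names); it is not tested Oracle code.

def bulk_insert_oracle(engine, df, table, if_exists='replace'):

    # Create the (empty) target table from the DataFrame schema
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Build a positional-bind INSERT, e.g. INSERT INTO test (a, b) VALUES (:1, :2)
    cols = ", ".join(df.columns)
    binds = ", ".join(":%d" % (i + 1) for i in range(len(df.columns)))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, binds)

    # Send every row in a single executemany call and commit once
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.executemany(sql, df.values.tolist())
    connection.commit()
    cursor.close()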

Here is a link to another thread that has tons of great information regarding this issue: Bulk Insert A Pandas DataFrame Using SQLAlchemy
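
The thread linked above also covers pandas' own hook for this: to_sql accepts a callable for its method parameter, with the signature (pd_table, conn, keys, data_iter), so you can keep the convenience of to_sql and plug a bulk path in underneath. Here is a minimal sketch of the PostgreSQL COPY variant, following the pattern from the pandas documentation (the table and engine names are placeholders):

import csv
import io

def psql_insert_copy(table, conn, keys, data_iter):
    # Custom to_sql insertion method: stream all rows through COPY in one shot
    dbapi_conn = conn.connection  # raw psycopg2 connection behind the SQLAlchemy one
    with dbapi_conn.cursor() as cur:
        buf = io.StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ", ".join('"{}"'.format(k) for k in keys)
        table_name = "{}.{}".format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert("COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns), buf)

# usage:
# df.to_sql("test", engine, if_exists="append", index=False, method=psql_insert_copy)

Note that COPY is PostgreSQL-only; for Oracle, a method callable would have to wrap something like the executemany sketch above instead.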
