
How to speed up Pandas .to_sql function?

import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# credentials
username = "user"
password = "password"
connectStr = "ip:port/service_name"

df = pd.read_csv("data.csv")

# connection
dsn = cx_Oracle.makedsn('my_ip', 'my_port', service_name='my_service_name')  # makedsn also needs the listener port

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (username, 
password, dsn))

# upload dataframe to ORCLDB
df.to_sql(name="test",con=engine, if_exists='append', index=False)

How can I speed up the .to_sql function in Pandas? It's taking me 20 minutes to write a 120 KB file with 1,000 rows from a DataFrame into the DB. The column types are all VARCHAR2(256).

Database columns: https://imgur.com/a/9EVwL5d

What is happening here is that every row you insert has to wait for its transaction to complete before the next one can start. The workaround is to do a "bulk insert" from a CSV file that is loaded into memory. I know how to do this with postgres (which is what I am using), but I am not sure about Oracle. Here is the code I am using for postgres; perhaps it will be of some help.

import io

def bulk_insert_sql_replace(engine, df, table, if_exists='replace', sep='\t', encoding='utf8'):

    # Create the (empty) target table from the DataFrame schema
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)
    print(df)

    # Prepare data: dump the DataFrame to an in-memory CSV buffer
    output = io.StringIO()
    df.to_csv(output, sep=sep, index=False, header=False, encoding=encoding)
    output.seek(0)

    # Insert data with COPY (copy_from is a psycopg2 cursor method, PostgreSQL only)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
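
Since the question targets Oracle, which has no COPY equivalent exposed through cx_Oracle, a rough counterpart would be to batch all rows through the driver's executemany and commit once, instead of paying one round trip per row. The sketch below is only an assumption of how that could look (it reuses the engine and table name from the question and assumes plain, unquoted column names); it is not tested Oracle code.

def bulk_insert_oracle(engine, df, table, if_exists='replace'):

    # Create the (empty) target table from the DataFrame schema
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Build a positional-bind INSERT, e.g. INSERT INTO test (a, b) VALUES (:1, :2)
    cols = ", ".join(df.columns)
    binds = ", ".join(":%d" % (i + 1) for i in range(len(df.columns)))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, binds)

    # Send every row in a single executemany call and commit once
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.executemany(sql, df.values.tolist())
    connection.commit()
    cursor.close()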

Here is a link to another thread that has tons of great information regarding this issue: Bulk Insert A Pandas DataFrame Using SQLAlchemy
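
The thread linked above also covers pandas' own hook for this: to_sql accepts a callable for its method parameter, with the signature (pd_table, conn, keys, data_iter), so you can keep the convenience of to_sql and plug a bulk path in underneath. Here is a minimal sketch of the PostgreSQL COPY variant, following the pattern from the pandas documentation (the table and engine names are placeholders):

import csv
import io

def psql_insert_copy(table, conn, keys, data_iter):
    # Custom to_sql insertion method: stream all rows through COPY in one shot
    dbapi_conn = conn.connection  # raw psycopg2 connection behind the SQLAlchemy one
    with dbapi_conn.cursor() as cur:
        buf = io.StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ", ".join('"{}"'.format(k) for k in keys)
        table_name = "{}.{}".format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert("COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns), buf)

# usage:
# df.to_sql("test", engine, if_exists="append", index=False, method=psql_insert_copy)

Note that COPY is PostgreSQL-only; for Oracle, a method callable would have to wrap something like the executemany sketch above instead.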
