
Bulk insert into vertica using Python

I am using Python to transfer data (~8 million rows) from Oracle to Vertica. I wrote a Python script that transfers the data in 2 hours, but I am looking for ways to increase the transfer speed.

The process I am using:

  • Connect to Oracle
  • Pull the data into a dataframe (pandas)
  • Iterate over the rows in the dataframe one by one and insert them into Vertica (cursor.execute). I wanted to use the dataframe.to_sql method, but it is limited to only a couple of databases

Has anybody used a better way (bulk inserts or any other method?) to insert data into Vertica using Python?

Here is the code snippet:

df = pandas.read_sql_query(sql,conn)
conn_vertica = pyodbc.connect("DSN=dsnname")
cursor = conn_vertica.cursor()

for i,row in df.iterrows():
    cursor.execute("insert into <tablename> values (?,?,?,?,?,?,?,?,?)", *row.values)

cursor.close()
conn_vertica.commit()
conn_vertica.close()
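Before switching to COPY, one incremental improvement over the row-by-row loop above is to pass batches of tuples to pyodbc's cursor.executemany instead of calling execute once per row. A minimal sketch of the batching step only (the executemany call is left as a comment because it needs a live connection; the table name and batch size are illustrative):

```python
import pandas as pd

def row_batches(df, batch_size):
    """Yield the DataFrame's rows as lists of plain tuples, batch_size rows at a time."""
    rows = list(df.itertuples(index=False, name=None))
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Illustrative 10-row frame split into batches of 4 -> sizes 4, 4, 2.
df = pd.DataFrame({"a": range(10), "b": range(10)})
batches = list(row_batches(df, 4))

# Each batch could then be sent in one round trip:
# cursor.executemany("insert into <tablename> values (?,?)", batch)
```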

From the vertica-python code: https://github.com/uber/vertica-python/blob/master/vertica_python/vertica/cursor.py

with open("/tmp/file.csv", "rb") as fs:
    cursor.copy("COPY table(field1,field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'",
                fs, buffer_size=65536)

Doing single row inserts into Vertica is very inefficient. You need to load in batches.

The way we do it is using the COPY command, here is an example:

COPY mytable (firstcolumn, secondcolumn) FROM STDIN DELIMITER ',' ENCLOSED BY '"';

Have you considered using an existing library, for example vertica-python?

Check out this link to Vertica's docs for more info on COPY options

In case you want to load a DataFrame instead of a CSV file into a Vertica table, you can use this approach:

from vertica_python import connect

db_connection = connect(host='hostname',
                        port=5433,
                        user='user',
                        password='password',
                        database='db_name',
                        unicode_error='replace')

cursor = db_connection.cursor()

cursor.copy("COPY table_name (field1, field2, ...) FROM STDIN DELIMITER ','",
            df.to_csv(header=None, index=False))

The part below is what makes the difference: it converts the in-memory DataFrame into comma-separated lines of text that the COPY command can read:

df.to_csv(header=None, index=False)
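To see concretely what that call produces, here is a tiny self-contained example (the column names and values are made up):

```python
import pandas as pd

# A small frame standing in for the query result.
df = pd.DataFrame({"field1": [1, 2], "field2": ["a", "b"]})

# header=None suppresses the header row; index=False drops the index column,
# leaving exactly the comma-separated rows that COPY ... FROM STDIN expects.
csv_text = df.to_csv(header=None, index=False)
print(csv_text)
# 1,a
# 2,b
```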

It works very fast.
