
Extract millions of records from SQL Server and load into Oracle database using a Python script

I am extracting millions of rows from SQL Server and inserting them into an Oracle DB using Python. Inserts are running at roughly one record per second into the Oracle table, so the load takes hours. What is the fastest approach to load? My code is below:

def insert_data(conn, cursor, query, data, batch_size=10000):
    recs = []
    count = 1
    for rec in data:
        recs.append(rec)
        if count % batch_size == 0:
            cursor.executemany(query, recs, batcherrors=True)
            conn.commit()
            recs = []
        count = count + 1
    # flush the final partial batch after the loop
    cursor.executemany(query, recs, batcherrors=True)
    conn.commit()
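For reference, a common way to avoid per-row overhead entirely is to stream batches from the source cursor with fetchmany and push each batch with a single executemany and commit. A minimal sketch of that pipeline, assuming pyodbc on the SQL Server side and cx_Oracle on the Oracle side, with every name and connection string a placeholder:

import pyodbc
import cx_Oracle

# Placeholder connection details throughout.
src = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=dbname;UID=user;PWD=pass')
dst = cx_Oracle.connect('user/password@oracle_host/service')

src_cur = src.cursor()
dst_cur = dst.cursor()
src_cur.arraysize = 10000  # larger fetch buffer on the source side

src_cur.execute("SELECT col_a, col_b, col_c FROM source_table")
insert_sql = "INSERT INTO target_table (col_a, col_b, col_c) VALUES (:1, :2, :3)"

while True:
    rows = src_cur.fetchmany(10000)  # stream; never hold the full result set
    if not rows:
        break
    # pyodbc yields Row objects; cx_Oracle wants plain sequences
    dst_cur.executemany(insert_sql, [tuple(r) for r in rows])
    dst.commit()  # one commit per batch, not per row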

Perhaps you cannot buy a third-party ETL tool, but you can certainly write a PL/SQL procedure in the Oracle database.

First, install the Oracle Transparent Gateway for ODBC; there is no license cost involved. Second, in the Oracle DB, create a db link that references the MSSQL database via the gateway (a sketch of this step follows below). Third, write a PL/SQL procedure to pull the data from the MSSQL database via the db link.
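For illustration, the db link in the second step is a one-time piece of DDL. Here is a minimal sketch run from Python, where the link name, credentials, and the gateway's tnsnames.ora alias are all placeholders:

import cx_Oracle

conn = cx_Oracle.connect('oracle_user/password@oracle_host/service')
cur = conn.cursor()

# 'gateway_alias' must match the DG4ODBC entry configured in tnsnames.ora.
cur.execute("""
    CREATE DATABASE LINK mssql_link
      CONNECT TO "mssql_user" IDENTIFIED BY "mssql_password"
      USING 'gateway_alias'
""")

# Smoke test: count rows in a SQL Server table through the link.
cur.execute('SELECT COUNT(*) FROM "dbo"."some_table"@mssql_link')
print(cur.fetchone()[0])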

I was once presented with a problem similar to yours: a developer was using SSIS to copy around a million rows from MSSQL to Oracle, and it was taking over four hours. I ran a trace on his process and saw that it was copying row by row, slow by slow. It took me less than 30 minutes to write a PL/SQL procedure to copy the data, and it completed in less than four minutes.

I give a high-level view of the entire setup and process here.

EDIT: Thought you might like to see exactly how simple the actual procedure is:

create or replace procedure my_load_proc as
begin
  insert into my_oracle_table (col_a,
                               col_b,
                               col_c)
  select sql_col_a,
         sql_col_b,
         sql_col_c
  from mssql_tbl@mssql_link;

  commit;
end;

My actual procedure has more to it, dealing with run-time logging, email notification of completion, and so on. But the above is the 'guts' of it, pulling the data from MSSQL into Oracle.
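And since the question is driven from Python, the procedure can still be invoked from the existing script once the link and procedure are in place. A minimal sketch, with placeholder connection details:

import cx_Oracle

conn = cx_Oracle.connect('oracle_user/password@oracle_host/service')
cur = conn.cursor()
cur.callproc('my_load_proc')  # the whole copy runs inside the database
conn.commit()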

Alternatively, you might want to use pandas, PySpark, or another big-data framework available in Python. There are a lot of examples out there; here is how to load the data, adapted from the Microsoft Docs (credentials and names are placeholders):

import pyodbc
import pandas as pd
import cx_Oracle
from sqlalchemy import create_engine

server = 'servername'
database = 'AdventureWorks'
username = 'yourusername'
password = 'yourpassword'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = cnxn.cursor()
query = "SELECT [CountryRegionCode], [Name] FROM Person.CountryRegion;"
df = pd.read_sql(query, cnxn)


# do any data manipulation that is needed here,
# then insert the data into Oracle


conn = create_engine('oracle+cx_oracle://xxxxxx')

table_name = 'your_table'  # placeholder target table name
df.to_sql(table_name, conn, index=False, if_exists="replace")

Something like that; it might not work 100% as written, but it should give you an idea of how you can do it.
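One caveat worth adding: read_sql as written pulls the entire result set into memory, and if_exists="replace" drops and recreates the target table. For millions of rows, a chunked variant of the same idea is safer; a sketch with the same placeholder connection details:

import pyodbc
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details, as in the snippet above.
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=AdventureWorks;UID=user;PWD=pass')
engine = create_engine('oracle+cx_oracle://user:password@host:1521/?service_name=orcl')

query = "SELECT [CountryRegionCode], [Name] FROM Person.CountryRegion;"

first = True
for chunk in pd.read_sql(query, cnxn, chunksize=50000):
    # Recreate the table on the first chunk, then append the rest.
    chunk.to_sql('country_region', engine, index=False,
                 if_exists='replace' if first else 'append')
    first = False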

You can also use SQL Server Integration Services (SSIS), which I'd guess is the best practice.
