
Is there a faster way to get a big dataframe from Python into a SQL Server table?

I am trying to write nearly 13,000,000 rows to SQL Server, but it gives a resource pool error. Here is the code:

def execute_sql_file(pconnection, pfilepath, **psqlparams):
    # Read the SQL file and substitute each #key# placeholder with its value.
    cursor = pconnection.cursor()
    with open(pfilepath) as f:
        full_sql = f.read()
    for key, value in psqlparams.items():
        full_sql = full_sql.replace('#' + str(key) + '#', str(value))
    return cursor.execute(full_sql)
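For illustration, here is the placeholder substitution that `execute_sql_file` performs, run on an in-memory template instead of a file (the template and parameter names are made up for this demo):

```python
# Demo of the #key# substitution logic from execute_sql_file,
# without touching a file or a database connection.
sql_template = "SELECT * FROM #schema#.#table# WHERE id > #min_id#"
psqlparams = {"schema": "dbo", "table": "tb_table", "min_id": 100}

full_sql = sql_template
for key, value in psqlparams.items():
    # Each #key# token in the SQL text is replaced by its value.
    full_sql = full_sql.replace('#' + str(key) + '#', str(value))

print(full_sql)  # SELECT * FROM dbo.tb_table WHERE id > 100
```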


import math
import urllib.parse

import sqlalchemy


def write_result_to_db(dfsave, pServer='oltp', pDatabase='warehouse',
                       pSchema='dbo', pTable='tb_table'):
    params = urllib.parse.quote_plus(
        "DRIVER={SQL Server Native Client 11.0};SERVER=" + pServer +
        ";DATABASE=" + pDatabase +
        ";Trusted_Connection=yes;MARS_Connection=Yes")
    engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
    to_db_connection = engine.connect()

    @sqlalchemy.event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    # Keep each multi-value INSERT under SQL Server's 2100-parameter limit.
    chunknum = math.floor(2100 / dfsave.shape[1]) - 1
    if dfsave.shape[0] > 0:
        dfsave.to_sql(pTable, con=engine, schema=pSchema, if_exists='append',
                      index=False, method='multi', chunksize=chunknum)

    return to_db_connection
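The `chunknum` line exists because SQL Server allows at most 2100 parameters per statement: with `method='multi'`, every row in a chunk contributes one parameter per column, so the chunk size must stay below `2100 / n_columns`. (Note also that pyodbc's `fast_executemany` only applies to `executemany` calls; with `method='multi'` pandas emits one multi-row INSERT per chunk instead.) A quick arithmetic check, using a made-up 10-column width:

```python
import math

n_columns = 10                               # example width, not from the question
chunknum = math.floor(2100 / n_columns) - 1  # rows allowed per multi-value INSERT
# Each row of the INSERT binds n_columns parameters, so the total must
# stay under SQL Server's 2100-parameter-per-statement limit.
print(chunknum, chunknum * n_columns)        # 209 2090 -> safely under 2100
```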

This is the function. After my calculations finish, I end up with a dataframe `df`, and I try to write it to SQL Server with this call:

write_result_to_db(df, pTable='tb_table', pServer='testoltp',
                   pSchema='dbo', pDatabase='warehouse')

But it raises an error. I then tried splitting the dataframe and writing it to the SQL table in pieces, like this:

import time

bol = int(tb_result.shape[0] / 50)
for start in range(0, tb_result.shape[0], bol):
    write_result_to_db(tb_result.iloc[start:start + bol],
                       pTable='tb_table', pServer='testoltp',
                       pSchema='dbo', pDatabase='testwarehouse')
    time.sleep(15)
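The slicing in the loop above partitions the rows into about 50 batches; the index arithmetic can be checked without a database, using the 13,000,000-row count from the question:

```python
n_rows = 13_000_000
bol = int(n_rows / 50)                 # 260_000 rows per batch
starts = list(range(0, n_rows, bol))   # start index of each iloc slice

print(len(starts))        # 50 batches
print(starts[-1] + bol)   # 13000000 -> the last slice ends exactly at the final row
```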

Even with this loop, it still gives a resource pool error, so I don't think this approach will work. How can I get a dataframe from Python into a SQL Server table?

Try using the modin library; it might resolve the issue: https://github.com/modin-project/modin
