
What is the fastest way to save a pandas DataFrame to a MySQL database?

I am writing Python code to generate and update a MySQL table based on a table in another MySQL database.

My code does something like this:

For each date in a date range:

  1. Query a quantity from db1 between two dates

  2. Do some work in pandas => df

  3. Delete from db2 the rows whose ids are in df

  4. Save df with df.to_sql
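
For reference, steps 1 and 2 look roughly like this (a minimal sketch; the connection string and the table/column names below are placeholders, not my real ones):

import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; user, password, host and schema are assumptions.
db1 = create_engine("mysql+pymysql://user:password@host/db1")

def load_between(start, end):
    # Step 1: query a quantity between two dates
    # (source_table, quantity and dt are hypothetical names).
    query = text("""
        SELECT id, quantity, dt
        FROM source_table
        WHERE dt BETWEEN :start AND :end
    """)
    with db1.connect() as con:
        df = pd.read_sql(query, con, params={"start": start, "end": end})
    # Step 2: pandas transformations on df go here.
    return df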

Steps 1-2 take less than 2 s, while steps 3-4 can take up to 10 s; step 4 takes about four times as long as step 3. How can I improve my code to make the writing process more efficient?

I have already chunked the df for steps 3 and 4, and I have added method='multi' to .to_sql (this did not work at all). I was wondering if we could do better:

# chunks() and to_tuple() are my own helpers: chunks() splits a list into
# fixed-size pieces, to_tuple() formats one piece as a SQL tuple literal.
with db.begin() as con:
    # Step 3: delete the existing rows in batches of 1,000 ids.
    for chunked in chunks(df.id.tolist(), 1000):
        con.execute("""DELETE FROM table
                       WHERE id IN {}""".format(to_tuple(chunked)))
    # Step 4: append the new rows in batches of 100,000 ids.
    for chunked in chunks(df.id.tolist(), 100000):
        df.query("id in @chunked").to_sql('table', con, index=False,
                                          if_exists='append')

Thanks for your help.

I have found df.to_sql to be very slow. One way I've gotten around this issue is by writing the dataframe to a CSV file with df.to_csv, using BCP (SQL Server's bulk-copy tool) to bulk-insert the data from the CSV into the table, and then deleting the CSV file once the insertion is done. You can use subprocess to run BCP from a Python script.
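
Since the question is about MySQL, where BCP is not available, the analogous technique is LOAD DATA LOCAL INFILE. Here is a minimal sketch of the same CSV-then-bulk-load idea, assuming local_infile is enabled on both client and server; the engine URL is a placeholder:

import os
import tempfile

from sqlalchemy import create_engine, text

# Placeholder URL; the driver must allow LOCAL INFILE (pymysql: local_infile=True).
db2 = create_engine(
    "mysql+pymysql://user:password@host/db2",
    connect_args={"local_infile": True},
)

def bulk_append(df, table):
    # Write the frame to a temporary CSV, bulk-load it in one statement,
    # then remove the file.
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        df.to_csv(path, index=False, header=False)
        sql = (
            "LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
        ).format(path=path.replace(os.sep, "/"), table=table)
        with db2.begin() as con:
            con.execute(text(sql))
    finally:
        os.remove(path)

The win in both cases comes from letting the server parse the file in one pass instead of executing one INSERT per row (or per chunk).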
