What is the fastest way to save a pandas DataFrame to a MySQL database?

Question

I am writing Python code that generates and updates a MySQL table based on another MySQL table in a different database.

My code looks like this:

For each date in date_range:

1. Query the quantities between the two dates in db1
2. Do some work in pandas => df
3. Delete the rows in db2 whose ids are in df
4. Save df with df.to_sql
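The four steps can be sketched as a small loop. This is a hedged illustration only: two in-memory sqlite3 databases stand in for the MySQL databases db1 and db2, and the function name sync_windows, the table names source and target, the columns id, day and qty, and the doubling of qty (standing in for "some work in pandas") are all hypothetical.

```python
import sqlite3

import pandas as pd


def sync_windows(src, dst, dates, table="target"):
    """Run steps 1-4 once per [start, end) window built from consecutive dates.

    src and dst are sqlite3 connections standing in for db1 and db2 (assumption).
    """
    for start, end in zip(dates, dates[1:]):
        # 1. Query db1 for the rows between the two dates
        df = pd.read_sql_query(
            "SELECT id, day, qty FROM source WHERE day >= ? AND day < ?",
            src,
            params=(start, end),
        )
        # 2. Hypothetical placeholder for "some work in pandas"
        df["qty"] = df["qty"] * 2
        ids = df["id"].tolist()  # plain Python ints, which sqlite3 can bind
        # 3. Delete the rows in db2 whose ids are in df, using bound parameters
        if ids:
            placeholders = ", ".join("?" * len(ids))
            dst.execute(f"DELETE FROM {table} WHERE id IN ({placeholders})", ids)
        # 4. Append df to the target table, then commit the delete and insert
        df.to_sql(table, dst, index=False, if_exists="append")
        dst.commit()
```

Keeping the delete and the append in the same transaction means a failure in step 4 does not leave db2 with the old rows already removed.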

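Steps 1-2 take less than 2 s, while steps 3-4 take up to 10 s, and step 4 alone takes about 4 times as long as step 3. How can I improve my code to make the write process more efficient?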

I have already chunked df for steps 3 and 4, and I added method='multi' to .to_sql (it made no difference at all). I wonder whether we can do better:

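from sqlalchemy import bindparam, text

def chunks(seq, n):  # helper: split a list into consecutive pieces of length n
    return [seq[i:i + n] for i in range(0, len(seq), n)]

ids = df["id"].tolist()
# `table` is a reserved word in MySQL, so it must be quoted; ids are bound, not formatted in
delete_stmt = text("DELETE FROM `table` WHERE id IN :ids").bindparams(bindparam("ids", expanding=True))
with db.begin() as con:  # deletes and inserts commit as one transaction
    for chunked in chunks(ids, 1000):
        con.execute(delete_stmt, {"ids": chunked})
    for chunked in chunks(ids, 100000):
        df.query("id in @chunked").to_sql("table", con, index=False, if_exists="append")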
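For comparison, the manual df.query("id in @chunked") loop can be replaced by the chunksize parameter of DataFrame.to_sql, which slices the frame by row position instead of re-filtering it for every chunk; method="multi" then sends each chunk as one multi-row INSERT. A minimal sketch, assuming an in-memory sqlite3 connection as a stand-in for the MySQL connection and a hypothetical table named target. With MySQL drivers such as PyMySQL and mysqlclient, the default executemany path already rewrites INSERTs into multi-row statements, which may be why method="multi" made little difference here.

```python
import sqlite3

import pandas as pd

# In-memory sqlite3 stands in for the MySQL connection (assumption)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, qty REAL)")

df = pd.DataFrame({"id": range(1, 6), "qty": [1.0, 2.0, 3.0, 4.0, 5.0]})

# chunksize=2: pandas writes rows 0-1, 2-3, then 4, slicing by position.
# method="multi": each chunk becomes a single INSERT ... VALUES (...), (...).
df.to_sql("target", con, index=False, if_exists="append", chunksize=2, method="multi")
con.commit()

print(con.execute("SELECT COUNT(*), SUM(qty) FROM target").fetchone())  # (5, 15.0)
```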

Thanks for your help.

Answer 1

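I have found df.to_sql to be very slow. One way I've gotten around this is to write the DataFrame to a CSV file with df.to_csv, use BCP to bulk insert the CSV data into the table, and delete the CSV file once the insertion is done. You can run BCP from within the Python script using subprocess. (Note that bcp is the bulk copy utility for Microsoft SQL Server; the MySQL counterpart for loading a CSV file is LOAD DATA [LOCAL] INFILE, or its command-line wrapper mysqlimport.)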
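The BCP route targets SQL Server; for MySQL the same dump, bulk-load, delete flow would go through LOAD DATA LOCAL INFILE instead (a swapped-in technique, not the answerer's exact method). A minimal sketch, assuming PyMySQL or a similar driver whose connections accept local_infile=True and whose cursors work as context managers, a server with the local_infile system variable enabled, and pandas 1.5 or later (for the lineterminator argument). build_load_sql and bulk_load are hypothetical names.

```python
import os
import tempfile

import pandas as pd


def build_load_sql(csv_path: str, table: str, columns: list[str]) -> str:
    """Build a LOAD DATA LOCAL INFILE statement for a headerless, comma-separated file."""
    cols = ", ".join(f"`{c}`" for c in columns)
    path = csv_path.replace("\\", "/")  # forward slashes avoid backslash escaping in the literal
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"({cols})"
    )


def bulk_load(df: pd.DataFrame, table: str, connect) -> None:
    """Dump df to a temporary CSV, bulk-load it into MySQL, then delete the file.

    `connect` is a zero-argument callable returning a DB-API connection opened with
    LOCAL INFILE enabled, e.g. lambda: pymysql.connect(..., local_infile=True).
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # No header or index; NaN/None are written as \N so MySQL loads them as NULL
        df.to_csv(path, index=False, header=False, na_rep="\\N", lineterminator="\n")
        conn = connect()
        try:
            with conn.cursor() as cur:
                cur.execute(build_load_sql(path, table, list(df.columns)))
            conn.commit()
        finally:
            conn.close()
    finally:
        os.remove(path)  # delete the CSV whether or not the load succeeded
```

In the question's loop, bulk_load would replace step 4 only; the DELETE for the affected ids still runs first. The sketch also assumes the data contains no backslashes, since LOAD DATA treats backslash as its default escape character.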

Solution 1
0 2020-03-06 18:10:15
