
Why is the MySQL command line so fast vs. Python?

Original: 2016-02-24 02:22:11 7 1 python/ mysql/ postgresql

I need to migrate data from MySQL to Postgres. It's easy to write a script that connects to both MySQL and Postgres, runs a SELECT on the MySQL side, and inserts on the Postgres side, but it is very slow (I have more than 1M rows). It's much faster to write the data to a flat file and then import it.

The MySQL command line can download tables pretty fast and output them as tab-separated values, but that means executing a program external to my script (either by running it as a shell command and saving the output to a file, or by reading directly from its stdout). I am trying to download the data using Python instead of the MySQL client.

Does anyone know what steps and calls the MySQL command line performs to query a large dataset and output it to stdout? I thought it could just be that the client is written in C and should be much faster than Python, but the Python binding for MySQL is itself written in C, so... any ideas?

1 solution

I believe the problem is that you are inserting each row in a separate transaction (which is the default behavior when you run SQL queries without explicitly starting a transaction). In that case, the database must write (flush) changes to disk on every INSERT. That can be 100 times slower than inserting the data in a single transaction. Try running BEGIN before importing the data and COMMIT afterwards.
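
To make the single-transaction idea concrete, here is a minimal sketch. It is not your actual migration script: the `items` table, the `insert_in_one_transaction` helper and the batch size are made up for illustration, and it uses the standard-library `sqlite3` module as a stand-in so it runs without a database server. With a Postgres driver such as psycopg2 the same DB-API calls (`cursor()`, `executemany()`, `commit()`, `rollback()`) should apply, but the placeholder would be `%s` instead of `?`.

```python
import sqlite3  # stdlib stand-in so the sketch runs without a server


def insert_in_one_transaction(conn, rows, batch_size=1000):
    """Insert all rows and commit once, instead of committing per row.

    `conn` is any DB-API connection; the `items` table and the `?`
    placeholder style are assumptions that match sqlite3 only.
    """
    cur = conn.cursor()
    try:
        # Send the rows in batches, but keep them all in the same transaction.
        for start in range(0, len(rows), batch_size):
            cur.executemany(
                "INSERT INTO items (id, name) VALUES (?, ?)",
                rows[start:start + batch_size],
            )
        conn.commit()  # a single COMMIT for the whole import
    except Exception:
        conn.rollback()  # undo the partial import on any failure
        raise
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, f"name-{i}") for i in range(10_000)]
inserted = insert_in_one_transaction(conn, rows)
```

The point is only that `commit()` is called once after all the inserts rather than after each row. If your script turns on autocommit, or calls `commit()` inside the loop, every INSERT becomes its own transaction again.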