Why is the MySQL command line so fast vs. Python?

I need to migrate data from MySQL to Postgres. It's easy to write a script that connects to both, runs a SELECT on the MySQL side and inserts on the Postgres side, but it is very slow (I have over 1M rows). It's much faster to write the data to a flat file and then import it.
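For the flat-file route, here is a rough sketch of writing rows in the tab-separated text layout that PostgreSQL's COPY reads by default. The helper names `copy_escape` and `write_copy_file` and the sample rows are made up for illustration; they are not from the original script.

```python
import io


def copy_escape(value):
    """Encode one value for PostgreSQL's COPY text format."""
    if value is None:
        return r"\N"  # COPY's default marker for NULL
    text = str(value)
    # Backslash must be escaped first so later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def write_copy_file(rows, out):
    """Write rows as tab-separated lines to a file-like object; return the row count."""
    count = 0
    for row in rows:
        out.write("\t".join(copy_escape(v) for v in row) + "\n")
        count += 1
    return count


# Hypothetical sample data; in a real migration the rows would come from a MySQL cursor.
buffer = io.StringIO()
written = write_copy_file([(1, "plain"), (2, "tab\there"), (3, None)], buffer)
print(written, "rows written")
```

The resulting file (or in-memory buffer) could then be loaded on the Postgres side with a COPY ... FROM statement, for example through psycopg2's `cursor.copy_expert`, assuming that driver is the one in use.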

The MySQL command line can download tables pretty fast and output them as tab-separated values, but that means executing a program external to my script (either by running it as a shell command and saving the output to a file, or by reading directly from its stdout). I am trying to download the data using Python instead of the MySQL client.
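For comparison, the "read directly from stdout" option might look roughly like the sketch below. The function name `stream_tsv`, the query, and the database name `mydb` are assumptions for illustration. So that the snippet runs without a MySQL server, the demo at the bottom uses a small Python child process as a stand-in for the real client.

```python
import subprocess
import sys


def stream_tsv(command):
    """Run a command and yield each stdout line as a list of tab-separated fields."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n").split("\t")
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


# What the real invocation could look like (hypothetical query and database name).
# --batch prints tab-separated rows, --quick avoids caching the whole result
# in the client, --skip-column-names drops the header line.
MYSQL_COMMAND = [
    "mysql", "--batch", "--quick", "--skip-column-names",
    "--execute=SELECT id, name FROM items", "mydb",
]

# Stand-in child process that prints two tab-separated rows.
DEMO_COMMAND = [sys.executable, "-c", "print('1\\tfoo'); print('2\\tbar')"]

rows = list(stream_tsv(DEMO_COMMAND))
print(rows)
```

In batch mode the client escapes tabs, newlines and backslashes inside values, so a real consumer would also need to undo those escapes; that step is left out here.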

Does anyone know what steps and calls the MySQL command line performs to query a large dataset and output it to stdout? I thought it could just be that the client is written in C and so should be much faster than Python, but the Python binding for MySQL is itself written in C, so... any ideas?

I believe the problem is that you are inserting each row in a separate transaction (which is the default behavior when you run SQL queries without explicitly starting a transaction). In that case, the database must write (flush) changes to disk on every INSERT. That can be 100 times slower than inserting the data in a single transaction. Try running BEGIN before importing the data and COMMIT after.
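A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in so it runs without a database server. SQLite is not part of the original setup and is swapped in purely for illustration; the `items` table and the `load_rows` helper are hypothetical. The same BEGIN / COMMIT pattern should carry over to a Postgres connection, though the actual speedup will depend on the server and its settings.

```python
import os
import sqlite3
import tempfile
import time


def load_rows(db_path, rows, one_transaction):
    """Insert rows into a fresh table; return (elapsed seconds, rows stored)."""
    # isolation_level=None puts sqlite3 in autocommit mode, so each INSERT
    # commits on its own unless we issue BEGIN / COMMIT ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("DROP TABLE IF EXISTS items")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    start = time.perf_counter()
    if one_transaction:
        conn.execute("BEGIN")
    for row in rows:
        conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)
    if one_transaction:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return elapsed, count


sample = [(i, f"name-{i}") for i in range(200)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # on-disk file so commits hit the disk
    slow, n_slow = load_rows(path, sample, one_transaction=False)
    fast, n_fast = load_rows(path, sample, one_transaction=True)

print(f"commit per row: {slow:.3f}s, single transaction: {fast:.3f}s")
```

Both runs store the same rows; only the number of commits differs. Some drivers open a transaction implicitly instead of autocommitting, so it is worth checking how the driver in use behaves before assuming per-row commits are the cause.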
