
Fork MySQL INSERT INTO (InnoDB)

I'm trying to insert about 500 million rows of garbage data into a database for testing. Right now I have a PHP script looping through a few SELECT/INSERT statements, each inside a TRANSACTION; clearly this isn't the best solution. The tables are InnoDB (row-level locking).

I'm wondering whether (properly) forking the process will speed up the INSERTs. At the current rate, it will take 140 hours to complete. I'm concerned about three things:

  1. If INSERT statements must acquire a write lock, will that render forking useless, since multiple processes can't write to the same table at the same time?

  2. I'm using SELECT...LAST_INSERT_ID() (inside a TRANSACTION). Will this logic break when multiple processes are INSERTing into the database? I could create a new database connection for each fork, so I hope this would avoid the problem.

  3. How many processes should I be using? The queries themselves are simple, and I have a regular dual-core dev box with 2GB RAM. I set up InnoDB to use 8 threads (innodb_thread_concurrency=8), but I'm not sure whether I should be using 8 processes, or whether this is even the correct way to think about matching.

Thanks for your help!

The MySQL documentation has a discussion on efficient insertion of a large number of records. It seems that the clear winner is the LOAD DATA INFILE command, followed by INSERT statements that carry multiple VALUES lists.
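As a rough sketch of the second option (one INSERT carrying several VALUES lists), the Python helper below groups rows into batches and builds a single parameterized statement per batch. The table name `test_data`, the column names, and the batch size are made up for illustration, and the `%s` placeholder style is an assumption about the driver in use (it is the style the common Python MySQL drivers accept). The same idea carries over to a PHP script.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multi_insert(table, columns, batch):
    """Return (sql, params) for one INSERT carrying len(batch) VALUES lists.

    `table` and `columns` are interpolated directly, so they must come from
    trusted code, never from user input. Row values go through placeholders.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(batch)),
    )
    # Flatten the rows so the values line up with the placeholders in order.
    params = [value for row in batch for value in row]
    return sql, params


if __name__ == "__main__":
    # Hypothetical garbage rows: (id, payload).
    rows = ((i, "junk-%d" % i) for i in range(5))
    for batch in chunked(rows, 2):
        sql, params = build_multi_insert("test_data", ["id", "payload"], batch)
        print(sql, params)
```

Each `(sql, params)` pair would then be passed to the driver's `cursor.execute(sql, params)`, with a commit after every batch or every few batches. A workable batch size depends on row width and the server's `max_allowed_packet`, so it is worth measuring rather than guessing.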

1) Yes, there will be lock contention, but InnoDB is designed to handle multiple threads trying to insert. Sure, they won't insert simultaneously, but it will handle serializing the inserts for you. Just make sure you explicitly commit your transactions, and do it ASAP. That keeps locks short-lived and helps insert performance.

2) No, this logic will not break provided you have one connection per thread, since LAST_INSERT_ID() is connection-specific.

3) This is one of those things that you just need to benchmark to figure out. Actually, I would make the program self-adjust. Run 100 inserts with 8 threads and record the execution time. Then try again with half as many threads and with twice as many. Whichever is fastest, benchmark more thread counts around that number.
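The first round of that self-adjusting benchmark might look something like the sketch below. It uses Python threads and a `fake_insert` function that only sleeps; both are stand-ins chosen so the example runs without a database. In a real run, `task` would perform one INSERT and COMMIT on its own connection, and the workers could just as well be forked processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_workers(task, n_workers, n_tasks=100):
    """Run `task` n_tasks times across n_workers threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() waits for every task and re-raises any exception from a worker.
        list(pool.map(lambda _: task(), range(n_tasks)))
    return time.perf_counter() - start


def pick_worker_count(task, start=8, n_tasks=100):
    """Time start/2, start and 2*start workers; return (best_count, timings)."""
    candidates = sorted({max(1, start // 2), start, start * 2})
    timings = {n: time_workers(task, n, n_tasks) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def fake_insert():
    # Hypothetical stand-in for one INSERT + COMMIT; it only sleeps briefly.
    time.sleep(0.001)


if __name__ == "__main__":
    best, timings = pick_worker_count(fake_insert, start=8, n_tasks=100)
    for n, seconds in sorted(timings.items()):
        print("%2d workers: %.3fs" % (n, seconds))
    print("fastest in this round:", best)
```

This covers only the first round (4, 8, and 16 workers when starting from 8). Narrowing in on values around the winner would mean calling `pick_worker_count` again with a new `start`, ideally with more than 100 inserts per run so that timing noise matters less.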

In general, you should just go ahead and benchmark this kind of thing to see which is faster. In the time it takes to think about it and write it up, you could probably already have preliminary numbers.
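To make point 2 concrete without needing a MySQL server, here is a small illustration that uses Python's built-in `sqlite3` module as a stand-in. This is an analogy, not MySQL: SQLite's `last_insert_rowid()` plays the role of `LAST_INSERT_ID()`, and both are tracked per connection. The table `t` and the inserted values are invented for the demo.

```python
import os
import sqlite3
import tempfile


def demo_per_connection_ids():
    """Two connections insert into one table; each sees only its own last ID."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn_a = sqlite3.connect(path)
        conn_b = sqlite3.connect(path)
        try:
            conn_a.execute(
                "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)"
            )
            conn_a.commit()

            # Commit right after each insert so the other connection
            # is not left waiting on the write lock.
            conn_a.execute("INSERT INTO t (v) VALUES ('from a')")
            conn_a.commit()
            conn_b.execute("INSERT INTO t (v) VALUES ('from b')")
            conn_b.commit()

            # conn_b inserted later, yet conn_a still reports its own row ID.
            id_a = conn_a.execute("SELECT last_insert_rowid()").fetchone()[0]
            id_b = conn_b.execute("SELECT last_insert_rowid()").fetchone()[0]
            return id_a, id_b
        finally:
            conn_a.close()
            conn_b.close()


if __name__ == "__main__":
    print(demo_per_connection_ids())  # prints (1, 2)
```

With forked workers, the equivalent precaution is to open the database connection after the fork, inside each child, so that no two processes share a connection.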
