
Best way to insert and query a lot of data MySQL

I have to read approximately 5000 rows of 6 columns max from an xls file. I'm using PHP >= 5.3 and will be using PHPExcel for that task. I haven't tried it yet, but I think it can handle the file (if you have other options, they are welcome). The issue is that every time I read a row, I need to query the database to verify whether that particular row exists; if it does, overwrite it, and if not, add it. I think that's going to take a lot of time, and PHP will simply time out (I can't modify the timeout variable since it's a shared server). Could you give me a hand with this? I appreciate your help.

Since you're using MySQL, all you have to do is insert the data and not worry about whether a row is already there. Here's why and how:

  • If you query the database from PHP to verify that a row exists, that's bad: you are prone to getting false results. There's a lag between PHP and MySQL, and PHP can't be used to verify data integrity. That's the job of the database.

  • To ensure there are no duplicate rows, we use UNIQUE constraints on our columns.

  • MySQL extends the SQL standard with its INSERT INTO ... ON DUPLICATE KEY UPDATE syntax. That lets you just insert data, and if there's a duplicate row, update it with the new data.

  • Reading 5000 rows is quick. Inserting 5000 rows is also quick if you wrap the inserts in a transaction. I would suggest reading 100 rows from the Excel file at a time, starting a transaction, and just inserting the data (using ON DUPLICATE KEY UPDATE to handle duplicates). That lets you spend one disk I/O to save 100 records. Done this way, the whole process finishes in a few seconds, so you don't have to worry about performance or timeouts. A sketch of this approach follows the list.
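
For illustration, here is a minimal sketch of that approach using PDO. The items table, its columns, and the UNIQUE key on sku are assumptions for the example; adapt them to your schema:

    <?php
    // Assumed schema (the UNIQUE key is what makes ON DUPLICATE KEY UPDATE work):
    //   CREATE TABLE items (
    //       sku  VARCHAR(64) NOT NULL,
    //       name VARCHAR(255),
    //       qty  INT,
    //       UNIQUE KEY uniq_sku (sku)
    //   );
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // MySQL resolves duplicates itself: new rows are inserted, existing ones updated.
    $stmt = $pdo->prepare(
        'INSERT INTO items (sku, name, qty) VALUES (:sku, :name, :qty)
         ON DUPLICATE KEY UPDATE name = VALUES(name), qty = VALUES(qty)'
    );

    // $rows holds the data already read from the spreadsheet.
    foreach (array_chunk($rows, 100) as $batch) {
        $pdo->beginTransaction();       // one commit (one disk flush) per 100 rows
        foreach ($batch as $row) {
            $stmt->execute(array(
                ':sku'  => $row['sku'],
                ':name' => $row['name'],
                ':qty'  => $row['qty'],
            ));
        }
        $pdo->commit();
    }

With 5000 rows that is only 50 commits in total, which is why the whole import can finish in seconds.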

First, run this process via exec, so the web request timeout doesn't matter.
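
For example (the import script path here is an assumption), a background CLI process isn't bound by the web server's request timeout, and CLI PHP has no execution time limit by default:

    // Launch the import as a detached background process and return immediately.
    exec('php /path/to/import.php > /dev/null 2>&1 &');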
Second, select all existing rows before reading the Excel file. Don't do it in one query; read 2000 rows at a time, for example, and collect them into an array.
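
A minimal sketch of that pre-load, reusing the hypothetical items table and sku key from the earlier answer:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    $existing = array();
    $offset   = 0;
    do {
        $stmt = $pdo->query("SELECT sku FROM items ORDER BY sku LIMIT $offset, 2000");
        $page = $stmt->fetchAll(PDO::FETCH_COLUMN);
        foreach ($page as $sku) {
            $existing[$sku] = true;   // keyed array: isset() beats in_array() for 5000 checks
        }
        $offset += 2000;
    } while (count($page) == 2000);

    // Later, while walking the spreadsheet rows:
    // isset($existing[$sku]) ? /* UPDATE */ : /* INSERT */;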
Third, use the xlsx format and a chunked reader, which lets you avoid reading the whole file at once. It's not a 100% guarantee, but I did the same.
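
PHPExcel supports this via a read filter; the sketch below follows the chunked-reading pattern from its documentation. The file name, column range, and chunk size are assumptions:

    <?php
    require_once 'PHPExcel/IOFactory.php';

    // Read filter that accepts only the rows belonging to the current chunk.
    class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $startRow = 0;
        private $endRow   = 0;

        public function setRows($startRow, $chunkSize)
        {
            $this->startRow = $startRow;
            $this->endRow   = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '')
        {
            // Always load the heading row, plus the rows in the current chunk.
            return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
        }
    }

    $reader = PHPExcel_IOFactory::createReader('Excel2007'); // the xlsx reader
    $reader->setReadDataOnly(true);
    $chunkFilter = new ChunkReadFilter();
    $reader->setReadFilter($chunkFilter);

    $chunkSize = 500;
    for ($startRow = 2; $startRow <= 5000; $startRow += $chunkSize) {
        $chunkFilter->setRows($startRow, $chunkSize);
        $excel  = $reader->load('data.xlsx');   // loads only this chunk into memory
        $endRow = min($startRow + $chunkSize - 1, 5000);
        $rows   = $excel->getActiveSheet()->rangeToArray("A$startRow:F$endRow");
        // ... insert/update $rows here ...
        $excel->disconnectWorksheets();         // free memory between chunks
        unset($excel);
    }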
