
Comparing large MySQL data sets with PHP

I have a set of approximately 1.1 million unique IDs and I need to determine which of them do not have a corresponding record in my application's database. The set of IDs also comes from a database, but not the same one. I am using PHP and MySQL and have plenty of memory: PHP is running on a server with 15GB RAM and MySQL runs on its own server, which has 7.5GB RAM.

Normally I'd simply load all the IDs in one query and then use them with the IN clause of a SELECT query to do the comparison in one shot. With a set this large, that has not worked.

So far my attempts have resulted in scripts that either take an unbearably long time or spike the CPU to 100%.

What's the best way to load such a large data set and do this comparison?

Generate a dump of the IDs from the first database into a file, then re-load it into a temporary table on the second database, and do a join between that temporary table and the second database's table to identify the IDs that don't have a matching record. Once you've generated that list, you can drop the temporary table.

That way, you're not trying to work with large volumes of data in PHP itself, so you shouldn't have any memory issues.
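The temporary-table approach described above could look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive implementation: the table names `app_records` and `candidate_ids`, the `id` column, and the `findMissingIds()` helper are all hypothetical. The demo runs against an in-memory SQLite database only so that it works without a server; the same statements are valid on MySQL, where loading 1.1 million rows from a dump file would more likely be done with `LOAD DATA INFILE` than with row-by-row inserts.

```php
<?php
// Sketch only: app_records, candidate_ids and findMissingIds() are
// hypothetical names chosen for illustration.

/**
 * Returns the candidate IDs that have no matching row in app_records.
 */
function findMissingIds(PDO $db, iterable $candidateIds): array
{
    // Temporary table holding the IDs exported from the other database.
    $db->exec('CREATE TEMPORARY TABLE candidate_ids (id BIGINT NOT NULL PRIMARY KEY)');

    // Load inside one transaction so rows are not committed one at a time.
    $db->beginTransaction();
    $insert = $db->prepare('INSERT INTO candidate_ids (id) VALUES (?)');
    foreach ($candidateIds as $id) {
        $insert->execute([(int) $id]);
    }
    $db->commit();

    // Anti-join: keep only the candidates with no partner in app_records.
    $missing = $db->query(
        'SELECT c.id
           FROM candidate_ids c
           LEFT JOIN app_records r ON r.id = c.id
          WHERE r.id IS NULL
          ORDER BY c.id'
    )->fetchAll(PDO::FETCH_COLUMN);

    // The temporary table is no longer needed once the list exists.
    $db->exec('DROP TABLE candidate_ids');

    return array_map('intval', $missing);
}

// Demo with an in-memory SQLite database standing in for the MySQL server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE app_records (id BIGINT NOT NULL PRIMARY KEY)');
$db->exec('INSERT INTO app_records (id) VALUES (1), (2), (4)');

$missing = findMissingIds($db, [1, 2, 3, 4, 5]);
echo implode(',', $missing), "\n"; // prints "3,5"
```

Because the comparison happens inside the database, PHP only ever holds the (usually much smaller) list of missing IDs.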

Assuming you can't join the tables because they are not on the same DB server, and that your server can handle this, I would populate an array with all the IDs from one DB, then loop over the IDs from the other and use in_array to see whether each one exists in the array.

BTW, according to this, you can make the in_array lookup more efficient.
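One common way to make that lookup cheaper, offered here as an assumption about what the lost link may have described rather than a quote from it, is to flip the array so the IDs become keys and test them with `isset()`. `in_array()` scans the array on every call, which adds up quickly over 1.1 million iterations, while a key lookup is a hash lookup. The `missingIds()` helper and the sample IDs below are hypothetical.

```php
<?php
// Sketch only: missingIds() and the sample ID lists are hypothetical.

/**
 * Returns the candidate IDs that do not appear in $existingIds.
 */
function missingIds(iterable $candidateIds, array $existingIds): array
{
    // Turn the values into keys: isset() on a key is a hash lookup,
    // whereas in_array() walks the whole array for every candidate.
    $existing = array_flip($existingIds);

    $missing = [];
    foreach ($candidateIds as $id) {
        if (!isset($existing[$id])) {
            $missing[] = $id;
        }
    }

    return $missing;
}

$missing = missingIds([10, 11, 12, 13], [11, 13, 99]);
echo implode(',', $missing), "\n"; // prints "10,12"
```

Since the candidates are accepted as any iterable, they can be streamed from a result set row by row; only the flipped array of existing IDs has to fit in memory.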
