PHP PDO fetch() loop dies after processing part of large dataset

I have a PHP script which processes a "large" dataset (about 100K records) from a PDO query into a single collection of objects, in a typical loop:

while ($record = $query->fetch()) {
    $obj = new Thing($record);

    /* do some processing */

    $list[] = $obj;
    $count++;
}

error_log('Processed '.$count.' records');

This loop processes about 50% of the dataset and then inexplicably breaks.

Things I have tried:

  • Memory profiling: memory_get_peak_usage() consistently reports about 63MB right before the loop dies. The memory limit is 512MB, set through php.ini.
  • Using set_time_limit() to increase the script execution time to 1 hour (3600 seconds). The loop breaks long before that, and I don't see the usual timeout error in the log.
  • Setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false to avoid buffering the entire dataset (see the sketch after this list).
  • Logging $query->errorInfo() immediately after the loop breaks. This was no help, as the error code was "00000".
  • Checking the MySQL error log. Nothing of note in there before, after, or while this script runs.
  • Batching the processing into 20K-record chunks. No difference; the loop broke in the same spot. However, by "cleaning up" the PDO statement object at the end of each batch, I was able to get the processed total up to 54%.
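
For reference, here is a minimal sketch of how the unbuffered-query attribute and the post-loop errorInfo() check above fit together. The DSN, credentials, and table name are placeholders, not the actual values from my script, and the processing step is reduced to collecting rows.

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,              // surface driver errors as exceptions
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,              // stream rows instead of buffering the whole result set
]);

$query = $pdo->query('SELECT * FROM things');   // placeholder query

$list  = [];
$count = 0;
while ($record = $query->fetch(PDO::FETCH_ASSOC)) {
    $list[] = $record;   // stand-in for the "new Thing($record)" processing
    $count++;
}

// If the loop stops early, errorInfo() may hint at the cause (here it was "00000", i.e. no error).
error_log('Processed '.$count.' records; errorInfo: '.json_encode($query->errorInfo()));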

Other weird behavior:

  • When I set the memory limit using ini_set('memory_limit', '1024MB'), the loop actually dies earlier than with the smaller memory limit, at about 20% progress.
  • During this loop, the PHP process uses 100% CPU, but once the loop breaks, usage drops back down to 2%, even though another loop starts processing immediately afterwards. Likely the connection to the MySQL server in the first loop is very resource-intensive.

I am doing this all locally using MAMP PRO, if that makes any difference.

Is there something else that could be consistently breaking this loop that I haven't checked? Is this simply not a viable strategy for processing this many records?

UPDATE

After using a batching strategy (20K increments), I have started to see a MySQL error consistently around the third batch: MySQL server has gone away; possibly a symptom of a long-running unbuffered query.
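
For context, the batching I describe has roughly the following shape (a sketch only: the table and column names are placeholders, and $pdo is assumed to be an existing PDO connection). closeCursor() is the "cleaning up" step mentioned above.

$batchSize = 20000;
$offset    = 0;
$list      = [];
$count     = 0;

do {
    $stmt = $pdo->prepare('SELECT * FROM things ORDER BY id LIMIT :limit OFFSET :offset');
    $stmt->bindValue(':limit',  $batchSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset,    PDO::PARAM_INT);
    $stmt->execute();

    $rows = 0;
    while ($record = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $list[] = $record;   // stand-in for the "new Thing($record)" processing
        $count++;
        $rows++;
    }

    $stmt->closeCursor();    // release the result set before starting the next batch
    $offset += $batchSize;
} while ($rows === $batchSize);

error_log('Processed '.$count.' records');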

If you really need to process 100K records on the fly, you should do the processing in SQL and fetch the result as you need it - it should save a lot of time.

But you probably can't do that for some reason. Since you always process all the rows from the statement, use fetchAll() once and leave MySQL alone after that, like this:

$records = $query->fetchAll();
foreach ($records as $record)
{
    $obj = new Thing($record);
    /* do some processing */
    $list[] = $obj;
    $count++;
}
error_log('Processed '.$count.' records');

Also, select only the rows that you will use. If this does not help, you can try this: Setting a connect timeout with PDO.
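
As a rough illustration of both suggestions (a narrower SELECT plus a driver timeout), with placeholder table, column, and WHERE-condition names and an arbitrary timeout value - PDO::ATTR_TIMEOUT is interpreted as a connect timeout by the MySQL driver:

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_TIMEOUT => 60,   // seconds; connect timeout for pdo_mysql
]);

// Fetch only the rows (and columns) that the processing actually needs.
$query   = $pdo->query('SELECT id, name FROM things WHERE processed = 0');
$records = $query->fetchAll(PDO::FETCH_ASSOC);
error_log('Fetched '.count($records).' records');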
