
PHP PDO fetch() loop dies after processing part of large dataset

I have a PHP script which processes a "large" dataset (about 100K records) from a PDO query into a single collection of objects, in a typical loop:

$list  = [];
$count = 0;

while ($record = $query->fetch()) {
    $obj = new Thing($record);

    /* do some processing */

    $list[] = $obj;
    $count++;
}

error_log('Processed '.$count.' records');

This loop processes about 50% of the dataset and then inexplicably breaks.

Things I have tried:

  • Memory profiling: memory_get_peak_usage() consistently reports about 63MB just before the loop dies. The memory limit is 512MB, set through php.ini.
  • Using set_time_limit() to increase the maximum execution time to 1 hour (3600 seconds). The loop breaks long before that, and I don't see the usual max-execution-time error in the log.
  • Setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false to avoid buffering the entire dataset.
  • Logging $query->errorInfo() immediately after the loop breaks. This was no help, as the error code was "00000" (success).
  • Checking the MySQL error log. Nothing of note in there before, during, or after this script runs.
  • Batching the processing into 20K-record chunks (a sketch of this approach follows this list). No difference: the loop broke in the same spot. However, by "cleaning up" the PDO statement object at the end of each batch, I was able to get the processed total up to 54%.
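
A minimal sketch of the batching idea from the last item, assuming a hypothetical "things" table with an integer primary key "id"; the table and column names, DSN, and credentials are illustrative, not taken from the actual schema:

// Rough sketch of the 20K batching described above (names are illustrative).
// Each chunk runs as its own short query, and closeCursor() frees the result
// set before the next chunk starts, so no single query stays open for long.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$batchSize = 20000;
$lastId    = 0;
$count     = 0;
$list      = [];

do {
    // Keyset pagination (WHERE id > ?) avoids the growing cost of large OFFSETs.
    $stmt = $pdo->prepare('SELECT * FROM things WHERE id > ? ORDER BY id LIMIT ' . $batchSize);
    $stmt->execute([$lastId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $stmt->closeCursor(); // release the result set before processing the chunk

    foreach ($rows as $record) {
        $lastId = (int) $record['id'];
        $list[] = new Thing($record); // Thing is the class from the question
        $count++;
    }
} while (count($rows) === $batchSize);

error_log('Processed ' . $count . ' records');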

Other weird behavior:

  • When I set the memory limit using ini_set('memory_limit', '1024MB'), the loop actually dies earlier than with the smaller limit, at about 20% progress (see the note after this list).
  • During the loop, the PHP process sits at 100% CPU, but once the loop breaks, usage drops back down to about 2%, even though another loop starts processing immediately afterwards. It seems the MySQL connection in the first loop is what is so resource-intensive.
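
One thing worth verifying in that first point (a hedged note on my part, not something reported in the original post): PHP's shorthand byte notation only recognizes the suffixes K, M, and G, so '1024MB' is not parsed the same way '1024M' would be, and the limit actually in effect may not be the 1GB intended. A quick check:

// Hedged diagnostic: confirm what memory_limit is actually in effect after ini_set().
// PHP's shorthand syntax expects suffixes like K, M, or G ('1024M', '1G'); a value
// such as '1024MB' is not one of those shorthands, so the enforced limit can differ
// from what was intended. ini_set() also returns false if the new value is rejected.
var_dump(ini_set('memory_limit', '1024M'));  // previous value on success, false on failure
var_dump(ini_get('memory_limit'));           // the limit PHP is actually enforcing
var_dump(memory_get_usage(true), memory_get_peak_usage(true));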

I am doing all of this locally using MAMP PRO, if that makes any difference.

Is there something else that could be consistently breaking this loop that I haven't checked? Is this simply not a viable strategy for processing this many records?

UPDATE

After adopting the batching strategy (20K increments), I have started to see a MySQL error consistently around the third batch: "MySQL server has gone away", possibly a symptom of a long-running unbuffered query.
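
"MySQL server has gone away" usually means the server closed the connection, most often because a server-side timeout expired while the client was busy, or because a packet exceeded max_allowed_packet. A hedged diagnostic sketch (connection details are placeholders, and the 600-second values are only examples):

// Inspect the server-side timeouts and packet limit that most commonly cause
// "server has gone away". With an unbuffered query, slow per-row processing in
// PHP can let net_write_timeout or wait_timeout expire while the result set is
// still being streamed to the client.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$vars = $pdo->query(
    "SHOW VARIABLES WHERE Variable_name IN
     ('wait_timeout', 'net_write_timeout', 'net_read_timeout', 'max_allowed_packet')"
)->fetchAll(PDO::FETCH_KEY_PAIR);

error_log(print_r($vars, true));

// If the values turn out to be low, raising them for the session is a cheap way
// to test the theory before touching the server configuration.
$pdo->exec('SET SESSION net_write_timeout = 600');
$pdo->exec('SET SESSION wait_timeout = 600');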

If you really need to process 100K records on the fly, you should do the processing in SQL and fetch only the result as you need it; that should save a lot of time.

But you probably can't do that for some reason. Since you process all of the rows from the statement anyway, call fetchAll() once and leave MySQL alone after that, like this:

$records = $query->fetchAll();

foreach ($records as $record) {
    $obj = new Thing($record);

    /* do some processing */

    $list[] = $obj;
    $count++;
}
error_log('Processed '.$count.' records');
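
On the "do the processing in SQL" suggestion above: if the per-record work can be expressed as SQL aggregation, the server can reduce 100K rows to a handful before PHP ever sees them. A hedged sketch, assuming a hypothetical "things" table with "status" and "amount" columns (not taken from the actual schema):

// Illustrative only: push the processing into SQL so PHP iterates a small,
// already-aggregated result instead of 100K individual rows.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$stmt = $pdo->query(
    'SELECT status, COUNT(*) AS records, SUM(amount) AS total
       FROM things
      GROUP BY status'
);

foreach ($stmt as $row) {
    error_log(sprintf('%s: %d records, total %s', $row['status'], $row['records'], $row['total']));
}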

Also, select only the rows you will actually use. If that does not help, you can try this: Setting a connect timeout with PDO.
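
On that last suggestion: PDO accepts PDO::ATTR_TIMEOUT as a driver option, and for the pdo_mysql driver it is documented as the connection timeout in seconds. A minimal sketch (DSN and credentials are placeholders):

// Minimal sketch of setting a connect timeout through PDO for the MySQL driver.
$pdo = new PDO(
    'mysql:host=127.0.0.1;dbname=mydb;charset=utf8mb4',
    'user',
    'pass',
    [
        PDO::ATTR_TIMEOUT => 30,                    // seconds to wait for the connection
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]
);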
