
php, mysql, my memory leaking

I didn't expect this script (throw-away) to be leaking, and I haven't figured out what the culprit is. Can you spot anything? Although this is throw-away code, I'm concerned that I'll repeat this in the future. I've never had to manage memory in PHP, but with the number of rows in the db, it's blowing up my PHP instance (I've already upped the memory to 1 GB).

The California table is especially large compared to the others (currently 2.2m rows, fewer as I delete duplicate rows). I get a memory error on line 31 ($row = mysql_fetch_assoc($res)):

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 24 bytes) in C:\Documents and Settings\R\My Documents\My Webpages\cdiac\cdiac_dup.php on line 31

PHP 5.3.0, MySQL 5.1.36, part of a WAMP install.

Here's the entire code. The purpose of this script is to delete duplicate entries (the data was acquired into segmented tables, which was far faster at the time, but now I have to merge those tables).

What's causing it? Something I'm overlooking? Or do I just need to watch the memory size and call garbage collection manually when it gets big?

<?php

define('DBSERVER', 'localhost');
define('DBNAME', '---');
define('DBUSERNAME', '---');
define('DBPASSWORD', '---');

$dblink = mysql_connect(DBSERVER, DBUSERNAME, DBPASSWORD);
mysql_select_db(DBNAME, $dblink);


$state = "AL";
//if (isset($_GET['state'])) $state=mysql_real_escape_string($_GET['state']); 
if (isset($argv[1]) ) $state = $argv[1];

echo "Scanning $state\n\n";


// iterate through the listing of a state to check for duplicate entries (same station_id, year, month, day)
$DBTABLE = "cdiac_data_". $state;
$query = "select * from $DBTABLE ";
$query .= " order by station_id, year, month, day ";

$res = mysql_query($query) or die ("could not run query '$query': " . mysql_errno() . " " . mysql_error());

$last = "";
$prev_row = array();
$i = 1;
$counter = 0;
echo ".\n";
while ($row = mysql_fetch_assoc($res)) {  
  $current = $row["station_id"] . "_" . $row["year"] . "_" . sprintf("%02d",$row["month"]) . "_" . sprintf("%02d",$row["day"]);
  echo str_repeat(chr(8), 80) . "$i  $current ";
  if ($last == $current) {
    //echo implode(', ', $row) . "\n";

    // merge $row and $prev_row
    // data_id  station_id, state_abbrev, year, month,  day,  TMIN, TMIN_flags, TMAX, TMAX_flags, PRCP, PRCP_flags, SNOW, SNOW_flags, SNWD, SNWD_flags

    printf("%-13s %8s %8s\n", "data_id:", $prev_row["data_id"], $row["data_id"]);
    if ($prev_row["data_id"] == $row["data_id"]) echo " + ";

    $set = "";
    if (!$prev_row["TMIN"] && $row["TMIN"])  $set .= "TMIN = " . $row["TMIN"] . ", ";
    if (!$prev_row["TMIN_flags"] && $row["TMIN_flags"])   $set .= "TMIN_flags = '" . $row["TMIN_flags"] . "', ";
    if (!$prev_row["TMAX"] && $row["TMAX"])   $set .= "TMAX = " . $row["TMAX"] . ", ";
    if (!$prev_row["TMAX_flags"] && $row["TMAX_flags"])   $set .= "TMAX_flags = '" . $row["TMAX_flags"] . "', ";
    if (!$prev_row["PRCP"] && $row["PRCP"])   $set .= "PRCP = " . $row["PRCP"] . ", ";
    if (!$prev_row["PRCP_flags"] && $row["PRCP_flags"])   $set .= "PRCP_flags = '" . $row["PRCP_flags"] . "', ";
    if (!$prev_row["SNOW"] && $row["SNOW"])   $set .= "SNOW = " . $row["SNOW"] . ", ";
    if (!$prev_row["SNOW_flags"] && $row["SNOW_flags"])   $set .= "SNOW_flags = '" . $row["SNOW_flags"] . "', ";
    if (!$prev_row["SNWD"] && $row["SNWD"])   $set .= "SNWD = " . $row["SNWD"] . ", ";
    if (!$prev_row["SNWD_flags"] && $row["SNWD_flags"])   $set .= "SNWD_flags = '" . $row["SNWD_flags"] . "', ";

    $delete = "";
    $update = "";
    if ($set = substr_replace( $set, "", -2 )) $update = "UPDATE $DBTABLE SET $set WHERE data_id=".$prev_row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n";
    if ($row["data_id"] != $prev_row["data_id"]) $delete = "delete from $DBTABLE where data_id=".$row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n\n";

    if ($update) {
      $r = mysql_query($update) or die ("could not run query '$update' \n".mysql_error());
    }
    if ($delete) {
      $r = mysql_query($delete) or die ("could not run query '$delete' \n".mysql_error());
    }    

    //if ($counter++ > 5) exit(0);
  }
  else {
    $last = $current;
    unset($prev_row);
    //copy $row to $prev_row
    foreach ($row as $key => $val) $prev_row[$key] = $val;
  }

  $i++;
}

    echo "\n\nDONE\n"; 
?>

I would try two things:

1) Instead of running the UPDATE and DELETE queries inside the loop using mysql_query, save them to a text file to execute later, for example: file_put_contents('queries.sql', $update, FILE_APPEND); (see the first sketch after this list).

2) Instead of doing everything inside the while ($row = mysql_fetch_assoc($res)) loop, first grab all the SELECT query results, then close the database connection, freeing all database resources including the query result. Only after this, perform the loop processing.

If you run out of memory while storing the database results in one array, you can try saving the results to a temporary file instead (one record per line / FILE_APPEND), and then use this file in the loop (reading one line per record with the fgets function); see the second sketch below.
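A minimal sketch of suggestion 1, assuming $update and $delete are built exactly as in the question's loop (the queries.sql filename is my own choice):

<?php
// Sketch of suggestion 1: spool the statements instead of executing them.
// Assumes $update and $delete are built as in the question's loop;
// 'queries.sql' is an arbitrary output filename.
if ($update) file_put_contents('queries.sql', $update, FILE_APPEND);
if ($delete) file_put_contents('queries.sql', $delete, FILE_APPEND);
// Later, replay the file outside PHP, e.g. from the shell:
//   mysql -u USER -p DBNAME < queries.sql
?>

And a minimal sketch of suggestion 2, spooling the rows to a temporary file so the result set can be freed before the merging pass (the rows.txt filename and the JSON-per-line encoding are assumptions, not from the original):

<?php
// Sketch of suggestion 2: dump the result set to disk, free it, then loop.
// One record per line, encoded as JSON; $query is the SELECT from the question.
$res = mysql_query($query) or die(mysql_error());
$out = fopen('rows.txt', 'w');
while ($row = mysql_fetch_assoc($res)) {
    fwrite($out, json_encode($row) . "\n");
}
fclose($out);
mysql_free_result($res);   // release the buffered result set

// Second pass: only one row is in memory at a time.
$in = fopen('rows.txt', 'r');
while (($line = fgets($in)) !== false) {
    $row = json_decode($line, true);
    // ... the duplicate-merging logic from the question goes here ...
}
fclose($in);
?>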

Work smarter, not harder:

SELECT station_id, year, month FROM table
    GROUP BY station_id, year, month
    HAVING COUNT(*) > 1

That'll get you all the station_id/year/month tuples that appear in the table more than once. Assuming that most of your data is not duplicated, that'll save you a lot of memory, since now you just have to go through these tuples and fix up the rows matching them.
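A sketch of how that query could drive the cleanup, assuming $DBTABLE and the connection are set up as in the question (note that the question's duplicate key also includes day, so I've added it to the grouping):

<?php
// Sketch: find only the duplicated keys, then fetch and merge just those
// rows. Day is included here because the question's key uses it too.
$dupes = mysql_query(
    "SELECT station_id, year, month, day FROM $DBTABLE
     GROUP BY station_id, year, month, day
     HAVING COUNT(*) > 1"
) or die(mysql_error());

while ($key = mysql_fetch_assoc($dupes)) {
    $sql = sprintf(
        "SELECT * FROM %s WHERE station_id='%s' AND year=%d AND month=%d AND day=%d",
        $DBTABLE,
        mysql_real_escape_string($key['station_id']),
        $key['year'], $key['month'], $key['day']
    );
    $rows = mysql_query($sql) or die(mysql_error());
    // ... merge/delete the handful of duplicate rows as in the question ...
    mysql_free_result($rows);
}
?>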

I found this when trying to track down a memory use problem in a script of mine. Having solved the issue for mine, I thought it worth adding a reply here for the next person who comes along with the same issue.

I was using mysqli, but much the same applies for mysql.

The problem I found was the queries not freeing their results. The solution was to use mysqli_free_result() after executing the update and delete queries. But more importantly, on the mysqli_query call for the loop I used the extra parameter MYSQLI_USE_RESULT. There are side effects of this, so use a separate connection for the update and delete queries.
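A minimal sketch of that setup (connection details and the table name are placeholders; the merging logic from the question would go inside the loop):

<?php
// MYSQLI_USE_RESULT streams rows from the server instead of buffering the
// whole result set in PHP. While an unbuffered result is open, the same
// connection cannot run other queries, hence the second connection for
// the UPDATE/DELETE statements.
$read  = mysqli_connect('localhost', 'user', 'pass', 'db');
$write = mysqli_connect('localhost', 'user', 'pass', 'db');

$res = mysqli_query(
    $read,
    "SELECT * FROM cdiac_data_AL ORDER BY station_id, year, month, day",
    MYSQLI_USE_RESULT
);

while ($row = mysqli_fetch_assoc($res)) {
    // ... detect duplicates and build $update / $delete as in the question,
    //     then run them on the second connection:
    // mysqli_query($write, $update);
    // mysqli_query($write, $delete);
}
mysqli_free_result($res);   // required before $read can be reused
mysqli_close($read);
mysqli_close($write);
?>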
