
Force freeing memory in PHP

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
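A minimal sketch of the loop shape described above; the filename source and the keep-1% filter (is_interesting) are hypothetical placeholders:

$kept = array();
foreach ($filenames as $file) {
    $raw  = file_get_contents($file);  // read the gzipped file
    $json = gzdecode($raw);            // decompress
    $data = json_decode($json, true);  // decode into an associative array
    unset($raw, $json);                // drop the large intermediates early

    foreach ($data as $record) {
        if (is_interesting($record)) { // hypothetical filter keeping ~1%
            $kept[] = $record;
        }
    }
    unset($data);
}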

Unfortunately, with each iteration (I traverse over an array containing the filenames), there seems to be some memory lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds about 10 MB overall), but I am frequently rewriting (specifically, strings in an array). Apparently, PHP does not free the memory correctly, thus using more and more RAM until it hits the limit.

Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?

It has to do with memory fragmentation.

Consider two strings concatenated into one. Each original must remain until the output is created, and the output is longer than either input. Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed, but they are small blocks of memory.

In the case of 'str1' . 'str2' . 'str3' . 'str4', you have several temporaries being created at each . -- and none of them fit in the space that has been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space can't be reused effectively. You grow with each temporary you create, and you never re-use anything.
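For illustration, a sketch of the allocation pattern being described (the variables are hypothetical):

$out = $a . $b;    // temp #1: must be allocated while $a and $b still exist
$out = $out . $c;  // temp #2: the old $out is freed, but its hole is too small for the new string
$out = $out . $d;  // temp #3: the freed holes accumulate as fragmentation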

Using the array-based implode, you create only 1 output -- exactly the length you require -- performing only 1 additional allocation. So it's much more memory efficient, and it doesn't suffer from the concatenation fragmentation. The same is true of Python. If you need to concatenate strings, more than 1 concatenation should always be array-based:

''.join(['str1','str2','str3'])

in Python

implode('', array('str1', 'str2', 'str3'))

in PHP

sprintf equivalents are also fine.
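For instance, a sketch of the sprintf equivalent (the field variables are hypothetical), which likewise produces the output in a single formatted allocation:

$line = sprintf('%s,%s,%s', $field1, $field2, $field3);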

The memory reported by memory_get_peak_usage is basically always the "last" bit of memory in the virtual map that it had to use. So since it's always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.

In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.

Note: you need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
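A minimal sketch of forcing a collection pass under those conditions:

gc_enable();                   // make sure the cycle collector is active
// ... work that may create reference cycles ...
$freed = gc_collect_cycles();  // force a pass; returns the number of cycles collected
echo $freed . " cycles collected" . PHP_EOL;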

Found the solution: it was a string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, thus effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding it with commas just before fputs-ing it to the outfile) circumvented this behavior.
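A sketch of that array-based approach (the field variables and the $out file handle are hypothetical):

// Instead of: $line = $a . ',' . $b . ',' . $c . "\n";
$fields = array($a, $b, $c);
fputs($out, implode(',', $fields) . "\n");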

For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which misled me into assuming that the json_decode function was the problem.

There's a way.

I had this problem one day. I was writing from a DB query into CSV files - always allocating one $row, then reassigning it in the next step. I kept running out of memory. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it and unsetting the whole thing every 5000th step) didn't help. But it was not the end, to quote a classic.

When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory) and closed the file, THEN made subsequent calls to this function (appending to the existing file), I found that on every function exit, PHP removed the garbage. It was a local-variable-space thing.

TL;DR

When a function exits, it frees all local variables.

If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)

Side note: for reference-passed variables this obviously won't work; a function can only free its internal variables, which would be lost on return anyway.
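A minimal sketch of this chunked approach (export_chunk and the fetch_rows row source are hypothetical):

// Each call's locals ($fh, $row, ...) are freed when the function returns.
function export_chunk($path, $offset, $limit) {
    $fh = fopen($path, 'a');  // append to the existing file
    $written = 0;
    foreach (fetch_rows($offset, $limit) as $row) {  // hypothetical row source
        fputs($fh, implode(',', $row) . "\n");
        $written++;
    }
    fclose($fh);
    return $written;
}

$offset = 0;
while (export_chunk('out.csv', $offset, 100000) === 100000) {
    $offset += 100000;  // memory is reclaimed between calls
}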

I hope this saves your day as it saved mine!

I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:

while (condition) {
  // do
  // cool
  // stuff
}

to

while (condition) {
  do_cool_stuff();
}

function do_cool_stuff() {
  // do
  // cool
  // stuff
}

EDIT

I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode().

for ($x = 0; $x < 10000000; $x++) {
  do_something_cool();
}

function do_something_cool() {
  $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
  $result = json_decode($json);
  echo memory_get_peak_usage() . PHP_EOL;
}

Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original:

foreach ($x as &$y)
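A self-contained sketch of that pattern (process() is a hypothetical in-place transformation); note that the reference should be unset after the loop, or $y stays aliased to the last element:

foreach ($x as &$y) {
    $y = process($y);  // operates on the element itself; no copy is made
}
unset($y);             // break the lingering reference to the last element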

If PHP is actually leaking memory, a forced garbage collection won't make any difference.

There's a good article on PHP memory leaks and their detection at IBM.

I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvals. But did you check that gc_enable was called before loading any files?

I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.

I believe one workaround would be not to use file_get_contents, but rather fopen()...fgets()...fclose(), rather than mapping the whole file into memory in one go. But you'd need to try it to confirm.
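A sketch of that streaming alternative, processing one line at a time (for the gzipped files in the question, gzopen()/gzgets() would be the analogous calls):

$fh = fopen($filename, 'r');
if ($fh !== false) {
    while (($line = fgets($fh)) !== false) {
        // process one line at a time; only $line is held in memory
    }
    fclose($fh);
}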

HTH

C.

There recently was a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.

Could you try using fread instead? I think this may solve your problem. If it does, it's probably time to file a bug report with PHP.
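A minimal sketch of replacing file_get_contents with a single fread (assuming $filename names an ordinary local file):

$fh = fopen($filename, 'rb');
$contents = fread($fh, filesize($filename));  // read the whole file in one call
fclose($fh);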
