
PHP sudden increase of memory usage during array operations

I'm writing a maintenance script in PHP. The script keeps about 100,000 key-value pairs in an associative array and compares a bunch of other data against that array.

The keys are 12- or 16-byte hexadecimal strings.

The values are arrays containing 1-10 strings. Each string is around 50 bytes.

I'm populating my array by reading a text file line-by-line with fgets() in a loop.
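A minimal sketch of what that loading loop might look like, assuming a line format of hex key, tab, one value string (the format and the function name are illustrative, not from the original post):

```php
<?php
// Sketch of the loading loop described above. The line format (hex key,
// a tab, then one value string per line) is an assumption for illustration.
function loadDataset($path)
{
    $map = array();
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return $map;
    }
    while (($line = fgets($fh)) !== false) {
        list($key, $value) = explode("\t", rtrim($line, "\n"), 2);
        // Append, so a key seen on several lines accumulates its 1-10 strings.
        $map[$key][] = $value;
    }
    fclose($fh);
    return $map;
}
```

(`array()` rather than `[]` because the question targets PHP 5.3, which predates the short array syntax.)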

All is fine until I hit about 44,000 keys, but after that the memory usage suddenly skyrockets.

No matter how much I increase the memory limit (and I'm reluctant to give it any more than 256MB at the moment), the memory usage increases exponentially until it hits the new limit. This is weird!

The following is a table with the number of keys on the left and the memory usage (in bytes) on the right.

10000     6668460
20000    12697828
30000    18917768
40000    25045068
41000    25658148
42000    26760304
43000    27350368
44000    27920400
45000    33438520
46000    77800344
47000   114203960
48000   161989660
49000   168419992
50000   206265572
Fatal error: Allowed memory size of 268435456 bytes exhausted
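For reference, a table like this can be collected by sampling memory_get_usage() while inserting synthetic keys and values of the sizes described above (the key/value generation here is a stand-in, not the real data):

```php
<?php
// Reproduce the measurement: insert synthetic 12-character hex keys, each
// holding one ~50-byte string, and sample memory_get_usage() periodically.
$map = array();
for ($i = 1; $i <= 50000; $i++) {
    $key = substr(md5((string) $i), 0, 12);  // synthetic 12-char hex key
    $map[$key] = array(str_repeat('x', 50)); // one ~50-byte value string
    if ($i % 10000 === 0) {
        printf("%6d %12d\n", $i, memory_get_usage());
    }
}
```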

As you can see, the memory usage is consistent at 620-660 bytes per key until I reach 44,000 keys. After that, memory usage suddenly begins to increase, until it reaches over 4KB per key at 50,000 keys. This is very strange because the size of my keys and values is always similar.

It seems like I'm hitting some sort of internal limit on the number of keys I can have in an array, beyond which it all becomes very inefficient.

If I can maintain a memory usage of 620-660 bytes per key (which sounds reasonable given the usual overhead of using an array), my entire dataset should fit in approx. 64MB of memory and therefore be easily accessible when I need to reference it later in the same script. This was the assumption when I first started writing the script. It's a maintenance script run from the CLI, so it's OK to use 64MB of memory from time to time.

But if the memory usage keeps increasing like the above, I'll have no choice but to offload the key-value dataset to an external daemon like Memcached, Redis, or an SQL database, and the network overhead will greatly slow down the maintenance script.

What I tried so far:

  • I tried flattening the two-dimensional array into multiple one-dimensional arrays. No luck.
  • I tried splitting the large array into multiple smaller arrays. No luck.
  • I tried not using arrays at all and turning every key into a separate variable. No luck.
  • I can't use SplFixedArray because my keys are not numeric (and can't be converted to numbers within the integer range) and the array needs to be mutable.
  • I would rather not use Quickhash, Judy, or any of the other alternative array implementations written as a C extension.
  • Sorry, this script needs to be in PHP. Don't ask me why...
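For reference, the "multiple smaller arrays" attempt above might look like the following sketch, sharding by the first hex digit of the key into 16 buckets (the function names are illustrative; as noted, this did not reduce memory in practice):

```php
<?php
// Shard the one big associative array into 16 buckets keyed by the first
// hex digit. Same data, same lookups - just spread across smaller arrays.
function shardSet(array &$buckets, $key, $value)
{
    $buckets[$key[0]][$key][] = $value;
}

function shardGet(array $buckets, $key)
{
    return isset($buckets[$key[0]][$key]) ? $buckets[$key[0]][$key] : null;
}

$buckets = array();
shardSet($buckets, 'aabbccddeeff', 'foo');
shardSet($buckets, '0123456789ab', 'bar');
```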

The test server is a virtual machine running Ubuntu 12.04 LTS, 32-bit, with PHP 5.3.10-1ubuntu3.9.

Any ideas?

  • Is this something that is fixed in a more recent version of PHP?
  • Should I just give the dataset to an external daemon like Memcached?

Thanks!

It's the garbage collection, I think. At some point you're using operations that allocate temporary space which then cannot be freed during the hard work, so PHP will eat up your memory for no real purpose.

When I faced this problem, I finally came to the conclusion that garbage is thrown away only at specific events, such as exiting a function. So what you should try is to do the job in several smaller steps and let your variables "relax" between them: write a function that only processes a thousand elements at a time, then call it again to continue where it left off.
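A sketch of that suggestion, assuming the same tab-separated line format as the question (processChunk(), the input file name, and the chunk size of 1,000 are illustrative assumptions):

```php
<?php
// Process the file in fixed-size chunks inside a function, so that
// function-local temporaries become collectable between calls.
function processChunk($fh, $limit)
{
    $chunk = array();
    while (count($chunk) < $limit && ($line = fgets($fh)) !== false) {
        list($key, $value) = explode("\t", rtrim($line, "\n"), 2);
        $chunk[$key][] = $value;
    }
    // ... compare $chunk against the other data here ...
    return count($chunk) > 0; // false once the file is exhausted
}

$fh = fopen('dataset.txt', 'r'); // hypothetical input file
if ($fh !== false) {
    while (processChunk($fh, 1000)) {
        gc_collect_cycles(); // nudge the cycle collector between chunks (PHP >= 5.3)
    }
    fclose($fh);
}
```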

Hope this helps.


 