Perl: Out of Memory Error while building a 2D array at run time

I am a Perl beginner. I am trying to build a 2D array at run time from a binary file, and I am getting an "Out of memory" error. I am using Perl 5.16.3 on Windows 7. My input file is about 4.2 MB. My system has 4 GB of physical memory. When I run this code, memory usage climbs to about 90% and then the out-of-memory error appears.

I tried many ways to debug this. The code only runs successfully if I reduce b32 to b16 or less. Even then, if the file grows beyond 4 MB, the error shows up again. I watched physical memory usage in Task Manager while the code was running, and it kept increasing.

My friend suspects this is a memory leak, but I couldn't make sense of his suspicion. I need help fixing this.

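#!/usr/bin/perl
use strict;
use warnings;

# Three-argument open; binmode because the input is binary.
open( DATA, '<', 'debug.bin' ) or die "Unable to open debug.bin: $!";
binmode DATA;

# Renamed from @2dmatrix: a Perl variable name cannot start with a digit,
# so "my @2dmatrix;" does not compile.
my @matrix;
my $data;

# Read 4 bytes (32 bits) at a time and store each record as a row of
# 32 one-character strings ('0' or '1').
while ( read( DATA, $data, 4 ) ) {
    push @matrix, [ split( //, unpack( 'b32', $data ) ) ];
}
close(DATA);

print scalar(@matrix), "\n";
print "completed reading\n";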
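For reference, each pass through the loop turns one 4-byte record into a 32-element row. The sketch below runs the same `unpack`/`split` on four arbitrary bytes (the values are made up, chosen only for illustration) to show what a single row holds:

```perl
use strict;
use warnings;

# Four arbitrary example bytes standing in for one read() from the file.
my $record = pack( 'C4', 0x01, 0x00, 0xFF, 0x80 );

# 'b32' = bit string, ascending bit order inside each byte.
my $bits = unpack( 'b32', $record );

# split // produces one single-character scalar per bit: a 32-element row.
my @row = split //, $bits;

print "$bits\n";                      # 10000000000000001111111100000001
print scalar(@row), " elements\n";    # 32 elements
```

So every row is 32 separate scalars, plus an array to hold them and a reference to that array.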

Just to clarify the requirement: from the 2D array, I need to extract the contents of column A wherever column B contains a particular pattern (11111111000000001111111100000000). This needs to be done on 4 sets of columns, with a file size of 500 MB.

It's not a memory leak; your program is just very inefficient in its memory use.

For every 4 bytes you read in, you do an unpack 'b32', which creates a 32-character string; you split // it, which turns it into 32 one-character strings; you make an arrayref of the resulting list; and you push that arrayref onto @2dmatrix. That results in:

  • 32 string bodies, each at least 2 bytes (for "0\0" or "1\0"), although perl might decide to use more to avoid reallocations if the strings grow: 64 bytes.
  • 32 SVPVs (scalar variables containing strings; 28 bytes each on 32-bit, 40 bytes each on 64-bit): 896 or 1280 bytes.
  • 1 array body with 32 entries: 128 bytes on 32-bit, 256 bytes on 64-bit.
  • 1 AV (array variable): 28 bytes on 32-bit, 40 bytes on 64-bit.
  • 1 SVRV (scalar containing a reference): 16 bytes on 32-bit, 24 bytes on 64-bit.
  • 1 entry in @2dmatrix's array body: 4 bytes on 32-bit, 8 bytes on 64-bit.

That comes to 1136 bytes per 4 bytes of input (a 284× multiplication) on 32-bit and 1672 bytes per 4 bytes of input (418×) on 64-bit, not accounting for constant overheads or for the fact that perl might choose larger string bodies (on the two versions of perl I tested here, I got either 10 or 16 bytes, not 2). As a result, your program will use upwards of 1.1 GB of memory for a 4.2 MB input on a 32-bit system, and upwards of 1.7 GB for the same input on a 64-bit system.
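Summing the per-record items from the list above (sizes exactly as stated there) and scaling by the input size gives the figures quoted:

```latex
\begin{aligned}
\text{32-bit: } & 64 + 896 + 128 + 28 + 16 + 4 = 1136 \text{ bytes per 4-byte record} \;(284\times) \\
\text{64-bit: } & 64 + 1280 + 256 + 40 + 24 + 8 = 1672 \text{ bytes per 4-byte record} \;(418\times) \\
\text{4.2 MB input: } & 4.2\,\text{MB} \times 284 \approx 1.19\,\text{GB} \text{ (32-bit)}, \quad 4.2\,\text{MB} \times 418 \approx 1.76\,\text{GB} \text{ (64-bit)}
\end{aligned}
```

These are lower bounds: with the 10- or 16-byte string bodies observed above, the first term grows from 64 to 320 or 512 bytes per record.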

The solution here is to store and access the data more efficiently, but I can't give any specific advice because you haven't said what you actually want to do with @2dmatrix once you have it.
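Given the requirement stated in the question (finding records that match 11111111000000001111111100000000), one possible direction is to skip building the matrix entirely. This is a minimal sketch, assuming the pattern is compared against whole 4-byte records in the same `unpack 'b32'` bit order the question uses. How "column A" relates to "column B" isn't specified, so it only reports the indexes of matching records; `find_matching_records` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the indexes of the 4-byte records whose bits,
# in unpack('b32') order, equal $pattern. Only the current record is held in
# memory, so memory use does not grow with the file size.
sub find_matching_records {
    my ( $fh, $pattern ) = @_;

    # pack('b32', ...) is the inverse of unpack('b32', ...): it converts the
    # 32-character bit string into the 4 raw bytes it describes.
    my $needle = pack( 'b32', $pattern );

    my ( @matches, $record );
    my $index = 0;
    while (1) {
        my $got = read( $fh, $record, 4 );
        die "read failed: $!" unless defined $got;
        last if $got < 4;    # end of file (a trailing partial record is ignored)
        push @matches, $index if $record eq $needle;
        $index++;
    }
    return @matches;
}

# Demo on an in-memory "file" of three records. With the real input you would
# instead use: open( my $fh, '<:raw', 'debug.bin' ) or die $!;
my $pattern = '11111111000000001111111100000000';
my $bytes   = pack( 'b32', '0' x 32 ) . pack( 'b32', $pattern ) . pack( 'b32', '1' x 32 );
open( my $fh, '<', \$bytes ) or die "Cannot open in-memory file: $!";
binmode $fh;

my @hits = find_matching_records( $fh, $pattern );
close($fh);
print "matching records: @hits\n";    # matching records: 1
```

Comparing raw bytes with `eq` avoids creating any per-bit scalars; pulling out the matching "column A" data would depend on a file layout the question doesn't describe.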

Forget about reading the whole file into memory. Write a function that accesses data(x, y) by reading the corresponding value from the file on demand. Also consider File::Map: http://search.cpan.org/~leont/File-Map-0.63/lib/File/Map.pm#Advantages_of_memory_mapping
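A minimal sketch of that on-demand access, using only core `seek`/`read` rather than File::Map. `bit_at` is a hypothetical name, and it assumes the question's layout of one 32-bit row per 4 bytes, with bits in `unpack 'b32'` order.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

use constant ROW_BYTES => 4;    # assumed layout: one 32-bit row per 4 bytes

# Hypothetical accessor: returns the bit at ($row, $col) by reading only that
# row's 4 bytes from the file, so no matrix is ever built in memory.
sub bit_at {
    my ( $fh, $row, $col ) = @_;
    die "column out of range: $col\n" if $col < 0 || $col > 31;

    seek( $fh, $row * ROW_BYTES, SEEK_SET ) or die "seek failed: $!";
    my $got = read( $fh, my $buf, ROW_BYTES );
    die "read failed: $!" unless defined $got;
    die "row $row is past the end of the file\n" unless $got == ROW_BYTES;

    # vec(..., 1) numbers bits in ascending order inside each byte, the same
    # order as the characters of unpack('b32', $buf).
    return vec( $buf, $col, 1 );
}

# Demo on an in-memory file with two rows; with the real data, open
# 'debug.bin' with '<:raw' instead.
my $bytes = pack( 'b32', '1' . ( '0' x 31 ) ) . pack( 'b32', ( '0' x 31 ) . '1' );
open( my $fh, '<', \$bytes ) or die "Cannot open in-memory file: $!";
binmode $fh;

print bit_at( $fh, 0, 0 ),  "\n";    # 1
print bit_at( $fh, 0, 31 ), "\n";    # 0
print bit_at( $fh, 1, 31 ), "\n";    # 1
```

Each lookup costs a seek and a 4-byte read, so this trades speed for memory; memory mapping with File::Map is the answer's suggestion for making such access cheaper.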

Reading the whole file into a string and using vec looks like this:

my $data = do { local $/; <DATA> };

Then, to get a particular row/column, use:

$value = vec( $data, $row*32+$col, 1 );
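Putting those two lines together gives the minimal self-contained sketch below. To keep it runnable without debug.bin, it builds `$data` with `pack`; in the real program it would come from the slurp shown above, with DATA in binary mode. `cell` and `$row_count` are illustrative names, not part of the answer. Memory use is then roughly the file size itself, since the whole file is held as one string.

```perl
use strict;
use warnings;

# Stand-in for: my $data = do { local $/; <DATA> };  (after binmode DATA)
# Three 32-bit rows, written in the same bit order unpack('b32') uses.
my $data = pack( 'b32', '11111111000000001111111100000000' )
         . pack( 'b32', '0' x 32 )
         . pack( 'b32', '01' x 16 );

my $row_count = length($data) / 4;    # 4 bytes (32 bits) per row

# Illustrative accessor: bit $col of row $row, i.e. what
# $2dmatrix[$row][$col] held in the question's program.
sub cell {
    my ( $row, $col ) = @_;
    return vec( $data, $row * 32 + $col, 1 );
}

# Print each row back as a 32-character bit string, one cell at a time.
for my $row ( 0 .. $row_count - 1 ) {
    print join( '', map { cell( $row, $_ ) } 0 .. 31 ), "\n";
}
```

Because `vec` with a width of 1 uses the same ascending bit order as `unpack 'b32'`, `cell($row, $col)` returns the same value the original program stored in its matrix, without creating any per-bit scalars.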

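Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM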