
Identify a memory problem in Perl/DBI code

To start out with: this is not my code. There was a problem with it, and I'm trying to figure out how to debug the problem. There are plenty of changes I would make to the code if given the chance (excessive braces, global variables, using the join function instead of foreach, etc.). It is full of bad practice, but that is not what I need help with.

Here is a snippet of the Perl code. There are no subroutines and nothing special: it opens a file for the results of the query, executes the query, and dumps the results to the file:

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

Somewhere in the while loop, the Perl script is crashing with an Out of Memory! error (not a clean crash - it just stops running with that message).

In most runs, the volume of data returned by this query is very small. The result set this time, when the script crashed, is a lot bigger (still not huge): 150,000 rows, each about 1,200 bytes wide.

Things that I have thought of:

  1. The fetchrow_array function of DBI is smart enough not to pull the complete dataset into memory, correct? My assumption is that the data stays on the database server and fetchrow_array retrieves one row at a time, so even with 10 billion rows you should not have a memory problem - is that correct?
  2. Calling undef on the $output variable will free the memory it was using, correct? If it doesn't, that could be another place where a memory problem could exist.
  3. The memory the @row variable uses will get re-used each time a new row is retrieved, correct? If not, I could see how using a global array to store each row could blow out the memory.

I am hoping there is something obvious that I am just not understanding. If nothing is obvious from looking at the code, what are some techniques I can use to debug this problem?
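One low-tech debugging technique is to print a progress counter and the size of the suspect variable to STDERR every few rows, so the last line printed before the crash tells you how far the loop got and whether $output was growing. The sketch below simulates the fetch with made-up rows (the real script would use $sth->fetchrow_array in place of the shift); the data and the every-5-rows interval are illustrative assumptions, not part of the original script:

```perl
use strict;
use warnings;

# Simulated result set standing in for $sth->fetchrow_array
# (hypothetical data; replace the shift with the real fetch call).
my @fake_rows = map { [ "field_a$_ ", "field_b$_\t" ] } 1 .. 10;

my $count  = 0;
my $output = '';

while ( my $row = shift @fake_rows ) {
    my @row = @$row;
    for my $field (@row) {
        $field =~ s/\s+$//;      # strip trailing whitespace
        $output .= "~$field";
    }
    $output =~ s/^\~//;          # drop the leading separator

    # Instrumentation: report progress and the size of $output so the
    # last line printed before "Out of memory!" points at the culprit.
    if ( ++$count % 5 == 0 ) {
        printf STDERR "row %d: length(\$output)=%d\n",
            $count, length $output;
    }

    print "$output\n";
    $output = '';    # reset each iteration; same effect as undef here
}
```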

Thanks in advance!

It might be that you're (perhaps inadvertently) caching too many rows. You can find out how many have been brought in by checking $sth->{RowsInCache}. If it is undef, there is no cache; otherwise you'll get the number of rows.

You can also get away from the gymnastics you're doing with $output by rewriting the loop as follows:

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    # Note: s/// returns the substitution count, not the modified string,
    # so each field is copied and stripped, and the copy is returned.
    print ROWOUT join('~', map { my $f = $_; $f =~ s/\s+$//; $f } @this_row), "\n";
}

So, assuming you have too many rows in your cache, you can limit the cache size when you connect:

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

From the DBI documentation, you can control the cache (assuming your driver supports it) with a RowCacheSize value as follows:

 0 - Automatically determine a reasonable cache size for each C<SELECT>
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each C<SELECT>.

Increase the trace level, and run the code under the Perl and GDB debuggers. You need to find out where exactly the process goes out of control.
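For reference, DBI's trace level can be raised without touching the code via the DBI_TRACE environment variable (a level, optionally followed by a log file), and the script can be run under the debuggers mentioned above. The script and log file names here are placeholders:

```shell
# Trace every DBI/DBD method call at level 2, writing to dbitrace.log
DBI_TRACE=2=dbitrace.log perl the_script.pl

# Step through the Perl side interactively
perl -d the_script.pl

# Watch the process at the C level (DBD::DB2 and the DB2 client library)
gdb --args perl the_script.pl
```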

If you are not running the latest versions of the relevant modules and database, consider the possibility that you have hit an old bug that has already been fixed.

As far as #1 goes, I do believe it loads the entire result into memory. Edit: I recall this being an option in DBI.

For #2 and #3, you should really be localizing your variables to the scope they are used in.
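To illustrate (a minimal standalone sketch with made-up rows, not the original script): declaring the working variables with my inside the loop gives each iteration a fresh, tightly scoped copy, so nothing accumulates across iterations and nothing needs an explicit undef:

```perl
use strict;
use warnings;

# Stand-in rows (hypothetical data in place of $sth->fetchrow_array).
my @fake_rows = ( [ "a ", "b\t" ], [ "c", "d  " ] );

my @lines;
for my $row (@fake_rows) {
    # Lexicals scoped to this block: created fresh each iteration
    # and freed automatically when the iteration ends.
    my @fields = @$row;
    s/\s+$// for @fields;              # strip trailing whitespace in place
    my $output = join '~', @fields;    # no global accumulator to clear
    push @lines, $output;
}

print "$_\n" for @lines;
```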

I suspect you are actually running out of memory after the execute, though I know you said otherwise. It seems unlikely you are using up much memory in that loop. Unless, of course, ROWOUT is actually a reference to a variable in memory, but we can't know that without seeing the complete script.
