
Identify a memory problem in Perl/DBI code

To start out with - this is not my code - there was a problem with the code and I'm trying to find out how to debug the problem. There are plenty of changes I would make to the code if I was given the chance (excessive braces, global variables, use the join function instead of foreach, etc., etc. etc.). It is full of bad practice, but that is not what I need help with.

Here is a snippet of the Perl code (There are no subroutines, nothing special - basically open a file for the results of the query, execute the query, and dump the results to a file):

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

Somewhere in the while loop, the Perl script is crashing with an Out of Memory! error (not a clean crash - it just stops running with that message.)

In most runs, the volume on this query is very small. The result set on the run that crashed was a lot bigger (still not huge): 150,000 rows, and each row is about 1200 bytes wide.

Things that I have thought of:

  1. The fetchrow_array function of DBI is smart enough to not pull the complete dataset into memory, correct? My assumption is that the data is on the database, and fetchrow_array retrieves one row at a time, so that even if you had 10 billion rows, you should not have a memory problem - is that correct?
  2. Calling undef on the $output variable will free the memory that it was using, correct? If it doesn't, that could be another place where a memory problem could exist.
  3. The memory the @row variable is using will get re-used(?) each time a new row is retrieved, correct? If not, I could see how using a global array to store each row could blow out the memory.

I am hoping there is something obvious that I am just not understanding. If there is not something obvious by looking at the code, what are some techniques I can use to debug this problem?

Thanks in advance!

It might be that you're (perhaps inadvertently) caching too many rows. You can find out how many un-fetched rows are sitting in the local cache by checking $sth->{RowsInCache}. If it's undef, the driver has no row cache; otherwise you'll be given the number of rows.

You can also get away from the gymnastics you're having to do with $output by rewriting it as follows:

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    # s/// returns the substitution count, so the map block must return $_ itself.
    print ROWOUT join('~', map { s/\s+$//; $_ } @this_row), "\n";
}
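The formatting step in that loop can be tried without a database. This is only a minimal sketch: format_row is a hypothetical helper, not part of the original script, and it assumes that NULL columns (which DBI returns as undef) should be written as empty strings.

```perl
use strict;
use warnings;

# Hypothetical helper: trim trailing whitespace from each column
# and join the columns with '~'.
sub format_row {
    my @columns = @_;
    return join '~', map {
        my $value = defined $_ ? $_ : '';   # assumption: NULL (undef) becomes ''
        $value =~ s/\s+$//;                 # s/// returns a count, not the string,
        $value;                             # so the block returns the value itself
    } @columns;
}

print format_row('abc   ', undef, "x y \t"), "\n";   # prints "abc~~x y"
```

Inside the fetch loop this would be called as print ROWOUT format_row(@this_row), "\n"; and because it works on a copy of each column, @this_row itself is left untouched.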

So, assuming you have too many rows in your cache, you can limit it via:

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

From the DBI documentation, you can control the cache (assuming your driver supports it) by using a value as follows:

 0 - Automatically determine a reasonable cache size for each SELECT
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each SELECT.
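RowCacheSize is only a hint, so it is worth reading it back after setting it. A rough sketch, assuming an already connected $dbh; request_row_cache is a hypothetical helper, and whether DBD::DB2 honors the hint is not something this thread establishes. Per the DBI documentation, a driver without a row cache ignores the setting and returns undef when the value is read.

```perl
use strict;
use warnings;

# Hypothetical helper: ask for a row cache of $size rows on an existing
# database handle and report what the handle says afterwards.
sub request_row_cache {
    my ($dbh, $size) = @_;
    $dbh->{RowCacheSize} = $size;          # a hint; the driver may ignore it
    my $actual = $dbh->{RowCacheSize};     # undef if no row cache is implemented
    return defined $actual ? $actual : 'not supported by this driver';
}
```

Typical use would be print "RowCacheSize: ", request_row_cache($dbh, 20), "\n"; right after connecting and before preparing the statement.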

Increase the DBI trace level, and run the code under the Perl debugger and GDB. You need to find out exactly where the process goes out of control.
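One cheap way to see where memory grows is to log the process size every so many rows. The sketch below is an assumption-heavy illustration: it only works on Linux (it reads VmRSS from /proc/self/status), and rss_kb and report_progress are hypothetical helpers, not DBI functions.

```perl
use strict;
use warnings;

# Assumption: Linux, where /proc/self/status reports resident memory as VmRSS.
# Returns the resident set size in kB, or undef if it cannot be determined.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Hypothetical helper: call once per fetched row; every $every rows it
# writes the row count and current memory use to STDERR.
sub report_progress {
    my ($row_count, $every) = @_;
    return unless $row_count % $every == 0;
    my $rss = rss_kb();
    my $message = sprintf "row %d, RSS %s kB\n",
        $row_count, defined $rss ? $rss : 'unknown';
    print STDERR $message;
    return $message;
}
```

In the fetch loop, keep a counter and call report_progress(++$count, 10_000). If the numbers climb steadily inside the loop, the loop or the driver's fetch is the culprit; if memory is already huge at the first report, the growth happened during execute. For the trace level itself, DBI's trace method takes a level and an optional file name, for example DBI->trace(2, 'dbi_trace.log') (the file name here is an assumption).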

If you are not running the latest version of the relevant modules and DB, consider the possibility that you have found an old bug that has already been fixed.

As far as #1 goes, I believe it loads the entire result set into memory. Edit: I recall this being an option in DBI.

For #2 and #3, you should really be localizing your variables to the scope they are used in.

I suspect you are actually running out of memory after your execute, though I know you said otherwise. It seems unlikely you are using up much memory in that loop. Unless of course ROWOUT is actually a reference to a variable in memory, but we don't know that if you don't provide a complete script.
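To illustrate that last possibility: a filehandle opened on a scalar reference keeps everything written to it in memory. A minimal sketch with assumed sizes (1,000 rows of about 1,200 bytes, a small stand-in for the 150,000 rows in the question); $rowout and $buffer are illustrative names only.

```perl
use strict;
use warnings;

# A filehandle opened on a scalar reference writes into memory, not to disk.
my $buffer = '';
open my $rowout, '>', \$buffer or die "Cannot open in-memory handle: $!";

# Each print appends to $buffer, so memory use grows with every row written.
print {$rowout} 'x' x 1199, "\n" for 1 .. 1000;
close $rowout;

printf "buffer holds %d bytes\n", length $buffer;   # prints "buffer holds 1200000 bytes"
```

At the question's volume the same pattern would hold roughly 180 MB in a single scalar, so it is worth checking how ROWOUT is actually opened.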
