
Why does my Perl for loop exit early?

I am trying to get a Perl loop to work over an array that contains 6 elements. I want the loop to pull two elements out of the array, perform certain functions, and then loop back and pull out the next two elements, until the array runs out of elements. The problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly appreciated.

open(infile, 'dnadata.txt');
my @data = <infile>;
chomp @data;
#print @data; #Debug

my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);

my $i=0;
my $j=0;
my @matrix =();
for(my $i=0; $i<2; $i++){
    for( my $j=0; $j<$aalen; $j++){
    $matrix[$i][$j] = 0;

    }
}

The guidelines for this program state that it should ignore gaps in the sequences, which means that DNA code that is matched up with a gap should be ignored. So the code that is pushed through needs to have the alignment positions linked with gaps removed.

I need to divide the length of the array by two, since I am comparing two sequences in this part of the loop.

#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these scalar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";

This is the loop that is causing issues. I believe that the first loop should move back to the array and pull out the next element each time it loops, but it doesn't.

for($i=0; $i<$lenarray; $i++){

    #This code should remove the last value of the array once and
    #then a second time. The sequences should be the same length at this point.
my $last1 =pop(@data1);
my $last2 =pop(@data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
    if(($letter1 eq '-')|| ($letter2 eq '-')){ 
    #I need to put the sequences I am getting rid of somewhere. Here is a good place as any. 
    $junk1 = $letter1 . $junk1;
    $junk2 = $letter1 . $junk2;
    }
    else{
    $seq1 = $letter1 . $seq1;
    $seq2 = $letter2 . $seq2;

    }   
}
}
print "$seq1\n";
print "$seq2\n";
print "@data1\n";

I am actually trying to create a substitution matrix from scratch and return the data. The reason the code looks weird is that it isn't actually finished yet and I got stuck. This is the test sequence, if anyone is curious.

YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF

First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...

Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a data structure that reflects that. As suggested elsewhere, an array of pairs such as @data = ([line1, line2], [line3, line4]) ensures that the correct pairs of lines always stay together.

What I'm not clear on is what you're trying to do:

  • a) are you generating a consensus sequence where the 2 sequences differ only by gaps, or
  • b) are your 2 sequences significantly different, and you're trying to exclude the non-aligning parts and then generate a consensus?

So, does the first pair represent your data, or is it more like the second?

ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC

ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

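The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. By the time the inner loop finishes, $i equals the length of the sequence, which is already past the outer loop's limit, so the outer loop exits after one pass. Try changing the inner loop's counter to $j, or using my to scope each counter to its own loop.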
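To see the effect in isolation, here is a minimal sketch. The two sub names are made up for illustration, and since the question never shows how $lenarray is set, I'm assuming the outer loop is meant to run once per pair of lines. The data is the test sequence from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the question's loops, where the inner loop
# reuses the outer loop's counter $i. Returns how many pairs get processed.
sub pairs_visited_shared {
    my @lines   = @_;
    my $pairs   = @lines / 2;
    my $visited = 0;
    my $i;
    for ($i = 0; $i < $pairs; $i++) {
        my $last1 = pop @lines;
        my $last2 = pop @lines;
        $visited++;
        # This loop leaves $i equal to length($last1) when it finishes,
        # which is already past $pairs for these sequences.
        for ($i = 0; $i < length($last1); $i++) {
            # per-letter comparison would go here
        }
    }
    return $visited;
}

# Same walk, but each loop has its own lexically scoped counter.
sub pairs_visited_separate {
    my @lines   = @_;
    my $pairs   = @lines / 2;
    my $visited = 0;
    for my $pair (1 .. $pairs) {
        my $last1 = pop @lines;
        my $last2 = pop @lines;
        $visited++;
        for my $pos (0 .. length($last1) - 1) {
            # per-letter comparison would go here
        }
    }
    return $visited;
}

my @lines = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);
print "shared counter:    ", pairs_visited_shared(@lines),   " pair(s)\n";   # 1
print "separate counters: ", pairs_visited_separate(@lines), " pair(s)\n";   # 3
```

Declaring the counter in the loop header (for my $pos ...) keeps it private to that loop, so this kind of collision can't happen by accident.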

Don't store your values as a flat array; store them as a two-dimensional array:

my @dataset = ([$val1, $val2], [$val3, $val4]);

or

my @dataset;
push (@dataset, [$val_n1, $val_n2]);

Then:

for my $value (@dataset) {
 ### Do stuff with $value->[0] and $value->[1]
}
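If the lines are already sitting in a flat array, one way to get them into that shape might look like this. It's only a sketch: pair_up is a name I made up, and it assumes the file holds an even number of lines, with each sequence directly followed by its partner.

```perl
use strict;
use warnings;

# Hypothetical helper: groups a flat list of lines into [first, second] pairs.
sub pair_up {
    my @lines = @_;
    die "Odd number of sequences\n" if @lines % 2;

    my @dataset;
    for (my $k = 0; $k < @lines; $k += 2) {
        # An array slice picks out elements $k and $k+1 in one go.
        push @dataset, [ @lines[ $k, $k + 1 ] ];
    }
    return @dataset;
}

my @dataset = pair_up(qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF));

for my $value (@dataset) {
    print "$value->[0] / $value->[1]\n";
}
```

Unlike pop, this leaves the original array untouched and keeps the pairs in file order.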

There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)

Here's a guess.

use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';

# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);

# Our work function. 
sub do_something {
    my ($a, $b) = @_;
    $a =~ s/$regex//g; # removing unwanted characters
    $b =~ s/$regex//g; # ditto
    # Printing, saving, whatever...
    print "Something: $a - $b\n";

    return ($a, $b);
}

my $prev;
while (<>) {
    chomp;
    if ($prev) {
        do_something($prev, $_);
        $prev = undef;
    } else {
        $prev = $_;
    }
}

print STDERR "Warning: trailing data: $prev\n"
    if $prev;

Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.

Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.

One easy way to chunk your program is to write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.

It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.

use strict;      # Using strict will help catch errors.
use warnings;    # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages

# Put constants at the top of your program.
# It makes them easy to find, and change as needed.

my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);

my $DATA_FILE = 'dnadata.txt';

# Here I am using subroutines to encapsulate complexity:

my @data = read_data_file( $DATA_FILE );
my @matrix = initialize_matrix( 2, $AMINO_COUNT, 0 );

# now we are done with the first block of code and can do more stuff

...

# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.

# Here are the actual subs that I abstracted out above.  
# It helps to document your subs:
#  - what they do
#  - what arguments they take
#  - what they return

# Reads a data file and returns an array of dna strings read from the file.
# 
# Arguments
#   data_file => path to the data file to read

sub read_data_file {
    my $data_file = shift;

    # Here I am using a 3 argument open, and a lexical filehandle.
    open( my $infile, '<', $data_file )
         or die "Unable to open $data_file - $!\n";

    # I've left slurping the whole file intact, even though it can be very inefficient.
    # Other times it is just what the doctor ordered.
    my @data = <$infile>;
    chomp @data;

    # I return the data array rather than a reference
    # to keep things simple since you are just learning.
    #
    # In my code, I'd pass a reference.

    return @data;
}

# Initialize a matrix (or 2-d array) with a specified value.
# 
# Arguments
#    $i     => width of matrix
#    $j     => height of matrix
#    $value => initial value

sub initialize_matrix {
    my $i     = shift;
    my $j     = shift;
    my $value = shift;

    # I use two powerful perlisms here:  map and the range operator.
    #
    # map is a list construction function that is very very powerful.
    # it calls the code in brackets for each member of the list it operates against.
    # Think of it as a for loop that keeps the result of each iteration, 
    # and then builds an array out of the results.
    #
    # The range operator `..` creates a list of intervening values. For example:
    #     (1..5) is the same as (1, 2, 3, 4, 5)

    my @matrix = map {
        [ ($value) x $i ]
    } 1..$j;

    # So here we make a list of numbers from 1 to $j.
    # For each member of the list we
    #     create an anonymous array containing a list of $i copies of $value.
    # Then we add the anonymous array to the matrix.

    return @matrix;
}
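Here is a quick way to check what initialize_matrix hands back. This is just a sketch: the sub is repeated in condensed form so the snippet runs on its own, and the literal 20 stands in for $AMINO_COUNT.

```perl
use strict;
use warnings;

# Condensed copy of initialize_matrix from above, so this runs standalone.
sub initialize_matrix {
    my ($i, $j, $value) = @_;
    return map { [ ($value) x $i ] } 1 .. $j;
}

my @matrix = initialize_matrix( 2, 20, 0 );

# $j rows, each holding $i values.
print scalar(@matrix), " rows of ", scalar( @{ $matrix[0] } ), "\n";   # 20 rows of 2

# Each row is its own anonymous array, so changing one leaves the others alone.
$matrix[0][1]++;
print "$matrix[0][1] $matrix[1][1]\n";   # 1 0
```

Note that this call gives 20 rows of 2 values, while the loops in the question fill $matrix[0..1][0..19]. If you want that shape instead, swap the first two arguments.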

Now that the code rewrite is done, here are some links:

Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.

For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.

I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl-specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.

Good luck!

Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:

while (my ($seq1, $seq2) = splice(@data, 0, 2))
{
    # stuff...
}

I've cleaned up some of your other code below:

use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt')
    or die "Unable to open dnadata.txt: $!\n";
my @data = <$infile>;
close $infile;

chomp @data;

my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);

# initialize a 2 x 20 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
    foreach my $j (0 .. $aalen-1)
    {
        $matrix->[$i][$j] = 0;
    }
}

# Process the sequences in the data two at a time
while (my ($seq1, $seq2) = splice(@data, 0, 2))
{
    # do something... not sure what?
    # you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}
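As for what might go inside that loop: going by the question's description (ignore any position where either sequence has a gap), a first step could look something like the sketch below. This is my guess at the intent, not a finished substitution matrix. strip_gap_columns is a made-up name, and it assumes the two sequences in a pair are aligned position by position.

```perl
use strict;
use warnings;

# Hypothetical helper: drops every column where either sequence has a '-'.
sub strip_gap_columns {
    my ($seq1, $seq2) = @_;
    my ($out1, $out2) = ('', '');

    # Only compare positions that exist in both sequences.
    my $len = length($seq1) < length($seq2) ? length($seq1) : length($seq2);

    for my $pos (0 .. $len - 1) {
        my $c1 = substr($seq1, $pos, 1);
        my $c2 = substr($seq2, $pos, 1);
        next if $c1 eq '-' or $c2 eq '-';
        $out1 .= $c1;
        $out2 .= $c2;
    }
    return ($out1, $out2);
}

my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);

while (my ($seq1, $seq2) = splice(@data, 0, 2)) {
    my ($clean1, $clean2) = strip_gap_columns($seq1, $seq2);
    print "$clean1\n$clean2\n";
}
```

For the last pair in the test data this prints YFYFRF and YFRFRF, i.e. the column holding the gap is dropped from both sequences. Counting the surviving letter pairs into the matrix would be the next step.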
