How do I read a file in order, using Perl?

I have the following code. It works, but the output is not in the same order as the input file. For example, my input FASTA file contains a list of proteins; the script processes all of them correctly, but the order of the proteins in the output seems random.

What am I missing?

#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>

use strict;

open( S, "$ARGV[0]" ) || die "cannot open FASTA file to read: $!";

my %s;      # a hash of arrays, to hold each line of sequence
my %seq;    #a hash to hold the AA sequences.
my $key;

while (<S>) {    #Read the FASTA file.
    chomp;
    if (/>/) {
        s/>//;
        $key = $_;
    } else {
        push( @{ $s{$key} }, $_ );
    }
}

foreach my $a ( keys %s ) {
    my $s = join( "", @{ $s{$a} } );
    $seq{$a} = $s;
    #print("$a\t$s\n");
}

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $aa = join( "\t", @aa );
#print ("Sequence\t$aa\n");

foreach my $k ( keys %seq ) {
    my %count;    # a hash to hold the count for each amino acid in the protein
    my @seq = split( //, $seq{$k} );
    foreach my $r (@seq) {
        $count{$r}++;
    }
    my @row;
    push( @row, ">" . $k );
    foreach my $a (@aa) {
        $count{$a} ||= 0;
        my $percentAA = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );
        push( @row,
            $a . ":" . $count{$a} . "/" . length( $seq{$k} ) . "=" . sprintf( "%0.0f", $percentAA * 100 ) . "%" );
        $count{$a} = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );

        # push(@row,$count{$a});
    }
    my $row = join( "\t\n", @row );
    print("$row\n\n");
}

A hash such as %seq has no particular order.

Arrays preserve order; hashes do not, and the order in which keys returns their entries is effectively random. To preserve the input order, push each key onto an array as you read it, but only when the key is not already in the hash, otherwise the array ends up with duplicates.

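my ( @keys, %hash );

while (<S>) {    # read line by line instead of slurping the whole file
    my ( $key, $value ) = parse($_);    # parse() stands in for your own record parser
    push @keys, $key unless exists $hash{$key};    # record first-seen order only
    $hash{$key} = $value;
}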

for my $key (@keys) {
  my $value = $hash{$key};

  ...
}
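
As a concrete illustration of the key-order idea above, here is a minimal sketch applied to FASTA input. The sample records (protB, protA) and the variable names @order, %seq and $current are made up for the example, and the data is read from an in-memory string rather than a real file; treat it as one possible way to do this, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the input FASTA file.
my $fasta = <<'END';
>protB
MKV
LA
>protA
GGS
END

my ( @order, %seq );    # @order remembers the order headers were first seen
my $current;            # header of the record currently being read

open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^>(.*)/ ) {
        $current = $1;
        unless ( exists $seq{$current} ) {
            push @order, $current;    # record each key only once
            $seq{$current} = '';
        }
    }
    elsif ( defined $current ) {
        $seq{$current} .= $line;      # sequence may span several lines
    }
}
close $fh;

# Iterate over @order, not keys %seq, to get the input order back.
for my $key (@order) {
    print "$key\t$seq{$key}\n";
}
```

The hash is still useful here for fast lookup by protein name; the array exists only to carry the order.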

Don't use a hash if order is important.

Instead I recommend using an array of arrays like the following:

#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>
use strict;
use warnings;
use autodie;

my $file = shift or die "Usage: perl $0 <input_fasta_file> > <output_file>";
open my $fh, '<', $file;

my @fasta;

while (<$fh>) {    #Read the FASTA file.
    chomp;
    if (/^>/) {    # a FASTA header line starts with '>'
        push @fasta, [ $_, '' ];
    } else {
        $fasta[-1][1] .= $_;
    }
}

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);

for (@fasta) {
    my ( $k, $seq ) = @$_;

    print "$k\n";

    my %count;    # a hash to hold the count for each amino acid in the protein
    $count{$_}++ for split '', $seq;

    for my $a (@aa) {
        $count{$a} ||= 0;
        printf "%s:%s/%s=%.0f%%\n", $a, $count{$a}, length($seq), 100 * $count{$a} / length($seq);
    }

    print "\n";
}
