
How do I read a file in order, using Perl?

I have the following code. It works fine, but the output isn't in the same order as the input file. For example, my input FASTA file contains a list of proteins. The code processes each protein correctly, but the proteins appear in the output in what seems to be a random order.

What am I missing?

#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>

use strict;

open( S, "$ARGV[0]" ) || die "cannot open FASTA file to read: $!";

my %s;      # a hash of arrays, to hold each line of sequence
my %seq;    #a hash to hold the AA sequences.
my $key;

while (<S>) {    #Read the FASTA file.
    chomp;
    if (/>/) {
        s/>//;
        $key = $_;
    } else {
        push( @{ $s{$key} }, $_ );
    }
}

foreach my $a ( keys %s ) {
    my $s = join( "", @{ $s{$a} } );
    $seq{$a} = $s;
    #print("$a\t$s\n");
}

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $aa = join( "\t", @aa );
#print ("Sequence\t$aa\n");

foreach my $k ( keys %seq ) {
    my %count;    # a hash to hold the count for each amino acid in the protein
    my @seq = split( //, $seq{$k} );
    foreach my $r (@seq) {
        $count{$r}++;
    }
    my @row;
    push( @row, ">" . $k );
    foreach my $a (@aa) {
        $count{$a} ||= 0;
        my $percentAA = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );
        push( @row,
            $a . ":" . $count{$a} . "/" . length( $seq{$k} ) . "=" . sprintf( "%0.0f", $percentAA * 100 ) . "%" );
        $count{$a} = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );

        # push(@row,$count{$a});
    }
    my $row = join( "\t\n", @row );
    print("$row\n\n");
}

A hash such as %seq has no particular order.

Arrays preserve order; hashes do not. Perl returns hash keys in an unspecified order, and since Perl 5.18 that order can even change from one run to the next. If you want to preserve the input order, push each key onto an array the first time you see it. Only push it if the key does not already exist in the hash; otherwise you get duplicates.

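my ( @keys, %hash );
while ( my $line = <S> ) {
  my ( $key, $value ) = parse($line);            # parse() stands for your own per-line parser
  push @keys, $key unless exists $hash{$key};    # record each key once, at first sight
  $hash{$key} = $value;
}

for my $key (@keys) {    # @keys is in input order
  my $value = $hash{$key};

  ...
}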
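For comparison, here is a minimal sketch of that first-seen-order technique applied to the FASTA loop from the question. The sample records, the in-memory filehandle (used instead of the file named in $ARGV[0]) and the name @order are illustrative assumptions, and for brevity it prints each header with its sequence length rather than the full amino-acid table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the FASTA file given in $ARGV[0].
my $fasta = <<'END';
>prot3
MKV
LA
>prot1
AAG
>prot2
WY
END

# Open a read handle on the string itself (in-memory filehandle).
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";

my %seq;      # header => concatenated sequence
my @order;    # illustrative name: headers in the order they first appear
my $key;

while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ s/^>// ) {
        $key = $line;
        push @order, $key unless exists $seq{$key};    # record first-seen order only
        $seq{$key} //= '';
    }
    elsif ( defined $key ) {
        $seq{$key} .= $line;    # append this sequence line to the current protein
    }
}
close $fh;

# Walk @order instead of keys %seq, so the output follows the input file.
for my $k (@order) {
    printf "%s\t%d\n", $k, length $seq{$k};
}
```

Because the final loop iterates over @order rather than keys %seq, the proteins come out as prot3, prot1, prot2, exactly as they appear in the input.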

Don't use a hash if order is important.

Instead, I recommend an array of arrays, where each element is a [header, sequence] pair, like the following:

#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>
use strict;
use warnings;
use autodie;

my $file = shift or die "Usage: perl $0 <input_fasta_file> > <output_file>";
open my $fh, '<', $file;

my @fasta;

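while (<$fh>) {    # Read the FASTA file.
    chomp;
    if (/^>/) {                   # header lines start with '>'
        push @fasta, [ $_, '' ];  # start a new [header, sequence] record
    } elsif (@fasta) {            # ignore stray lines before the first header
        $fasta[-1][1] .= $_;      # append this line to the current record's sequence
    }
}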

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);

for (@fasta) {
    my ( $k, $seq ) = @$_;

    print "$k\n";

    my %count;    # a hash to hold the count for each amino acid in the protein
    $count{$_}++ for split '', $seq;

    my $len = length $seq;
    for my $res (@aa) {
        my $n = $count{$res} // 0;    # amino acids absent from the sequence count as 0
        printf "%s:%d/%d=%.0f%%\n", $res, $n, $len, $len ? 100 * $n / $len : 0;
    }

    print "\n";
}
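The same array-of-arrays approach can be wrapped in a reusable subroutine and checked against a small in-memory sample. This is a sketch under stated assumptions: read_fasta is a hypothetical helper name (not part of any module), the sample data is made up, and skipping blank lines is an added convenience not present in the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA records from an open filehandle into a
# list of [header, sequence] pairs, preserving the order of the file.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ( $line =~ /^>/ ) {
            push @records, [ $line, '' ];    # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;        # append to the current record
        }
        # sequence lines before the first header are silently ignored
    }
    return @records;
}

# Made-up sample: headers deliberately not in alphabetical order.
my $sample = ">zeta\nAC\nGT\n\n>alpha\nWWY\n";
open my $fh, '<', \$sample or die "cannot open in-memory string: $!";
my @fasta = read_fasta($fh);
close $fh;

print join( ',', map { "$_->[0]=$_->[1]" } @fasta ), "\n";
```

With this sample it prints >zeta=ACGT,>alpha=WWY: the records keep their file order, and the multi-line sequence under >zeta is joined into one string.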
