How do I read a file in order, using Perl?
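I have the following code. It works, but the output is not in the same order as the input file. For example, my input FASTA file contains a list of proteins. The code processes them and writes the output file correctly, but the proteins come out in what looks like a random order.
What am I missing?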
#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>
use strict;
open( S, "$ARGV[0]" ) || die "cannot open FASTA file to read: $!";
my %s;      # a hash of arrays, to hold each line of sequence
my %seq;    # a hash to hold the AA sequences.
my $key;
while (<S>) {    # Read the FASTA file.
    chomp;
    if (/>/) {
        s/>//;
        $key = $_;
    } else {
        push( @{ $s{$key} }, $_ );
    }
}
foreach my $a ( keys %s ) {
    my $s = join( "", @{ $s{$a} } );
    $seq{$a} = $s;
    #print("$a\t$s\n");
}
my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $aa = join( "\t", @aa );
#print ("Sequence\t$aa\n");
foreach my $k ( keys %seq ) {
    my %count;    # a hash to hold the count for each amino acid in the protein
    my @seq = split( //, $seq{$k} );
    foreach my $r (@seq) {
        $count{$r}++;
    }
    my @row;
    push( @row, ">" . $k );
    foreach my $a (@aa) {
        $count{$a} ||= 0;
        my $percentAA = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );
        push( @row,
            $a . ":" . $count{$a} . "/" . length( $seq{$k} ) . "=" . sprintf( "%0.0f", $percentAA * 100 ) . "%" );
        $count{$a} = sprintf( "%0.2f", $count{$a} / length( $seq{$k} ) );
        # push(@row,$count{$a});
    }
    my $row = join( "\t\n", @row );
    print("$row\n\n");
}
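A hash like %seq has no particular order, so keys %seq hands back the proteins in an arbitrary order rather than the order they appeared in the file.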
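Arrays preserve order; hashes do not (the order returned by keys is unspecified and, since Perl 5.18, can differ from one run to the next). If you want to preserve the order, push each key onto an array as you go, but only when the key does not already exist in the hash; otherwise you will get duplicates.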
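my ( @keys, %hash );
while ( my $line = <S> ) {
    # parse() is a placeholder for whatever extracts the key and value from a line
    my ( $key, $value ) = parse($line);
    # remember each key only once, in the order it was first seen
    push @keys, $key unless exists $hash{$key};
    $hash{$key} = $value;
}
for my $key (@keys) {    # walk the keys in input order
    my $value = $hash{$key};
    ...
}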
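A self-contained version of that approach, applied to the FASTA parsing from the question, could look like the minimal sketch below. The sub name parse_fasta_ordered and the sample records are hypothetical, and an in-memory filehandle stands in for the real input file. The repeated ">prot1" header shows why the exists check matters: the sequence is appended, but the key is not pushed a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA text from a filehandle, remembering the
# order in which each header first appears.
sub parse_fasta_ordered {
    my ($fh) = @_;
    my ( @keys, %seq );
    my $key;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(.*)/ ) {
            $key = $1;
            # Record the key only the first time it is seen, so a repeated
            # header does not produce a duplicate entry in @keys.
            push @keys, $key unless exists $seq{$key};
            $seq{$key} //= '';
        }
        elsif ( defined $key ) {
            $seq{$key} .= $line;    # concatenate the sequence lines
        }
    }
    return ( \@keys, \%seq );
}

# Demo with an in-memory filehandle (no file on disk needed).
my $fasta = ">prot3\nMKV\nLA\n>prot1\nGGA\n>prot2\nWY\n>prot1\nCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my ( $order, $seqs ) = parse_fasta_ordered($fh);
close $fh;

# Iterate over @$order, not keys %$seqs, to get the input order back.
for my $k (@$order) {
    print "$k\t$seqs->{$k}\n";
}
```

Replacing the in-memory filehandle with open my $fh, '<', $ARGV[0] should behave the same on a real file: the hash still does the lookups, while the array only fixes the iteration order.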
If order matters, don't use a hash. Instead, I suggest an array of arrays, like this:
#!/usr/bin/perl
#usage: perl seqComp.pl <input_fasta_file> > <output_file>
use strict;
use warnings;
use autodie;

my $file = shift or die "Usage: perl $0 <input_fasta_file> > <output_file>";
open my $fh, '<', $file;

my @fasta;    # array of [header, sequence] pairs, in input order
while (<$fh>) {    # Read the FASTA file.
    chomp;
    if (/^>/) {    # header lines start with '>'
        push @fasta, [ $_, '' ];
    } else {
        $fasta[-1][1] .= $_;    # append this sequence line to the current record
    }
}

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);
for (@fasta) {
    my ( $k, $seq ) = @$_;
    print "$k\n";
    my %count;    # a hash to hold the count for each amino acid in the protein
    $count{$_}++ for split '', $seq;
    for my $a (@aa) {
        $count{$a} ||= 0;
        printf "%s:%s/%s=%.0f%%\n", $a, $count{$a}, length($seq), 100 * $count{$a} / length($seq);
    }
    print "\n";
}
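To check that the array-of-arrays version really keeps the input order, the parsing can be pulled into a sub and fed an in-memory filehandle. This is a minimal sketch: read_fasta and the two sample records are hypothetical, and unlike the script above it also skips blank lines, ignores sequence lines that appear before the first header, and skips empty records so length($seq) is never zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle into a list of
# [header, sequence] pairs, in the same order as the input.
sub read_fasta {
    my ($fh) = @_;
    my @fasta;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        if ( $line =~ /^>/ ) {
            push @fasta, [ $line, '' ];    # start a new record
        }
        elsif (@fasta) {
            $fasta[-1][1] .= $line;        # append to the current record
        }
    }
    return @fasta;
}

my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);

my $input = ">p2\nAAR\n\n>p1\nWWWY\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
my @fasta = read_fasta($fh);
close $fh;

for my $rec (@fasta) {
    my ( $header, $seq ) = @$rec;
    my $len = length $seq;
    next unless $len;    # avoid dividing by zero for an empty sequence
    my %count;
    $count{$_}++ for split //, $seq;
    print "$header\n";
    for my $res (@aa) {
        my $n = $count{$res} // 0;
        printf "%s:%d/%d=%.0f%%\n", $res, $n, $len, 100 * $n / $len;
    }
    print "\n";
}
```

Because @fasta is an array, p2 is reported before p1 on every run, exactly as in the input.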