How can I read and parse chunks of data into a Perl hash of arrays?
I have data that looks like this:
#info
#info2
1:SRX004541
Submitter: UT-MGS, UT-MGS
Study: Glossina morsitans transcript sequencing project(SRP000741)
Sample: Glossina morsitans(SRS002835)
Instrument: Illumina Genome Analyzer
Total: 1 run, 8.3M spots, 299.9M bases
Run #1: SRR016086, 8330172 spots, 299886192 bases
2:SRX004540
Submitter: UT-MGS
Study: Anopheles stephensi transcript sequencing project(SRP000747)
Sample: Anopheles stephensi(SRS002864)
Instrument: Solexa 1G Genome Analyzer
Total: 1 run, 8.4M spots, 401M bases
Run #1: SRR017875, 8354743 spots, 401027664 bases
3:SRX002521
Submitter: UT-MGS
Study: Massive transcriptional start site mapping of human cells under hypoxic conditions.(SRP000403)
Sample: Human DLD-1 tissue culture cell line(SRS001843)
Instrument: Solexa 1G Genome Analyzer
Total: 6 runs, 27.1M spots, 977M bases
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases
Run #3: SRR013358, 3459692 spots, 124548912 bases
Run #4: SRR013360, 5219342 spots, 187896312 bases
Run #5: SRR013361, 5140152 spots, 185045472 bases
Run #6: SRR013370, 4916054 spots, 176977944 bases
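What I want to do is create a hash of arrays, using the first line of each chunk as the key and the SRR identifiers from the lines matching "^Run" as the members of its array: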
$VAR = {
'SRX004541' => ['SRR016086'],
# etc
}
But why doesn't my construct work? And there must be a better way to do it.
use Data::Dumper;
my %bighash;
my $head = "";
my @temp = ();
while ( <> ) {
    chomp;
    next if (/^\#/);
    if ( /^\d{1,2}:(\w+)/ ) {
        print "$1\n";
        $head = $1;
    }
    elsif (/^Run \#\d+: (\w+),.*/){
        print "\t$1\n";
        push @temp, $1;
    }
    elsif (/^$/) {
        push @{$bighash{$head}}, [@temp];
        @temp =();
    }
}
print Dumper \%bighash ;
Another way to do this kind of parsing is to read entire paragraphs. See perlvar for more information on the input record separator ($/).
For example:
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my %bighash;
{
    local $/ = "\n\n"; # Read entire paragraphs.
    while (my $paragraph = <>){
        # Filter out comments and handle extra blank lines between sections.
        my @lines = grep {/\S/ and not /^\#/} split /\n/, $paragraph;
        next unless @lines;
        # Extract the key and the SRR* items.
        my $key = $lines[0];
        $key =~ s/^\d+://;
        $bighash{$key} = [map { /^Run \#\d+: +(SRR\d+)/ ? $1 : () } @lines];
    }
}
print Dumper(\%bighash);
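As a variation on the paragraph approach above, Perl also has a built-in paragraph mode: setting $/ to the empty string makes any run of one or more blank lines end a record, so extra blank lines need no special handling. The following is only a sketch under assumptions: the sub name parse_paragraphs, the in-memory filehandle, and the shortened sample input are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample input; note the run of blank lines between records.
my $data = <<'END';
#info
#info2

1:SRX004541
Submitter: UT-MGS, UT-MGS
Run #1: SRR016086, 8330172 spots, 299886192 bases



2:SRX004540
Submitter: UT-MGS
Run #1: SRR017875, 8354743 spots, 401027664 bases
END

# parse_paragraphs is an illustrative name, not an existing library function.
sub parse_paragraphs {
    my ($text) = @_;
    my %runs_for;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    while (my $paragraph = <$fh>) {
        # Drop comment lines; skip paragraphs that contain nothing else.
        my @lines = grep { !/^\#/ } split /\n/, $paragraph;
        next unless @lines;
        # The first remaining line must look like "1:SRX004541".
        my ($key) = $lines[0] =~ /^\d+:(\w+)/ or next;
        # Keep only the SRR identifiers from the "Run #N:" lines.
        $runs_for{$key} = [ map { /^Run \#\d+:\s+(SRR\d+)/ ? $1 : () } @lines ];
    }
    close $fh;
    return \%runs_for;
}

my $runs_for = parse_paragraphs($data);
print join(', ', map { "$_ => @{ $runs_for->{$_} }" } sort keys %$runs_for), "\n";
```

To read from files named on the command line instead, the same loop should work with <> in place of <$fh>.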
Replace
push @{$bighash{$head}}, [@temp];
with
push @{$bighash{$head}}, @temp;
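There is only one array per $head value, right? The second statement pushes all the values in @temp onto the arrayref in $bighash{$head}. The first form, on the other hand, builds a new array reference out of the items in @temp and pushes that single reference onto $bighash{$head}, giving you an array reference of arrayrefs.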
Alternatively, you may want
$bighash{$head} = [@temp];
if you expect to encounter each $head value only once.
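To make the difference between the two push forms concrete, here is a small stand-alone sketch. The hash names %flat and %nested are made up for illustration; the run IDs are taken from the sample data in the question.

```perl
use strict;
use warnings;

my @temp = ('SRR013356', 'SRR013357');

my (%flat, %nested);

# Pushing the list appends each element to the array held in the hash.
push @{ $flat{SRX002521} }, @temp;

# Pushing [@temp] appends ONE element: a reference to a copy of @temp.
push @{ $nested{SRX002521} }, [@temp];

print scalar(@{ $flat{SRX002521} }), "\n";      # number of run IDs stored
print scalar(@{ $nested{SRX002521} }), "\n";    # number of array references stored
print ref($nested{SRX002521}[0]), "\n";         # type of the single stored element
```

The flat form yields the hash of arrays asked for in the question; the nested form adds an extra level, which is what Data::Dumper shows for the original code.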
Based on your code, here is one way to do it:
my $head;
my %result;
while (<>) {
    chomp;
    next if (/^\#/);
    if ( /^\d{1,2}:(\w+)/ ) {
        $result{$1} = [];
        $head = $1;    # remember which key the following values belong to
    }
    elsif (/^Run \#\d+: (\w+),.*/) {
        # Add the run ID found to the array assigned to the last key found.
        push(@{$result{$head}}, $1);
    }
}
There is a problem with your state machine. I think you could use the following logic:
if(!$head) {
    # seek and get head
} else {
    if (!$total) {
        # seek and get total
    } else {
        # seek run
        # if found :
        #     push run to temp and decrease total
        # if total eq 0 :
        #     push temp to bighash
        #     reset head, total and temp
    }
}
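The logic outlined above could be sketched in Perl roughly as follows. This is one possible reading of the pseudocode, not the answerer's own code: the sub name parse_with_totals is hypothetical, and it assumes every record has a "Total: N run(s), ..." line with N of at least 1 before its "Run #" lines. The sample lines are copied from the question.

```perl
use strict;
use warnings;

# parse_with_totals is an illustrative name, not an existing library function.
sub parse_with_totals {
    my ($text) = @_;
    my (%bighash, $head, $total, @temp);
    for my $line (split /\n/, $text) {
        if (!defined $head) {
            # State 1: seek the record header, e.g. "3:SRX002521".
            $head = $1 if $line =~ /^\d+:(\w+)/;
        }
        elsif (!defined $total) {
            # State 2: seek the announced run count, e.g. "Total: 6 runs, ...".
            $total = $1 if $line =~ /^Total:\s+(\d+)\s+runs?\b/;
        }
        elsif ($line =~ /^Run \#\d+:\s+(\w+),/) {
            # State 3: collect runs until the announced count is reached.
            push @temp, $1;
            if (--$total == 0) {
                $bighash{$head} = [@temp];      # store a copy of the collected runs
                ($head, $total, @temp) = ();    # reset and look for the next header
            }
        }
    }
    return \%bighash;
}

my $sample = <<'END';
2:SRX004540
Submitter: UT-MGS
Total: 1 run, 8.4M spots, 401M bases
Run #1: SRR017875, 8354743 spots, 401027664 bases
3:SRX002521
Submitter: UT-MGS
Total: 6 runs, 27.1M spots, 977M bases
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases
Run #3: SRR013358, 3459692 spots, 124548912 bases
Run #4: SRR013360, 5219342 spots, 187896312 bases
Run #5: SRR013361, 5140152 spots, 185045472 bases
Run #6: SRR013370, 4916054 spots, 176977944 bases
END

my $parsed = parse_with_totals($sample);
print "$_: @{ $parsed->{$_} }\n" for sort keys %$parsed;
```

Because the end of a record is detected by counting runs, this version does not depend on blank lines between records or on a trailing blank line at the end of the file.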
The code looks correct, but I strongly recommend adding:
use warnings;
use strict;
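to everything except the most trivial one-liners, and changing the blank-line test to
elsif ($head && /^$/) {
to deal with the last problem you ran into.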