How do I read chunks of data and parse them into a Perl hash of arrays?

Question

I have data that looks like this:

#info
#info2

1:SRX004541
Submitter: UT-MGS, UT-MGS
Study: Glossina morsitans transcript sequencing project(SRP000741)
Sample: Glossina morsitans(SRS002835)
Instrument: Illumina Genome Analyzer
Total: 1 run, 8.3M spots, 299.9M bases
Run #1: SRR016086, 8330172 spots, 299886192 bases

2:SRX004540
Submitter: UT-MGS
Study: Anopheles stephensi transcript sequencing project(SRP000747)
Sample: Anopheles stephensi(SRS002864)
Instrument: Solexa 1G Genome Analyzer
Total: 1 run, 8.4M spots, 401M bases
Run #1: SRR017875, 8354743 spots, 401027664 bases

3:SRX002521
Submitter: UT-MGS
Study: Massive transcriptional start site mapping of human cells under hypoxic conditions.(SRP000403)
Sample: Human DLD-1  tissue culture cell line(SRS001843)
Instrument: Solexa 1G Genome Analyzer
Total: 6 runs, 27.1M spots, 977M bases
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases
Run #3: SRR013358, 3459692 spots, 124548912 bases
Run #4: SRR013360, 5219342 spots, 187896312 bases
Run #5: SRR013361, 5140152 spots, 185045472 bases
Run #6: SRR013370, 4916054 spots, 176977944 bases

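What I want to do is create a hash of arrays, with the first line of each chunk as the key and the SRR## part of each line matching "^Run" as the array members: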

$VAR = {
     'SRX004541' => ['SRR016086'], 
     # etc
}

But my construct below does not work, and I don't see why. There must also be a better way to do it.

use Data::Dumper;
my %bighash;
my $head = "";
my @temp = ();

while ( <> ) {
    chomp;
    next if (/^\#/);

    if ( /^\d{1,2}:(\w+)/ ) {
        print "$1\n";
        $head = $1;
    }
    elsif (/^Run \#\d+: (\w+),.*/) {
        print "\t$1\n";
        push @temp, $1;
    }
    elsif (/^$/) {
        push @{$bighash{$head}}, [@temp];
        @temp = ();
    }
}


print Dumper \%bighash ;

Answer 1

Another way to do this kind of parsing is to read entire paragraphs at a time. See perlvar for more information about the input record separator ($/).

For example:

use strict;
use warnings;
use Data::Dumper qw(Dumper);
my %bighash;

{
    local $/ = "\n\n"; # Read entire paragraphs.
    while (my $paragraph = <>){
        # Filter out comments and handle extra blank lines between sections.
        my @lines = grep {/\S/ and not /^\#/} split /\n/, $paragraph;
        next unless @lines;

        # Extract the key and the SRR* items.
        my $key = $lines[0];
        $key =~ s/^\d+://;
        $bighash{$key} = [map { /^Run \#\d+: +(SRR\d+)/ ? $1 : () } @lines];
    }
}

print Dumper(\%bighash);
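
As a variation on the same idea, Perl also has a dedicated paragraph mode: setting $/ to the empty string makes any run of blank lines act as a single record terminator. The following is a minimal sketch under that assumption, not a replacement for the code above. The sample data is abridged from the question, and the names %runs_for and $data are illustrative only.

```perl
use strict;
use warnings;

# Sample input abridged from the question; note the three blank lines
# between the two records.
my $data = <<'END';
#info

1:SRX004541
Submitter: UT-MGS
Run #1: SRR016086, 8330172 spots, 299886192 bases



3:SRX002521
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases
END

my %runs_for;
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
{
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    while ( my $paragraph = <$fh> ) {
        # Skip paragraphs that do not start with "<number>:<accession>".
        my ($key) = $paragraph =~ /\A\d+:(\w+)/ or next;

        # /m lets ^ match at the start of every line in the paragraph;
        # /g in list context returns every captured run ID.
        $runs_for{$key} = [ $paragraph =~ /^Run \#\d+: +(\w+)/mg ];
    }
}
close $fh;

print "$_ => @{ $runs_for{$_} }\n" for sort keys %runs_for;
```

Because whole paragraphs are matched with a regex here, there is no need to split each record into lines first; whether that reads better than the grep/map version is a matter of taste.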

Answer 2

Replace

push @{$bighash{$head}}, [@temp];

with

push @{$bighash{$head}}, @temp;

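There is only one array for each $head value, right? The second statement pushes all of the values in @temp onto the arrayref in $bighash{$head}. The first form, on the other hand, builds a new array reference from the items in @temp and pushes that single reference onto $bighash{$head}, giving you an array reference of arrayrefs.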

Or, you might want

$bighash{$head} = [@temp];

if you expect to encounter each $head value only once.
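
To make the difference between the three forms concrete, here is a small sketch that applies each of them to the same list. The hash names %nested, %flat and %assigned are illustrative only and do not come from the question.

```perl
use strict;
use warnings;
use Data::Dumper;

my @temp = ( 'SRR013356', 'SRR013357' );
my ( %nested, %flat, %assigned );

# Pushes ONE element: a reference to a copy of @temp.
push @{ $nested{SRX002521} }, [@temp];

# Pushes each element of @temp individually.
push @{ $flat{SRX002521} }, @temp;

# Replaces whatever was stored under the key with a copy of @temp.
$assigned{SRX002521} = [@temp];

print Dumper( \%nested, \%flat, \%assigned );
```

In the dump, %nested holds an array whose single element is itself an array, while %flat and %assigned both hold a flat array of two run IDs. The last two only differ when the same key is seen more than once: push keeps appending, whereas assignment overwrites.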

Answer 3

Based on your code, here is one way to do it:

my $head;
my %result;
while (<>) {
    chomp;
    next if (/^\#/);

    if ( /^\d{1,2}:(\w+)/ ) {
        $result{$1} = []; 
        $head = $1; # $head will be used to know which key the following values
                    # will be assigned to
    }
    elsif (/^Run \#\d+: (\w+),.*/) {
        # Add the run ID to the array assigned to the last key found.
        push @{ $result{$head} }, $1;
    }
}

Answer 4

There is a problem with your state machine. I think you could use the following logic:

if(!$head)
{
  # seek and get head
} 
else
{
  if (!$total) 
  {
    # seek and get total
  }
  else
  {
    # seek run
    # if found :
      # push run to temp and decrease total
      # if total eq 0 :
        # push temp to bighash
        # reset head, total and temp
  }
}
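
The outline above could be turned into running Perl roughly as follows. This is only a sketch under some assumptions: the run count is taken from the "Total: N run(s)" line, the sample data is an abridged, hypothetical variant of the question's input (the second "Total:" line was edited to match two runs), and undef is used to mean "not seen yet" for $head and $total.

```perl
use strict;
use warnings;

# Abridged, hypothetical sample in the question's format. There are no
# blank lines between records, since this approach does not rely on them.
my $data = <<'END';
#info
1:SRX004541
Total: 1 run, 8.3M spots, 299.9M bases
Run #1: SRR016086, 8330172 spots, 299886192 bases
3:SRX002521
Total: 2 runs, 8.4M spots, 302M bases
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases
END

my %bighash;
my ( $head, $total, @temp );

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( !defined $head ) {
        # State 1: look for the "<number>:<accession>" header line.
        $head = $1 if $line =~ /^\d+:(\w+)/;
    }
    elsif ( !defined $total ) {
        # State 2: look for the run count on the "Total:" line.
        $total = $1 if $line =~ /^Total: (\d+) runs?,/;
    }
    elsif ( $line =~ /^Run \#\d+: (\w+),/ ) {
        # State 3: collect runs until the announced count is reached.
        push @temp, $1;
        if ( --$total == 0 ) {
            $bighash{$head} = [@temp];
            ( $head, $total, @temp ) = ();    # reset for the next block
        }
    }
}
close $fh;

print "$_ => @{ $bighash{$_} }\n" for sort keys %bighash;
```

One limitation of counting down from the total: a block that announces 0 runs, or fewer "Run" lines than announced, would never reach the reset step, so real input of that kind would need extra handling.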

Answer 5

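The code looks right, but I would strongly recommend adding

use warnings;
use strict;

to everything except the most trivial one-liners, and changing the blank-line test to

 elsif ($head && /^$/) {

which avoids the problem you are hitting.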
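
One way to see what the guard changes: in the question's loop, the blank line after the "#" header lines is reached while $head is still empty, so an entry gets stored under the empty-string key. The sketch below compares the two behaviours. parse_runs is a hypothetical helper written for this illustration, it stores a plain array per key rather than the nested one from the question, and the store after the loop (for input that does not end with a blank line) is an extra step that is not part of this answer.

```perl
use strict;
use warnings;

# parse_runs is a hypothetical helper: it parses the question's format from
# a string, optionally applying the $head guard on the blank-line branch.
sub parse_runs {
    my ( $text, $use_guard ) = @_;
    my ( %bighash, @temp );
    my $head = "";

    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\#/;
        if    ( $line =~ /^\d{1,2}:(\w+)/ )     { $head = $1 }
        elsif ( $line =~ /^Run \#\d+: (\w+),/ ) { push @temp, $1 }
        elsif ( $line =~ /^$/ and ( $head or not $use_guard ) ) {
            $bighash{$head} = [@temp];
            ( $head, @temp ) = ("");    # reset for the next block
        }
    }
    close $fh;

    # Store the final block when the input does not end with a blank line.
    $bighash{$head} = [@temp] if $head;
    return \%bighash;
}

# The blank line after "#info" comes before any header line, and there is
# no trailing blank line after the last record.
my $data = "#info\n\n1:SRX004541\nRun #1: SRR016086, 8330172 spots, 299886192 bases";

my $unguarded = parse_runs( $data, 0 );
my $guarded   = parse_runs( $data, 1 );

printf "unguarded: %d keys, guarded: %d keys\n",
    scalar( keys %$unguarded ), scalar( keys %$guarded );
```

Without the guard the result contains a spurious '' key next to SRX004541; with it, only the real accession is stored.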

5 answers

Answer 1: score 5, accepted, 2010-04-16 12:07:39
Answer 2: score 2, 2010-04-16 11:04:53
Answer 3: score 1, 2010-04-16 11:02:20
Answer 4: score 0, 2010-04-16 06:43:55
Answer 5: score 0, 2010-04-16 06:52:18
