Storing table in data structure with perl

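I am thinking about how to store the following table in a complex data structure, and which data structure to use. The input is a tab-delimited text file exported from Excel. Note that some cells are empty (in this case "RQ Max"). Here is the table: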

Well    Sample Name Target Name RQ Max  Ct Mean
1   Sample 1    actin       20,514
2   Sample 1    claudin     30,544
3   Sample 1    occludin        31,183
25  Sample 1    actin       20,514
26  Sample 1    claudin     30,544
27  Sample 1    occludin        31,183
49  Sample 2    actin       20,416
50  Sample 2    claudin     25,611
51  Sample 2    occludin        27,831
73  Sample 2    actin       20,416
74  Sample 2    claudin     25,611
75  Sample 2    occludin        27,831
97  Sample 3    actin       24,213
98  Sample 3    claudin     32,065
99  Sample 3    occludin        34,556
194 H2O claudin     
195 H2O occludin        
217 H2O actin       
218 H2O claudin     
219 H2O occludin 

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;


# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}

my $input = "$ARGV[0]";
#chomp ($input);

open (READ, $input) || die "Cannot open $input: $!\n";

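my $line = '';
my %data;
my $i = 0;    # running row counter, used as the hash key
while ($line = <READ>){
    chomp $line;
    # keep only data rows, which start with a well number
    if ($line =~ m/^[0-9]/){
        $i++;
        $data{$i} = [ split /\t/, $line ];
    }
}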

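As you can see, I am only at the beginning of the program, because I am not sure which structure to use. In fact, I only need three columns of the whole table: "Sample Name", "Target Name" and "Ct Mean". Using them as keys could be helpful later, when I want to calculate something for each sample. In a hash-of-hashes structure, I would like to have the target name as the "second key". Can someone push me in the right direction? I am currently struggling with storing the data, since I have not used Perl for a long time...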

This is what I want in the end:

%data = (
    'Sample 1' => {
        actin    => 20.514,
        claudin  => 30.544,
        occludin => 31.183,
    },
    'Sample 2' => {
        actin    => 20.416,
        claudin  => 25.611,
        occludin => 27.831,
    },
    ...
);

So, a couple of points. First, if you want to read from files specified on the command line:

while ( <> ) {

This reads either STDIN or the files named on the command line, just as sed and grep do.

Second point: you can use a hash slice to parse the tab-separated data.
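As a minimal sketch of the hash slice on its own, assuming the header names from the table in the question and one hard-coded data line (both are written out here only for illustration):

#!/usr/bin/env perl
use strict;
use warnings;

# Column names, as they would come from the header row.
my @header = ( 'Well', 'Sample Name', 'Target Name', 'RQ Max', 'Ct Mean' );

# One data line; the two adjacent tabs mark the empty "RQ Max" cell.
my $line = "1\tSample 1\tactin\t\t20,514";

# A hash slice assigns a list of values to a list of keys in one statement.
my %row;
@row{@header} = split /\t/, $line;

print "$row{'Sample Name'} / $row{'Target Name'} / $row{'Ct Mean'}\n";

Because split keeps empty fields in the middle of a line, the empty "RQ Max" cell still occupies its slot and "Ct Mean" lines up with the right column.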

So, assuming you want to extract just the Ct Mean:

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

my %results; 

#read header row
chomp ( my @header = split /\t/, <> ); 
#tidy up leading whitespace in the fields (there's some in your example data)
s/^\s+// for @header;
#iterate the rest of STDIN or files on command line. 
while ( <> ) {
   #remove trailing linefeed. 
   chomp;
   #tidy up leading whitespace again. 
   s/^\s+//g;

   my %row;
   #use hash slice to read key-value. 
   @row{@header} = split /\t/;
   #print for debug
   print Dumper \%row;

   #skip the H2O lines. 
   next if $row{'Sample Name'} eq 'H2O';

   #Cosmetic assignments - could rewrite to a single one
   my $sample_name = $row{'Sample Name'};
   my $ct_mean = $row{'Ct Mean'};
   my $target_name = $row{'Target Name'};

   $results{$sample_name}{$target_name} = $ct_mean; 
}

print Dumper \%results;

This gives you:

$VAR1 = {
          'Sample 2' => {
                          'occludin' => '27,831',
                          'actin' => '20,416',
                          'claudin' => '25,611'
                        },
          'Sample 3' => {
                          'occludin' => '34,556',
                          'actin' => '24,213',
                          'claudin' => '32,065'
                        },
          'Sample 1' => {
                          'claudin' => '30,544',
                          'occludin' => '31,183',
                          'actin' => '20,514'
                        }
        };
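
The values in this output are still strings with a decimal comma, whereas the question asks for numbers such as 20.514. One possible way to convert them is sketched here, assuming the comma is always a decimal separator and never a thousands separator. The helper name to_number is hypothetical and not part of the script above:

use strict;
use warnings;

# Hypothetical helper: turn a decimal-comma string such as '20,514' into 20.514.
# Assumes the comma is a decimal separator; returns undef (in scalar context)
# for empty or missing cells.
sub to_number {
    my ($text) = @_;
    return unless defined $text && length $text;
    ( my $copy = $text ) =~ tr/,/./;    # swap the comma for a dot on a copy
    return $copy + 0;                   # force numeric context
}

print to_number('20,514'), "\n";    # prints 20.514

Inside the loop, the assignment could then read $results{$sample_name}{$target_name} = to_number($ct_mean); if numeric values are wanted.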

(Note: hashes are inherently unordered.)
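
If a stable order matters, for example when printing a report, the keys can be sorted while iterating. A minimal sketch, using a hard-coded excerpt of the structure rather than the full parsed file:

use strict;
use warnings;

# A small excerpt of the nested hash, hard-coded for illustration.
my %results = (
    'Sample 2' => { occludin => '27,831', actin => '20,416', claudin => '25,611' },
    'Sample 1' => { claudin => '30,544', occludin => '31,183', actin => '20,514' },
);

# Sorting the keys at both levels gives the same order on every run.
my @lines;
for my $sample ( sort keys %results ) {
    for my $target ( sort keys %{ $results{$sample} } ) {
        push @lines, join "\t", $sample, $target, $results{$sample}{$target};
    }
}
print "$_\n" for @lines;

For Data::Dumper output, setting $Data::Dumper::Sortkeys = 1 has a similar effect. Note that the default sort is a string sort, so 'Sample 10' would come before 'Sample 2'.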
