Perl: undef value in %hash: why?

Good afternoon. I am writing some keys and values into a %hash, but I keep getting an undef value that I can't seem to explain.

my @maxent_unchanged = <FILE1>; 
close FILE1;
chomp (@maxent_unchanged);

my @NM;
my @max_score_unchanged;
foreach my $line(@maxent_unchanged) {

  if ($line =~ m/[a-z]/i) {
    push (@NM, $line);
  }
  else { 
    push (@max_score_unchanged, $line);
  }
}

my %max_unchanged;
my $i = 0;
foreach my $lines(@maxent_unchanged) {
  $max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
  $i++;
}

To put this into context, @maxent_unchanged alternates between the lines that go into @NM and the lines that go into @max_score_unchanged, like this:

$VAR1 = 'TTAAGGCAGCCCACCCGCAGGCT        >       1       110740688       110740688       C       T       GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing  splicing        SLC6A17:NM_001010898:exon12:c.1816-10C>T';
$VAR2 = '0.77';
$VAR3 = 'TTCTATCCTTTGTTTTACAGGAA        >       1       111857154       111857154       T       C       TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing  splicing        CHIA:NM_201653:exon5:c.258-8T>C';
$VAR4 = '10.99';

Therefore it (@maxent_unchanged) has twice as many lines as @NM and @max_score_unchanged. I have checked this and it holds true.

If I dump @NM and @max_score_unchanged, I get the same number of elements, but when I put them into a %hash I get an extra key-value pair, as shown by dumping the hash:

$VAR1 = '';
$VAR2 = undef;
$VAR3 = 'TTTTATTAATTCCTTTGTAGAAC        >       6       144835040       144835040       T       C       TATCATCTTAAATATTTCATATGGTTATGTAAGCATTTTATTAATTCCTT[T]GTAGAACCATCAGAACCAGCTAGAAATATTTGATGGGAACGTGGCTCACA splicing  splicing        UTRN:NM_007124:exon35:c.4945-5T>C';
$VAR4 = '8.22';
$VAR5 = 'TCTTTTTTGGACATGTACAGAGC        >       10      97127462        97127462        C       A       AGGAGTCTCTGAAGAAATTTCCGGAGTAGGGCTGATGGCTGAGCTCTGTA[C]ATGTCCAAAAAAGAAAAAAAAGAAGAAAAAAATAATGTAGATGATTTATT splicing  splicing        SORBS1:NM_001034957:exon13:c.1024-6G>T,NM_001034955:exon21:c.1972-6G>T,NM_001034956:exon18:c.1459-6G>T,NM_006434:exon13:c.1024-6G>T,NM_015385:exon17:c.1420-6G>T,NM_001034954:exon21:c.1906-6G>T,NM_024991:exon17:c.1147-6G>T';
$VAR6 = '4.43';

My keys are unique, so I know that is not the issue. Any ideas why?

Second, I want to remove the empty hash key and value. How can I do this?

Many thanks in advance for your patience and help, E

In this loop, you are iterating over @maxent_unchanged, but you should be iterating over @max_score_unchanged:

foreach my $lines(@max_score_unchanged) {
  $max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
  $i++;
}

@maxent_unchanged is what you loaded all your data into, so it has twice as many lines as @NM and @max_score_unchanged. The loop therefore runs twice as often as it should, and the extra iterations read past the end of both arrays.

If you use strict; and use warnings; you'll see this warning when you run the script:

Use of uninitialized value within @NM in hash element at test.pl line 25, <DATA> line 4.
Use of uninitialized value within @NM in hash element at test.pl line 25, <DATA> line 4.

That will point you to the right line. You could add print "$i\n"; to that loop to see how many times it runs, and compare that with the lengths of @NM and @max_score_unchanged.
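To make the cause concrete, and to cover the second part of the question (removing the empty key), here is a minimal sketch. The two-element arrays and the names seqA and seqB are made-up stand-ins for the real data, and the loop bound 0 .. 3 mimics iterating over the four-line @maxent_unchanged. It is only an illustration of the mechanism, not the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: two keys and two scores.
my @NM                  = ('seqA', 'seqB');
my @max_score_unchanged = ('0.77', '10.99');

# Mimic the original loop: four iterations, although each array
# holds only two elements.
my %max_unchanged;
for my $i (0 .. 3) {
    # For $i >= 2 both lookups return undef. The undef key is
    # stringified to '' (with an "uninitialized value" warning),
    # which creates the stray '' => undef pair.
    $max_unchanged{ $NM[$i] } = $max_score_unchanged[$i];
}

my $pairs_before = keys %max_unchanged;    # seqA, seqB and ''

# Remove the stray entry; delete is harmless if the key is absent.
delete $max_unchanged{''};

my $pairs_after = keys %max_unchanged;

print "before: $pairs_before, after: $pairs_after\n";
# prints "before: 3, after: 2"
```

Deleting the empty key only hides the symptom. Fixing the loop bound, as shown above, means the stray pair is never created in the first place.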

I recommend you use proper indentation in your code to make it much more readable.


Example:

use strict;
use warnings;
use Data::Dumper;

my @maxent_unchanged = <DATA>;
chomp (@maxent_unchanged);

my @NM;
my @max_score_unchanged;

foreach my $line(@maxent_unchanged) {
    if ($line =~ m/[a-z]/i) {
        push (@NM, $line);
    }
    else { 
        push (@max_score_unchanged, $line);
    }
}

my %max_unchanged;
for (my $i = 0; $i < @max_score_unchanged; $i++ ) {
    $max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
}

print Dumper \%max_unchanged;

__DATA__
TTAAGGCAGCCCACCCGCAGGCT        >       1       110740688       110740688       C       T       GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing  splicing        SLC6A17:NM_001010898:exon12:c.1816-10C>T
0.77
TTCTATCCTTTGTTTTACAGGAA        >       1       111857154       111857154       T       C       TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing  splicing        CHIA:NM_201653:exon5:c.258-8T>C
10.99

The example also shows how to iterate with an index in a C-style for loop instead of a foreach loop, since you don't use $lines anywhere.


Output:

$VAR1 = {
          'TTAAGGCAGCCCACCCGCAGGCT        >       1       110740688       110740688       C       T       GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing  splicing        SLC6A17:NM_001010898:exon12:c.1816-10C>T' => '0.77',
          'TTCTATCCTTTGTTTTACAGGAA        >       1       111857154       111857154       T       C       TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing  splicing        CHIA:NM_201653:exon5:c.258-8T>C' => '10.99'
        };

Do you really need to copy the data into multiple arrays? Are they being used elsewhere in the script? If not, then I'd simply build the hash as I loop over the filehandle.

use strict;
use warnings;
use Data::Dumper;

my %max_unchanged;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^[ACGT]/) {
        chomp(my $value = <DATA>);
        $max_unchanged{$line} = $value;
    }
}

print Dumper \%max_unchanged;

__DATA__
TTAAGGCAGCCCACCCGCAGGCT        >       1       110740688       110740688       C       T       GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing  splicing        SLC6A17:NM_001010898:exon12:c.1816-10C>T
0.77
TTCTATCCTTTGTTTTACAGGAA        >       1       111857154       111857154       T       C       TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing  splicing        CHIA:NM_201653:exon5:c.258-8T>C
10.99

Matt has correctly pointed out the reason for your problem. In fact, it would be better in this instance to iterate over a list of indices, like this:

my %max_unchanged;
for my $i (0 .. $#max_score_unchanged) {
  $max_unchanged{$NM[$i]} = $max_score_unchanged[$i];
}

or you could even use map, like this:

my %max_unchanged = map {
  $NM[$_] => $max_score_unchanged[$_];
} 0 .. $#max_score_unchanged;
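Another option, not mentioned in the answers here, is a hash slice, which assigns all the keys and values in a single statement without an index variable. The sketch below uses made-up placeholder data (seqA, seqB) in place of the real lines, and the length check is an added safeguard rather than part of the original code.

```perl
use strict;
use warnings;

# Hypothetical stand-in data for the two parallel arrays.
my @NM                  = ('seqA', 'seqB');
my @max_score_unchanged = ('0.77', '10.99');

# Guard against mismatched arrays before pairing them up.
die "Key and value counts differ\n"
    unless @NM == @max_score_unchanged;

# Hash slice: the i-th key receives the i-th value.
my %max_unchanged;
@max_unchanged{@NM} = @max_score_unchanged;

print "$_ => $max_unchanged{$_}\n" for sort keys %max_unchanged;
```

Because the slice is driven by @NM itself, it cannot run past the end of the key array the way the original counter-based loop did.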

But in the end there is no clear reason to split your file into two arrays, and you may prefer this more concise version of your program, which achieves the same end. It expects the input file as a parameter on the command line.

use strict;
use warnings;

my %max_unchanged;
while (my $key = <>) {
  next unless $key =~ /[a-z]/;
  chomp $key;
  chomp($max_unchanged{$key} = <>);  # the score is the next line of the same input
}

use Data::Dump;
dd \%max_unchanged;

Given your sample input data, %max_unchanged ends up looking like this:

{
  "TTAAGGCAGCCCACCCGCAGGCT        >       1       110740688       110740688       C       T       GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing  splicing        SLC6A17:NM_001010898:exon12:c.1816-10C>T" => 0.77,
  "TTCTATCCTTTGTTTTACAGGAA        >       1       111857154       111857154       T       C       TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing  splicing        CHIA:NM_201653:exon5:c.258-8T>C"          => 10.99,
}
