Perl: why is there an undef value in my %hash?
Good afternoon. I am writing some keys and values into a %hash, but I keep getting an undef value that I can't explain.
my @maxent_unchanged = <FILE1>;
close FILE1;
chomp (@maxent_unchanged);
my @NM;
my @max_score_unchanged;
foreach my $line(@maxent_unchanged) {
if ($line =~ m/[a-z]/i) {
push (@NM, $line);
}
else {
push (@max_score_unchanged, $line);
}
}
my %max_unchanged;
my $i = 0;
foreach my $lines(@maxent_unchanged) {
$max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
$i++;
}
To put this into context, @maxent_unchanged alternates between the lines that go into @NM and the lines that go into @max_score_unchanged, like this:
$VAR1 = 'TTAAGGCAGCCCACCCGCAGGCT > 1 110740688 110740688 C T GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing splicing SLC6A17:NM_001010898:exon12:c.1816-10C>T';
$VAR2 = '0.77';
$VAR3 = 'TTCTATCCTTTGTTTTACAGGAA > 1 111857154 111857154 T C TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing splicing CHIA:NM_201653:exon5:c.258-8T>C';
$VAR4 = '10.99';
Therefore @maxent_unchanged has twice as many lines as @NM and @max_score_unchanged. I have checked this and it holds true.
If I dump @NM and @max_score_unchanged I get the same number of elements in each, but when I put these into a %hash, I get an extra key-value pair, as shown by dumping the hash:
$VAR1 = '';
$VAR2 = undef;
$VAR3 = 'TTTTATTAATTCCTTTGTAGAAC > 6 144835040 144835040 T C TATCATCTTAAATATTTCATATGGTTATGTAAGCATTTTATTAATTCCTT[T]GTAGAACCATCAGAACCAGCTAGAAATATTTGATGGGAACGTGGCTCACA splicing splicing UTRN:NM_007124:exon35:c.4945-5T>C';
$VAR4 = '8.22';
$VAR5 = 'TCTTTTTTGGACATGTACAGAGC > 10 97127462 97127462 C A AGGAGTCTCTGAAGAAATTTCCGGAGTAGGGCTGATGGCTGAGCTCTGTA[C]ATGTCCAAAAAAGAAAAAAAAGAAGAAAAAAATAATGTAGATGATTTATT splicing splicing SORBS1:NM_001034957:exon13:c.1024-6G>T,NM_001034955:exon21:c.1972-6G>T,NM_001034956:exon18:c.1459-6G>T,NM_006434:exon13:c.1024-6G>T,NM_015385:exon17:c.1420-6G>T,NM_001034954:exon21:c.1906-6G>T,NM_024991:exon17:c.1147-6G>T';
$VAR6 = '4.43';
My keys are unique, so I know that is not the issue. Any ideas why?
Second, I want to remove the empty hash key and its value. How can I do this?
Many thanks in advance for your patience and help, E
In this loop, you are iterating over @maxent_unchanged, but you should be iterating over @max_score_unchanged.
foreach my $lines (@max_score_unchanged) {
    $max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
    $i++;
}
@maxent_unchanged is what you loaded all your data into, so it has twice as many lines as @NM and @max_score_unchanged. The extra iterations read past the end of both arrays, which is where the empty key and the undef value come from.
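To make the origin of the stray pair concrete, here is a minimal sketch with made-up two-element arrays. The names @keys, @vals and %h are illustrative only and do not come from the original script. Reading past the end of an array yields undef, and an undef hash key is stringified to the empty string, which gives the '' => undef pair seen in the dump. The last lines also show one way to handle the second part of the question, using delete to drop that pair after the fact.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for @NM and @max_score_unchanged.
my @keys = ('seqA', 'seqB');
my @vals = ('0.77', '10.99');

my %h;
{
    # Silence the "uninitialized" warnings for this deliberate demonstration.
    no warnings 'uninitialized';
    # Loop four times (as if over the combined array) instead of two.
    for my $i (0 .. 3) {
        $h{ $keys[$i] } = $vals[$i];
    }
}

# The out-of-range iterations wrote undef to the key ''.
my $status = exists $h{''} ? "stray entry present" : "no stray entry";
print "$status\n";

# delete removes the key together with its value.
delete $h{''};
print scalar(keys %h), " keys remain\n";
```

Fixing the loop bounds is still the better cure, since delete only hides the symptom.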
If you add use strict; and use warnings; to your script, you'll see this warning when you run it:
Use of uninitialized value within @NM in hash element at test.pl line 25, <DATA> line 4.
That will point you to the right line. You could also add print "$i\n"; to that loop to see how many times it runs, and compare that with the lengths of @NM and @max_score_unchanged.
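As a rough sketch of that length comparison, the snippet below splits a small made-up sample and prints the element counts. The array names @all_lines, @names and @scores and the sample strings are assumptions for illustration, not taken from the original data. An array evaluated in scalar context gives its element count, so the three numbers can be compared directly.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the lines read from the file.
my @all_lines = ('seqA splicing', '0.77', 'seqB splicing', '10.99');

my (@names, @scores);
for my $line (@all_lines) {
    if ($line =~ /[a-z]/i) { push @names,  $line }
    else                   { push @scores, $line }
}

# An array in scalar context evaluates to its number of elements.
printf "all=%d names=%d scores=%d\n",
    scalar(@all_lines), scalar(@names), scalar(@scores);

# Refuse to pair the arrays up if the counts disagree.
die "mismatched counts\n" unless @names == @scores;
```

If the first number is twice the other two, any loop over the combined array that indexes into the smaller ones will run off their ends.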
I recommend you use proper indentation in your code to make it much more readable.
use strict;
use warnings;
use Data::Dumper;
my @maxent_unchanged = <DATA>;
chomp (@maxent_unchanged);
my @NM;
my @max_score_unchanged;
foreach my $line (@maxent_unchanged) {
    if ($line =~ m/[a-z]/i) {
        push (@NM, $line);
    }
    else {
        push (@max_score_unchanged, $line);
    }
}
my %max_unchanged;
for (my $i = 0; $i < @max_score_unchanged; $i++ ) {
    $max_unchanged{$NM[$i]} = $max_score_unchanged[$i]; ##maxent score for unchanged seq
}
print Dumper \%max_unchanged;
__DATA__
TTAAGGCAGCCCACCCGCAGGCT > 1 110740688 110740688 C T GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing splicing SLC6A17:NM_001010898:exon12:c.1816-10C>T
0.77
TTCTATCCTTTGTTTTACAGGAA > 1 111857154 111857154 T C TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing splicing CHIA:NM_201653:exon5:c.258-8T>C
10.99
I also put in an example of how you can iterate with an index using a C-style for loop, instead of a foreach loop, since you don't use $lines anywhere.
Output:
$VAR1 = {
'TTAAGGCAGCCCACCCGCAGGCT > 1 110740688 110740688 C T GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing splicing SLC6A17:NM_001010898:exon12:c.1816-10C>T' => '0.77',
'TTCTATCCTTTGTTTTACAGGAA > 1 111857154 111857154 T C TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing splicing CHIA:NM_201653:exon5:c.258-8T>C' => '10.99'
};
Do you really need to copy the data into multiple arrays? Are they being used elsewhere in the script? If not, then I'd simply build the hash as I loop over the filehandle.
use strict;
use warnings;
use Data::Dumper;
my %max_unchanged;
while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^[ACGT]/) {
        chomp(my $value = <DATA>);
        $max_unchanged{$line} = $value;
    }
}
print Dumper \%max_unchanged;
__DATA__
TTAAGGCAGCCCACCCGCAGGCT > 1 110740688 110740688 C T GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing splicing SLC6A17:NM_001010898:exon12:c.1816-10C>T
0.77
TTCTATCCTTTGTTTTACAGGAA > 1 111857154 111857154 T C TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing splicing CHIA:NM_201653:exon5:c.258-8T>C
10.99
Matt has correctly pointed out the reason for your problem. In fact, it would be better in this instance to iterate over a list of indices, like this:
my %max_unchanged;
for my $i (0 .. $#max_score_unchanged) {
    $max_unchanged{$NM[$i]} = $max_score_unchanged[$i];
}
or you could even use map, like this:
my %max_unchanged = map {
    $NM[$_] => $max_score_unchanged[$_];
} 0 .. $#max_score_unchanged;
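A further option, not mentioned in the answers above, is a hash slice, which pairs the two arrays up in a single assignment. The sketch below uses short placeholder strings in place of the real sequence lines and scores, and assumes both arrays have the same length.

```perl
use strict;
use warnings;

my @NM                  = ('seqA', 'seqB');    # placeholder keys
my @max_score_unchanged = ('0.77', '10.99');   # placeholder scores

my %max_unchanged;
# Hash slice: each value is assigned to the key at the same position.
@max_unchanged{@NM} = @max_score_unchanged;

print "$_ => $max_unchanged{$_}\n" for sort keys %max_unchanged;
```

Because the slice is driven by @NM, no index variable is involved and there is no loop bound to get wrong.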
But in the end there is no clear reason to split your file into two arrays, and you may prefer this more concise version of your program, which achieves the same end. It expects the input file as a parameter on the command line.
use strict;
use warnings;
my %max_unchanged;
while (my $key = <>) {
    next unless $key =~ /[a-z]/;
    chomp $key;
    # The score is the next line of the same input, so read from <> again
    chomp($max_unchanged{$key} = <>);
}
use Data::Dump;
dd \%max_unchanged;
Given your sample input data, %max_unchanged ends up looking like this:
{
"TTAAGGCAGCCCACCCGCAGGCT > 1 110740688 110740688 C T GCCTGGGCGGGGAGGGCTGTCACAGTGCCGGCAGCAGCCCTTAAGGCAGC[C]CACCCGCAGGCTGCCGAGCGCTACCTGTATTTCCCCAACTGGGCCATGGC splicing splicing SLC6A17:NM_001010898:exon12:c.1816-10C>T" => 0.77,
"TTCTATCCTTTGTTTTACAGGAA > 1 111857154 111857154 T C TTAAATGGAGGGAGTCCTGACTTTTGAAGTTTATCTGTTTCTATCCTTTG[T]TTTACAGGAACAGCCAGCTGAAAACTCTCCTGGCCATTGGAGGCTGGAAC splicing splicing CHIA:NM_201653:exon5:c.258-8T>C" => 10.99,
}