Fixing Perl errors when I try to pass a hash (by reference) AND a variable to a sub to print the corresponding value in the hash

I am banging my head over a Perl task that we have been assigned in my Natural Language Processing course.

What they require us to solve with Perl is the following:

  • Input: the program takes two inputs from stdin in the form and type of: perl program.pl

  • Processing and Output:

    Part 1: the program tokenizes words in filename.txt and stores these words in a hash with their frequency of occurrence.

    Part 2: the program uses the input for hashing purposes. If the word cannot be found in the hash (and thus in the text), it prints out zero as the frequency of the word. If the word CAN indeed be found in the hash, it prints out the corresponding frequency value of the word in the hash.

I am sure from experience that my script is already able to DO "Part 1" stated above.

Part 2 needs to be accomplished using a Perl sub (subroutine) which takes the hash by reference, along with the word to hash for. This was the part that I had some serious trouble with.

First version, before the major changes Stefan Becker suggested:

#!/usr/bin/perl                                                                           

use warnings;
use strict;

sub hash_4Frequency
{
    my ($hashWord, $ref2_Hash) = @_;                       
    print $ref2_Hash -> {$hashWord}, "\n";  # thank you Stefan Becker, for sobriety
}

my %f = ();  # hash that will contain words and their frequencies                              
my $wc = 0;  # word-count                                       

my ($stdin, $word_2Hash) = @ARGV;  # corrected, thanks to Silvar

while ($stdin)
{
    while ("/\w+/")
    {
        my $w = $&;
        $_ = $";
        $f{lc $w} += 1;
        $wc++;
    }
}

my @args = ($word_2Hash, %f);
hash_4Frequency(@args);

The second version, after some changes:

#!/usr/bin/perl

use warnings;
use strict;

sub hash_4Frequency
{
    my $ref2_Hash = %_;
    my $hashWord = $_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>) 
{
    while (/\w+/)
    {
        chomp;
        my $w = $&;
        $_ = $";

        $f{$_}++ foreach keys %f;
        $wc++;
    }
}

hash_4Frequency($_, \%f);

When I execute ' ./script.pl < somefile.txt someWord ' in Terminal, Perl complains (Perl's output for the first version):

 Use of uninitialized value $hashWord in hash element at   
 ./word_counter2.pl line 35.

 Use of uninitialized value in print at ./word_counter2.pl line 35.

What Perl complains about for the second version:

 Can't use string ("0") as a HASH ref while "strict refs" in use at ./word_counter2.pl line 13, <STDIN> line 8390.

At least now I know the script works up to this very last point, and the problem seems to be semantic rather than syntactic.

Any further advice on this last part? It would be really appreciated.

PS: Sorry pilgrims, I am just a novice in the path of Perl.

A quick test on the command line with this example shows one correct syntax for passing in a word and a hash reference to a function:

use strict;
use warnings;
use v5.18;
sub foo {
    my $word = $_[0];
    shift;
    my $hsh = $_[0];
    say $word; say $hsh->{$word};
};
foo("x", {"x" => 4});
# prints x and 4

This treats the argument list as an array, reading the first element and shifting it off each time. Instead, I would actually suggest getting both arguments at the same time: my ($word, $hsh) = @_;
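As a minimal sketch of that suggestion (the sub name lookup_frequency and the sample hash are made up for illustration, not taken from the question), unpacking both arguments in one list assignment could look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the stored count for $word, or 0 if absent.
sub lookup_frequency {
    my ($word, $freq_ref) = @_;        # unpack both arguments in one step
    return $freq_ref->{$word} // 0;    # defined-or needs Perl 5.10 or later
}

# Illustrative data only.
my %freq = (lorem => 1, in => 2);

print lookup_frequency('in',   \%freq), "\n";   # word present in the hash
print lookup_frequency('test', \%freq), "\n";   # word absent, falls back to 0
```

The caller passes \%freq rather than %freq, so the sub receives exactly two scalars and the hash is not flattened into the argument list.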

And your syntax for accessing the hash ref elements may well be correct, but I find it easier to remember the syntax which is shared between C++ and Perl: an arrow means dereferencing. Plus you know you'll never accidentally copy the data structure when using the arrow syntax.
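To illustrate the difference between dereferencing with the arrow and copying, here is a small sketch (the hash contents are invented for the example):

```perl
use strict;
use warnings;

my %freq = (lorem => 1);
my $ref  = \%freq;          # reference to the original hash

$ref->{lorem}++;            # arrow: modifies %freq through the reference

my %copy = %$ref;           # %$ref in list context makes a shallow copy
$copy{lorem} = 99;          # changes only the copy, not %freq

print "original: $freq{lorem}\n";
print "copy:     $copy{lorem}\n";
```

After this runs, the original hash holds the incremented value while the copy holds its own, which is why sticking to the arrow avoids surprises.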

Your fixed version is not much better than your first one. Although it passes the syntax check, it has several semantic errors. Here is a version with the minimum number of fixes needed to make it work.

NOTE: this is not how you would write it in idiomatic Perl.

#!/usr/bin/perl
use warnings;
use strict;

sub hash_4Frequency($$) {
    my($ref2_Hash, $hashWord) = @_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>)
{
    chomp;
    while (/(\w+)/g)
    {
        $f{$1}++;
        $wc++;
    }
}

hash_4Frequency(\%f, $ARGV[0]);

Test output with "Lorem ipsum" as input text:

$ cat dummy.txt 
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.

$ perl <dummy.txt dummy.pl Lorem
1
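For reference, the "Can't use string ("0") as a HASH ref" error from the second version in the question can be reproduced in isolation. The sketch below assumes, as in that version, that the global hash %_ is empty; it is unrelated to the argument array @_, so assigning it to a scalar yields a plain number rather than a reference:

```perl
use strict;
use warnings;

# %_ is an ordinary (here empty) global hash, not the argument list @_.
my $count = %_;             # hash in scalar context: 0 for an empty hash

my $err;
# Under "strict refs", using a plain number as a hash reference dies.
eval { my $value = $count->{lorem}; 1 } or $err = $@;

print "count = $count\n";
print "error: $err" if defined $err;
```

Reading the arguments from @_ and passing \%f from the caller, as in the fixed version above, avoids this.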

BONUS CODE: this would be my first stab at the given problem. Your first version lower-cased all words, which does make sense, so I kept it:

#!/usr/bin/perl
use warnings;
use strict;

sub word_frequency($$) {
    my($hash_ref, $word) = @_;

    print "The word '${word}' appears ", $hash_ref->{$word} // 0, " time(s) in the input text.\n";
}

my %words;  # hash that will contain words and their frequencies
my $wc = 0; # word-count

while (<STDIN>) {
    # lower case all words
    $wc += map { $words{lc($_)}++ } /(\w+)/g
}

print "Input text has ${wc} words in total, of which ",
      scalar(keys %words),
      " are unique.\n";

# return frequency in input text for every word on the command line
foreach my $word (@ARGV) {
    word_frequency(\%words, lc($word));
}

exit 0;

Test run:

$ perl <dummy.txt dummy.pl Lorem ipsum dolor in test
Input text has 66 words in total, of which 61 are unique.
The word 'lorem' appears 1 time(s) in the input text.
The word 'ipsum' appears 1 time(s) in the input text.
The word 'dolor' appears 1 time(s) in the input text.
The word 'in' appears 2 time(s) in the input text.
The word 'test' appears 0 time(s) in the input text.
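The zero for 'test' comes from the defined-or operator (// 0) in word_frequency, which needs Perl 5.10 or later. As a hedged alternative sketch, an explicit exists check gives the same result; the sub name frequency_of and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical variant of the lookup: test for the key explicitly.
sub frequency_of {
    my ($hash_ref, $word) = @_;
    return exists $hash_ref->{$word} ? $hash_ref->{$word} : 0;
}

# Illustrative data only.
my %words = (in => 2);

print frequency_of(\%words, 'in'),   "\n";   # key exists, stored count
print frequency_of(\%words, 'test'), "\n";   # key missing, reports 0
```

The two approaches differ only when a key exists with an undefined value, which cannot happen for counters built with ++.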
