Perl: Comparing words in two files

Question

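This is my current script, which tries to compare the words in file_all.txt against those in file2.txt. It should print any words in file_all that are not in file2.

I also need the output formatted as one word per line, but that is not the most pressing issue.

I'm new to Perl. I have mostly used C and Python, but this is a bit tricky, and I know my variable assignments are off.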

 use strict;
 use warnings;

 my $file2 = "file_all.txt";   %I know my assignment here is wrong
 my $file1 = "file2.txt";

 open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
 while ( my $line = <$file2> ) {
     ++$file2{$line};
     }

 open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
 while ( my $line = <$file1> ) {
     print $line unless $file2{$line};
     }

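Edit: Oh, and it should ignore case, so that Pie is treated the same as PIE when comparing. It should also strip apostrophes.

These are the errors I get: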

"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.

Answer 1

You're almost there.

The % sigil denotes a hash. You can't store a file name in a hash; you need a scalar for that.

my $file2 = 'file_all.txt';
my $file1 = 'file2.txt';

You need a hash to count the occurrences.

my %count;

To open a file, specify the file name. It is stored in the scalar, remember?

open my $FH, '<', $file2 or die "Can't open $file2: $!";

Then, process the file line by line:

while (my $line = <$FH> ) {
    chomp $line;          # Remove newline if present.
    ++$count{lc $line};   # Store the lowercased string.
}

Then, open the second file and process it line by line, again using lc to get the lowercased string.

To remove apostrophes, use a substitution:

$line =~ s/'//g;  # Replace ' by nothing globally (i.e. everywhere).
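
Putting the steps above together, a minimal sketch might look like the following. The sub names normalize and words_missing_from are hypothetical, and the sketch assumes one word per line. The in-memory filehandles only stand in for file_all.txt and file2.txt so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a word and strip apostrophes.
sub normalize {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/'//g;
    return $word;
}

# Hypothetical helper: return the lines read from $all_fh whose normalized
# form does not occur among the lines of $seen_fh. Assumes one word per line.
sub words_missing_from {
    my ($all_fh, $seen_fh) = @_;

    my %count;
    while (my $line = <$seen_fh>) {
        chomp $line;
        ++$count{ normalize($line) };
    }

    my @missing;
    while (my $line = <$all_fh>) {
        chomp $line;
        push @missing, $line unless $count{ normalize($line) };
    }
    return @missing;
}

# In-memory stand-ins for the two files (made-up sample data).
my $all_text  = "Pie\nCake\ndon't\n";
my $seen_text = "PIE\ndont\n";
open my $all_fh,  '<', \$all_text  or die "Can't open in-memory file: $!";
open my $seen_fh, '<', \$seen_text or die "Can't open in-memory file: $!";

# Prints one word per line; with this sample data only "Cake" is printed.
print "$_\n" for words_missing_from($all_fh, $seen_fh);
```

With real files, the two open calls would take the file names instead of the scalar references.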

Answer 2

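Your error messages:

  "my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
  "my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
  Global symbol "%file2" requires explicit package name at absent.pl line 11.
  Global symbol "%file2" requires explicit package name at absent.pl line 16.
  Execution of absent.pl aborted due to compilation errors.

You assign a file name to $file2 and then later use open my $file2 .... The my $file2 in the second case masks the one declared in the first. Then, in the body of the while loop, you act as if there were a hash %file2, but you never declared one.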

You should use more descriptive variable names to avoid conceptual confusion.

For example:

 my @filenames = qw(file_all.txt file2.txt);

Using variables with integer suffixes is a code smell.
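
To illustrate the masking problem described above, here is a small sketch that keeps the file name, the filehandle, and the lookup hash under three distinct names. The variable names and the temporary file (created with the core File::Temp module, purely so the example is self-contained) are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file as a stand-in for file2.txt.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "pie\ncake\n";
close $tmp_fh or die "Can't close '$filename': $!";

# Three distinct names for three distinct things:
#   $filename  scalar holding the file name
#   $fh        scalar holding the filehandle
#   %seen      hash recording which lines occurred
my %seen;
open my $fh, '<', $filename or die "Can't open '$filename': $!";
while (my $line = <$fh>) {
    chomp $line;
    ++$seen{$line};
}
close $fh or die "Can't close '$filename': $!";

print join(', ', sort keys %seen), "\n";
```

Because nothing is declared twice and the hash is declared with my, neither the "masks earlier declaration" warning nor the "Global symbol" error should appear.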

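Then, factor common tasks into subroutines. In this case, you need: 1) a function that takes a file name and returns a table of the words in that file, and 2) a function that takes a file name and a lookup table, and prints the words that are in the file but do not appear in the lookup table.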

#!/usr/bin/env perl

use strict;
use warnings;

use Carp qw( croak );

my @filenames = qw(file_all.txt file2.txt);

print "$_\n" for @{ words_notseen(
    $filenames[0],
    words_from_file($filenames[1])
)};

sub words_from_file {
    my $filename = shift;
    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

    while (my $line = <$fh>) {
        $words{ lc $_ } = 1 for split ' ', $line;
    }

    close $fh
        or croak "Failed to close '$filename': $!";

    return \%words;
}

sub words_notseen {
    my $filename = shift;
    my $lookup = shift;

    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

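    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            # Keys in the lookup table are lowercased, so lowercase here too.
            unless (exists $lookup->{ lc $word }) {
                $words{ $word } = 1;
            }
        }
    }

    close $fh
        or croak "Failed to close '$filename': $!";

    return [ keys %words ];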
}
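
The split ' ', $line calls above use Perl's special whitespace-splitting mode, which skips leading whitespace and splits on runs of whitespace. The script lowercases words but does not strip apostrophes, which the question also asked for. A minimal sketch of a helper that does both follows. The name words_in_line is hypothetical, and the sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into words, lowercase each word,
# and strip apostrophes from it.
sub words_in_line {
    my ($line) = @_;
    return map { my $w = lc $_; $w =~ s/'//g; $w } split ' ', $line;
}

# Leading, trailing, and repeated whitespace produce no empty words.
my @words = words_in_line("  Don't  eat the PIE\n");
print "$_\n" for @words;
```

Such a helper could replace the bare split ' ', $line in both subroutines so that the lookup table and the words being checked are normalized the same way.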

Answer 3

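As you mentioned in the question: it should print any words in file_all that are not in file2.

The short script below does this by comparing the two files line by line, that is, each line of file_all.txt against the line at the same position in file2.txt: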

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = qw(file_all.txt file2.txt);

open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";

while (<$fh1>)
{
    last if eof($fh2);
    my $compline = <$fh2>;
    chomp($_, $compline);
    if ($_ ne $compline)
    {
        print "$_\n";
    }
}

file_all.txt:

ab
cd
ee
ef
gh
df

file2.txt:

zz
yy
ee
ef
pp
df

Output:

ab
cd
gh
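
The script above compares lines at matching positions, so its result depends on the order of the lines in both files. As a hedged alternative, a position-independent variant could first collect the words of file2.txt in a hash and then filter file_all.txt against it. The sketch below uses the sample data from the listings above held in arrays instead of files. The sub name not_in is hypothetical.

```perl
use strict;
use warnings;

# Sample data copied from the file listings above.
my @file_all = qw(ab cd ee ef gh df);
my @file2    = qw(zz yy ee ef pp df);

# Hypothetical helper: return the words of @$all that appear nowhere
# in @$other, keeping the order in which they occur in @$all.
sub not_in {
    my ($all, $other) = @_;
    my %seen = map { $_ => 1 } @$other;
    return grep { !$seen{$_} } @$all;
}

# For this sample data the result is the same three words: ab, cd, gh.
print "$_\n" for not_in(\@file_all, \@file2);
```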

Answer 4

The problem is the following two lines:

 my %file2 = "file_all.txt";
 my %file1 = "file2.txt";

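Here you are assigning a single value (called a SCALAR in Perl) to a hash (denoted by the % sigil). A hash consists of key-value pairs, conventionally written with the fat-comma operator (=>) between key and value. For example: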

my %hash = ( key => 'value' );

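A hash assignment expects an even number of elements, because every entry needs both a key and a value. Currently you give each hash only a single value, so Perl complains (under use warnings, with "Odd number of elements in hash assignment").

To assign a value to a SCALAR, use the $ sigil: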

 my $file2 = "file_all.txt";
 my $file1 = "file2.txt";
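
As a small illustration of the difference, the following sketch declares one scalar and one hash and reads from both. The variable names and sample values are made up for the example.

```perl
use strict;
use warnings;

# A scalar holds exactly one value, such as a file name.
my $filename = 'file_all.txt';

# A hash is assigned a list of key/value pairs.
my %count = ( pie => 2, cake => 1 );

# A single hash element is itself a scalar, so it is accessed with $.
print "$filename\n";
print "pie: $count{pie}\n";
print scalar(keys %count), " keys\n";
```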

4 answers

Answer 1
1 2015-04-28 19:27:22

Answer 2
1 Accepted 2015-04-28 19:38:13

Answer 3
1 2015-04-28 20:04:52

Answer 4
0 2015-04-28 19:30:31