Perl: comparing words in two files
This is my current script to try to compare the words in file_all.txt to the ones in file2.txt. It should print out any of the words in file_all that are not in file2.
I need to format these as one word per line, but that's not the more pressing issue.
I am new to Perl... I get C and Python more, but this is being a bit tricky. I know my variable assignment is off.
use strict;
use warnings;
my $file2 = "file_all.txt"; # I know my assignment here is wrong
my $file1 = "file2.txt";
open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
while ( my $line = <$file2> ) {
++$file2{$line};
}
open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
while ( my $line = <$file1> ) {
print $line unless $file2{$line};
}
EDIT: Oh, it should ignore case, so Pie is the same as PIE when comparing. It should also remove apostrophes.
These are the errors I am getting:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
You are almost there.
The % sigil denotes a hash. You can't assign a lone file name to a hash; for a single value like a file name you need a scalar.
my $file2 = 'file2.txt';     # Words to look up.
my $file1 = 'file_all.txt';  # Words to check against the lookup.
You need a hash to count the occurrences.
my %count;
To open a file, specify its name; it's stored in the scalar, remember?
open my $FH, '<', $file2 or die "Can't open $file2: $!";
Then, process the file line by line:
while (my $line = <$FH>) {
chomp $line;          # Remove newline if present (plain chomp would act on $_, not $line).
++$count{lc $line};   # Store the lowercased string.
}
Then, open the second file, process it line by line, and use lc again to get the lowercased string before looking it up in %count.
To remove apostrophes, use a substitution:
$line =~ s/'//g; # Replace ' by nothing globally (i.e. everywhere).
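The steps above can be combined into one script. This is a minimal sketch rather than the answer's exact code, under these assumptions: words are separated by whitespace (so split ' ' also copes with lines holding several words), normalize and read_words are hypothetical helper names, and, following the goal stated in the question, the lookup table is built from file2.txt while the words to check come from file_all.txt. So that it runs without any files, the demo reads from in-memory strings using open's scalar-reference form; with real files you would open the paths instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: fold case with lc and strip apostrophes,
# as requested in the question's EDIT.
sub normalize {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/'//g;    # Remove every apostrophe.
    return $word;
}

# Hypothetical helper: read whitespace-separated words from an
# already-open filehandle and return them in file order.
sub read_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        chomp $line;
        push @words, split ' ', $line;
    }
    return @words;
}

# Demo input held in strings. With real files you would write e.g.
#   open my $known_fh, '<', 'file2.txt' or die "Can't open file2.txt: $!";
my $all_text   = "Pie\nDon't\ncake\n";    # stands in for file_all.txt
my $known_text = "PIE\ndont\n";           # stands in for file2.txt

# Count the normalized words of the lookup file.
open my $known_fh, '<', \$known_text or die "Can't open in-memory file: $!";
my %count;
++$count{ normalize($_) } for read_words($known_fh);
close $known_fh;

# Keep the words whose normalized form was never counted.
open my $all_fh, '<', \$all_text or die "Can't open in-memory file: $!";
my @missing = grep { !$count{ normalize($_) } } read_words($all_fh);
close $all_fh;

print "$_\n" for @missing;    # One word per line.
```

With the sample strings it prints only cake, because Pie and Don't match PIE and dont once case is folded and apostrophes are stripped.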
Your error messages:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
You are assigning a file name to $file2, and then later you are using open my $file2 ... The second my $file2 declares a new variable in the same scope, which masks the first one. Then, in the body of the while loop, you write $file2{$line} as if there were a hash table %file2, but you haven't declared it at all; under use strict, that is what the "Global symbol "%file2" requires explicit package name" errors mean.
You should use more descriptive variable names to avoid conceptual confusion.
For example:
my @filenames = qw(file_all.txt file2.txt);
Using variables with integer suffixes is a code smell.
Then, factor common tasks into subroutines. In this case, you need: 1) a function that takes a filename and returns a table of the words in that file, and 2) a function that takes a filename and a lookup table, and returns the words that are in the file but do not appear in the lookup table.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
my @filenames = qw(file_all.txt file2.txt);
print "$_\n" for sort @{ words_notseen(   # sort: hash keys come back in no particular order
$filenames[0],
words_from_file($filenames[1])
)};
sub words_from_file {
my $filename = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
$words{ lc $_ } = 1 for split ' ', $line;
}
close $fh
or croak "Failed to close '$filename': $!";
return \%words;
}
sub words_notseen {
my $filename = shift;
my $lookup = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
for my $word (split ' ', $line) {
unless (exists $lookup->{ lc $word }) {   # Lookup keys were lowercased by words_from_file.
$words{ $word } = 1;
}
}
}
close $fh
or croak "Failed to close '$filename': $!";
return [ keys %words ];
}
As you mentioned in your question: It should print out any of the words in file_all that are not in file2.
The small script below compares the two files line by line and prints each line of file_all.txt that differs from the line at the same position in file2.txt:
#!/usr/bin/perl
use strict;
use warnings;
my ($file1, $file2) = qw(file_all.txt file2.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
while (<$fh1>)
{
last if eof($fh2);       # Stop once file2.txt has no more lines.
my $compline = <$fh2>;   # The line at the same position in file2.txt.
chomp($_, $compline);
if ($_ ne $compline)
{
print "$_\n";
}
}
file_all.txt:
ab
cd
ee
ef
gh
df
file2.txt:
zz
yy
ee
ef
pp
df
Output:
ab
cd
gh
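The loop above pairs line N of file_all.txt with line N of file2.txt, so it only finds missing words when matching words sit on the same line numbers in both files, as they happen to in the sample data. If the words can appear in any order, one alternative (a swapped-in technique, not what this answer uses) is to put every word of file_all.txt in a hash, delete the words of file2.txt with a hash slice, and print what remains. This sketch reads the sample data from in-memory strings, with file2.txt deliberately reordered; words_in is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return all whitespace-separated words from a filehandle.
sub words_in {
    my ($fh) = @_;
    return map { split ' ' } <$fh>;
}

# Sample data from this answer, but with file2.txt in a different order.
my $all_text   = "ab\ncd\nee\nef\ngh\ndf\n";
my $other_text = "df\nee\nzz\nef\nyy\npp\n";

open my $fh1, '<', \$all_text   or die "Can't open file_all data: $!";
open my $fh2, '<', \$other_text or die "Can't open file2 data: $!";

my @all  = words_in($fh1);
my %seen = map { $_ => 1 } @all;   # Every word from file_all.txt.
delete @seen{ words_in($fh2) };    # Drop every word that also appears in file2.txt.

# Print the survivors in their original file_all.txt order.
my @missing = grep { $seen{$_} } @all;
print "$_\n" for @missing;
```

It prints ab, cd and gh. On the same reordered data, the positional loop would also print ee and df, because they no longer line up with their matches.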
The issue is the following two lines:
my %file2 = "file_all.txt";
my %file1 = "file2.txt";
Here you are assigning a single value, called a SCALAR in Perl, to a hash (denoted by the % sigil). A hash is built from key/value pairs, usually written with the fat comma operator (=>) between each key and its value, e.g.:
my %hash = ( key => 'value' );
Hashes expect an even number of elements because every key must be paired with a value. You currently give each hash a single value, so with use warnings enabled Perl reports "Odd number of elements in hash assignment".
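To see this for yourself, here is a small sketch that captures warnings in a $SIG{__WARN__} handler instead of letting them go to STDERR; %wrong and %right are illustrative names. It assigns a lone file name to one hash and proper key => value pairs to another:

```perl
use strict;
use warnings;

# Collect warnings in an array so they can be inspected.
# The handler is installed in a BEGIN block so it is active
# before any later statement is compiled or run.
our @warnings;
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }

my %wrong = "file_all.txt";                                  # One element: a key with no value.
my %right = (all => 'file_all.txt', other => 'file2.txt');   # key => value pairs.

print "warning: $warnings[0]" if @warnings;
print "keys in %wrong: ", join(', ', keys %wrong), "\n";
print "value for 'all': $right{all}\n";
```

%wrong ends up with file_all.txt as a key whose value is undef, which is rarely what you want; %right maps each key to its file name.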
To assign a value to a SCALAR, use the $ sigil:
my $file2 = "file_all.txt";
my $file1 = "file2.txt";
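Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.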