
Trying to compare "parts of speech" tags of two files and print the matched tags in a separate file

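I am trying to write a Perl program that compares the "parts of speech" tags of two text files and prints the matching tags, along with the corresponding words, to a separate file on Windows.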

File1:
boy N
went V
loves V
girl N

File2:
boy N
swims V
girl N
loves V

Expected output:
boy N N
girl N N
loves V V

The columns are separated by tabs. The code I have so far:

use strict;
use warnings;

my $filename = 'file1.txt';
open(my $fh, $filename)
  or die "Could not open file '$filename'";

while (my $row = <$fh>) {
  chomp $row;
  print "$row\n";
}
my $tagfile = 'file2.txt';
open(my $tg, $tagfile)
  or die "Could not open file '$tagfile'";
while (my $row = <$tg>) {
  chomp $row;
  print "$row\n";
}

It's really not clear what you're asking. But I think this comes close.

#!/usr/bin/perl

use strict;
use warnings;

my ($file1, $file2) = @ARGV;

my %words; # Keep details of the words
while (<>) { # Read all input files a line at a time
  chomp;
  my ($word, $pos) = split;
  $words{$ARGV}{$word}{$pos}++;

  # If we're processing file1 then don't look for a match
  next if $ARGV eq $file1;

  if (exists $words{$file1}{$word}{$pos}) {
     print join(' ', $word, ($pos) x 2), "\n";
  }
}

Run it like this:

./pos file1 file2

Which gives:

boy N N
girl N N
loves V V

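OK, first off, what you want is a hash.

You need to:

  • Read the first file and split each line into "word" and "pos".
  • Save those in a hash.
  • Read the second file and split each line into "word" and "pos".
  • Compare each pair with the hash you populated and check whether it matches.

Something like this: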

#!/usr/bin/env perl 
use strict;
use warnings;

#declare our hash:

my %pos_for;


#open the first file
my $filename = 'file1.txt';
open( my $fh, '<', $filename ) or die "Could not open file '$filename'";

while (<$fh>) {
    #remove linefeed from this line.
    #note - both chomp and split default to using $_ which is defined by the while loop.
    chomp;

    #split it on whitespace.
    my ( $word, $pos ) = split;

    #record this value in the hash %pos_for
    $pos_for{$word} = $pos;
}
close($fh);

#process second file:

my $tagfile = 'file2.txt';
open( my $tg, '<', $tagfile ) or die "Could not open file '$tagfile'";
while (<$tg>) {

    #remove linefeed from this line.
    chomp;

    #split it on whitespace.
    my ( $word, $pos ) = split;

    #check if this word was in the other file
    if (defined $pos_for{$word}
        #and that it's the same "pos" value.
        and $pos_for{$word} eq $pos
        )
    {
        print "$word $pos\n";
    }
}
close($tg);
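
The question asks for the matches to be written to a separate file, and `$pos_for{$word} = $pos` remembers only the last tag seen for a word. A minimal sketch that covers both points, assuming whitespace-separated "word tag" lines as in the sample files, might look like the following. The sub names `read_tags` and `write_matches` and the output name `matched.txt` are my own choices for illustration and do not come from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read a "word tag" file into a hash of hashes, so a word that
# carries several tags keeps all of them.
sub read_tags {
    my ($path) = @_;
    open( my $in, '<', $path ) or die "Could not open file '$path': $!";
    my %tags_for;
    while ( my $line = <$in> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        my ( $word, $pos ) = split ' ', $line;
        next unless defined $pos;    # skip blank or malformed lines
        $tags_for{$word}{$pos} = 1;
    }
    close($in);
    return \%tags_for;
}

# Write every "word tag" pair of $second that also occurs in $first
# to $out_path, in the order the pairs appear in $second.
# Returns the number of matching lines written.
sub write_matches {
    my ( $first, $second, $out_path ) = @_;
    my $seen = read_tags($first);
    open( my $in,  '<', $second )   or die "Could not open file '$second': $!";
    open( my $out, '>', $out_path ) or die "Could not open file '$out_path': $!";
    my $count = 0;
    while ( my $line = <$in> ) {
        $line =~ s/\r?\n\z//;
        my ( $word, $pos ) = split ' ', $line;
        next unless defined $pos;

        # Test the word first so the lookup does not autovivify new keys.
        if ( exists $seen->{$word} and exists $seen->{$word}{$pos} ) {
            print {$out} join( "\t", $word, $pos, $pos ), "\n";
            $count++;
        }
    }
    close($in);
    close($out) or die "Could not close file '$out_path': $!";
    return $count;
}

# Only run when three file names are given on the command line.
write_matches(@ARGV) if @ARGV == 3;
```

Saved under a hypothetical name such as `match_tags.pl`, it could be run as `perl match_tags.pl file1.txt file2.txt matched.txt`. With the sample files from the question, `matched.txt` should then hold the three tab-separated lines for boy, girl and loves.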
