
Spurious matches with Perl regex

The goal of this Perl program is to consume a spreadsheet, trim it to only the two desired columns, then do a find-and-replace of the entries from the first column with the entries from the second column. The eventual goal is that when the spreadsheet has duplicates of a given value in the first column paired with distinct values in the second column (e.g. the term 'foo' appears in one row paired with 'bar' and in a second row paired with 'baz'), the user will be asked to adjudicate each replacement. For now, however, I'm simply trying to get the blunt-force 'traverse a whole directory tree and find-and-replace' functionality working. Here is the code in its current state:

use strict;
use warnings;
use diagnostics;
use Text::CSV_XS;
use File::Find;

my $spreadsheet = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

open my $fh, "<", $ARGV[0];
my @table;
while (my $row = $spreadsheet->getline($fh)) {
   push @table, $row;
}
close $fh;

@table = sort { length $b->[0] <=> length $a->[0] } @table; 

find({ 
    preprocess => \&filter, 
    wanted => \&search_and_replace 
    }, 
    $ARGV[1]
);

sub filter {
    return grep { -d or (-f and ( /\.txt$/ or /\.rst$/ or /\.yaml$/))} @_;
}

sub search_and_replace {
    open my $target_file, "<", $_;
    while (my $target_string = <$target_file>) {
        for my $row (@table) {
            my $search = $row->[0];
            my $replace = $row->[1];
            if ($target_string =~ $search) {
                print "Found $search!\n"
            }
        }
    }
    print "Finished checking $_\n";
    close $target_file;
}

The CSV it consumes obeys this format:

Search String         Context         Replacement
old phrase            redacted        new phrase
some old words        redacted        some new words
word                  redacted        morpheme
word                  redacted        speak
words                 redacted        morphemes
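Going by the format above, column index 1 is Context and column index 2 is Replacement, while the code reads `$row->[1]` as the replacement, and the header row ends up in @table as well. A minimal sketch of trimming to the two wanted columns, assuming the rows look like what `getline` returns for the sample (the `@rows` data here is hypothetical, typed in by hand):

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like the array refs getline() would return
# for the sample spreadsheet above.
my @rows = (
    [ 'Search String',  'Context',  'Replacement'    ],
    [ 'old phrase',     'redacted', 'new phrase'     ],
    [ 'some old words', 'redacted', 'some new words' ],
    [ 'word',           'redacted', 'morpheme'       ],
);

shift @rows;    # drop the header row so it is never used as a search key

# Keep only column 0 (Search String) and column 2 (Replacement).
my @table = map { [ @{$_}[0, 2] ] } @rows;

printf "%s => %s\n", @$_ for @table;
```

After this, `$row->[1]` in each row of @table is the replacement rather than the context.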

The sort at the beginning is intended to arrange the rows so that keys which might be substrings of other keys come later; that way I don't risk interfering with the replacement of a longer string by having already replaced one of its substrings.
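The longest-first idea can also be expressed as a single alternation, so each position in the text is rewritten at most once. This is only a sketch under assumptions, not the code from the post: the `%replacement_for` hash is hypothetical, and it sidesteps the duplicate-key case ('word' paired with two replacements) that the post defers.

```perl
use strict;
use warnings;

# Hypothetical mapping; a hash cannot hold the duplicate 'word' rows.
my %replacement_for = (
    'old phrase' => 'new phrase',
    'word'       => 'morpheme',
    'words'      => 'morphemes',
);

# Longest keys first, so 'words' is tried before its substring 'word'.
# quotemeta makes each key match literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length $b <=> length $a } keys %replacement_for;
my $pattern = qr/($alternation)/;

my $line = 'old phrase, words, word';
(my $rewritten = $line) =~ s/$pattern/$replacement_for{$1}/g;
print "$rewritten\n";    # prints "new phrase, morphemes, morpheme"
```

Because the substitution makes one pass, text that has already been replaced is never scanned again, which is the interference the sort is meant to avoid.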

The $replace is obviously not doing anything currently. I had earlier thought I was close to a solution, only to realize that I do not even have the matching down right.

I included the print "Found $search!\n" at this stage as a sanity check; checking STDOUT indicates that I'm getting a lot of spurious matches, without any regard for ordering. In one file, it reports finding 8 instances of the $search key from the 1st non-header row of the spreadsheet, then 3 instances of the $search key from the 53rd row, then another 8 instances of the $search key from the 1st row. In reality, that file contains 0 instances of the $search key from the 1st row, and only 1 instance of the $search key from the 53rd row.
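One thing worth ruling out, though the post does not confirm it as the cause: `$target_string =~ $search` compiles the key as a regular expression, so any metacharacters in a key are interpreted rather than matched literally, and a key can also match inside a longer word. The sketch below uses made-up keys and lines to show the difference between pattern, literal, and whole-word matching.

```perl
use strict;
use warnings;

my $search = 'a.c (old)';    # hypothetical key containing metacharacters
my $line   = 'abc old';

# As a pattern: '.' matches any character and '(old)' is a capture group,
# so this matches even though the literal key is not in the line.
my $as_regex = $line =~ $search ? 1 : 0;

# As a literal: \Q...\E escapes every metacharacter in the key.
my $as_literal = $line =~ /\Q$search\E/ ? 1 : 0;

# Substring versus whole word: 'word' occurs inside 'password'.
my $key        = 'word';
my $substring  = 'password' =~ /\Q$key\E/     ? 1 : 0;
my $whole_word = 'password' =~ /\b\Q$key\E\b/ ? 1 : 0;

print "regex=$as_regex literal=$as_literal ",
      "substring=$substring whole_word=$whole_word\n";
# prints "regex=1 literal=0 substring=1 whole_word=0"
```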

It most frequently claims to have found matches against the $search key from the 1st row which turn out to be false, so I'm wondering if it has something to do with how the loops in the subroutine are constructed. I am a Perl novice and thus don't have a clear sense of what other information might be needed to diagnose this issue; please let me know what else I should supply. Currently I am only concerned with getting the matches to happen correctly; I will worry about replacement later. Thank you.

In situations like these you should try to narrow down the problem as much as possible:

  • Take a look at @table after it has been filled: does it contain what you expect?

  • Test your search_and_replace sub in isolation (i.e. by feeding it a test file and a test @table): does it report the matches correctly?
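Both checks can be sketched in a few lines. This is an illustration under assumptions rather than the code from the question: `matches_in` is a hypothetical stand-in for the matching loop, taking its text and table as arguments so it can run without File::Find, and it uses `index` for plain substring matching.

```perl
use strict;
use warnings;
use Data::Dumper;

# A small hand-made table instead of the real spreadsheet.
my @table = ( [ 'old phrase', 'new phrase' ], [ 'word', 'morpheme' ] );

# Check 1: dump the table to see exactly what it contains.
print Dumper(\@table);

# Check 2: a hypothetical, self-contained version of the matching loop.
sub matches_in {
    my ($text, $table) = @_;
    my @found;
    for my $line (split /\n/, $text) {
        for my $row (@$table) {
            # index() does a literal substring search, no regex involved.
            push @found, $row->[0] if index($line, $row->[0]) >= 0;
        }
    }
    return @found;
}

my @found = matches_in("an old phrase here\nnothing relevant\n", \@table);
print "Found: @found\n";    # prints "Found: old phrase"
```

If the dump shows unexpected rows (a header, empty strings, the wrong column), the problem is in how the table is loaded; otherwise it is in the matching.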

Sorry, it's not really an answer; take it more as long-term advice. Cheers!
