Perl regex substitute from hash
Is there an efficient way to substitute a bunch of strings using values from a Perl hash?
For example:
$regex{foo} = "bar";
$regex{hello} = "world";
$regex{python} = "perl";
open(F, "myfile.txt");
while (<F>) {
    foreach $key (keys %regex) {
        s/$key/$regex{$key}/g;
    }
}
close(F);
Is there a way to accomplish the above in Perl?
First question: are you sure that what you have is inefficient?
Second, the most obvious next step would be to pull everything into a single regex:
my $check = join '|', keys %regex;
And then you can do the substitution as:
s/($check)/$regex{$1}/g;
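As a self-contained illustration of the single-pass idea, here is a minimal sketch. It assumes a hypothetical helper named `replace_all` and a made-up key `a.b`. It differs from the snippet above in one way: each key goes through `quotemeta`, so a key containing a regex metacharacter such as `.` matches only itself.

```perl
use strict;
use warnings;

# Hypothetical sample mapping; the key "a.b" contains the metacharacter ".".
my %regex = (foo => 'bar', hello => 'world', 'a.b' => 'dot');

# quotemeta makes each key match literally, so "a.b" does not also match "axb".
my $check = join '|', map { quotemeta } keys %regex;

# Hypothetical helper: one pass over the string, one hash lookup per match.
sub replace_all {
    my ($line) = @_;
    $line =~ s/($check)/$regex{$1}/g;
    return $line;
}

print replace_all("foo hello a.b axb"), "\n";   # prints "bar world dot axb"
```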
This can still be "slow" if the keys overlap enough that the regex engine has to recheck the same letters constantly. You can possibly use something like Regexp::Optimizer to eliminate the overlap.
But the cost of optimising may be more than the cost of just doing everything, depending on how many changes (key/value pairs in your hash) and how many lines you're modifying.
Premature optimisation-- !
Note that, of course, your example code isn't doing anything with the text after the substitution. It won't modify the file in place, so I'm assuming you're handling that separately.
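If you want the file itself rewritten, Perl's built-in in-place editing (`$^I`, the variable behind `perl -i`) is one option. The sketch below follows the `local($^I, @ARGV)` idiom from perlfaq5. The helper name `edit_in_place` and the `.bak` backup suffix are illustrative choices, not anything from the question.

```perl
use strict;
use warnings;

my %regex = (foo => 'bar', hello => 'world', python => 'perl');
my $re    = join '|', map { quotemeta } keys %regex;

# Hypothetical helper: rewrite $file in place, keeping a "$file.bak" backup.
sub edit_in_place {
    my ($file) = @_;
    local $^I   = '.bak';    # turn on in-place editing for the <> operator
    local @ARGV = ($file);   # <> reads the files listed in @ARGV
    while (<>) {
        s/($re)/$regex{$1}/g;
        print;               # goes to the rewritten file, not to STDOUT
    }
}
```

Calling `edit_in_place('myfile.txt')` leaves the substituted text in `myfile.txt` and the original in `myfile.txt.bak`.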
To prove the point of eval, and also out of curiosity, I ran some tests comparing the OP's code, the $regex{$1} approach, and the eval approach.
First off, there seems to be little value in cramming every possible token into a (token|token|...) match expression. Perl needs to check against all tokens at once; it is debatable how much more efficient this is than simply checking one token at a time and doing the replacement with a hardcoded value.
Secondly, using $regex{$1} means a hash lookup is performed on every match.
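To check claims like these against your own data, the core `Benchmark` module's `cmpthese` can time both styles side by side. The sketch below is not the harness used for the numbers that follow. The synthetic input (10,000 identical lines) and the helper names `per_key` and `one_pass` are assumptions, and results will vary with the data, the keys, and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %regex = (foo => 'bar', hello => 'world', python => 'perl');
my $alt   = join '|', map { quotemeta } keys %regex;
my $re    = qr/($alt)/;

# Synthetic input (an assumption): 10,000 short lines.
my @lines = ("foo says hello to python\n") x 10_000;

# Hypothetical helper: the OP's style, one s///g per key per line.
sub per_key {
    my @out = @_;
    for (@out) {
        for my $key (keys %regex) {
            s/\Q$key\E/$regex{$key}/g;
        }
    }
    return @out;
}

# Hypothetical helper: one alternation, one hash lookup per match.
sub one_pass {
    my @out = @_;
    s/$re/$regex{$1}/g for @out;
    return @out;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    per_key  => sub { my @r = per_key(@lines) },
    one_pass => sub { my @r = one_pass(@lines) },
});
```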
Anyway, here are some numbers (run on Strawberry Perl 5.12, with a 4 MB file of 100K lines):
- $regex{$1} approach: 6 seconds (5 seconds with /go instead of /g)
- tie approach: 10 seconds
- eval approach: less than 1 second (faster than the OP's code)

This is the eval approach:
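$regex{foo} = "bar";
$regex{hello} = "world";
$regex{python} = "perl";
$regex{bartender} = "barista";

# Build the source of a loop containing one hard-coded s/// per key,
# then compile and run it with a single string eval.
$s = <<HEADER;
\$start = time;
open(F, "myfile.txt") or die "open: \$!";
while (<F>) {
HEADER
foreach $key (keys %regex) {
    $s .= "    s/$key/$regex{$key}/go;\n";
}
$s .= <<FOOTER;
    print \$_;
}
close(F);
print STDERR "Elapsed time (eval.pl): " . (time - \$start) . "\\n";
FOOTER
eval $s;
die $@ if $@;    # report errors in the generated code instead of ignoring them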
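The generated code above pastes raw keys and values into the source. A key containing `/` or a regex metacharacter, or a value containing `$` or `@`, would therefore break it or change its meaning. Below is a hedged variant of the same string-eval idea, not the code that was benchmarked. It generates a single anonymous sub, escapes each pattern with `quotemeta`, and refers to replacements through a lexical array `@rep` instead of pasting them in. The sample keys are made up.

```perl
use strict;
use warnings;

# Hypothetical sample data that would break naive code generation.
my %regex = (foo => 'bar', 'a/b' => 'slash', cost => '$5');

my @keys = keys %regex;
my @rep  = @regex{@keys};    # replacement strings, indexed like @keys

# One s/// per key. quotemeta escapes "/" and metacharacters in the pattern;
# $rep[N] is interpolated at run time, so "$5" is never parsed as code.
my $code = "sub {\n    local \$_ = shift;\n";
for my $i (0 .. $#keys) {
    $code .= '    s/' . quotemeta($keys[$i]) . "/\$rep[$i]/g;\n";
}
$code .= "    return \$_;\n}\n";

my $replace = eval $code or die $@;    # compile the generated sub once
print $replace->("foo a/b cost"), "\n";   # prints "bar slash $5"
```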
Define a regexp that matches any of the keys.
$regex = join("|", map {quotemeta} keys %regex);
Replace any match of $regex by $regex{$1}.
s/($regex)/$regex{$1}/go;
Omit the o modifier if $regex changes during the execution of the program.
Note that if one key is a prefix of another (e.g. f and foo), whichever comes first in the joined regexp will be seen as a match (e.g. f|foo matches f but foo|f matches foo in foobar). If that can happen, you may need to sort keys %regex according to which match you want to win. (Thanks to ysth for pointing this out.)
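A common choice is to let the longest key win. The minimal sketch below does that by sorting keys by descending length, breaking ties alphabetically so the pattern is deterministic. The sample keys `f`, `foo`, and `bar` are made up for illustration.

```perl
use strict;
use warnings;

my %regex = (f => 'F', foo => 'FOO', bar => 'BAR');

# Longest keys first, so "foo" is tried before its prefix "f".
my $alt = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) or $a cmp $b } keys %regex;

(my $s = 'foobar fb') =~ s/($alt)/$regex{$1}/g;
print "$s\n";   # prints "FOOBAR Fb"
```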
perl -e '
my %replace = (foo => "bar", hello => "world", python => "perl");
my $find = join "|", sort keys %replace;
my $str = "foo,hello,python";
$str =~ s/($find)/$replace{$1}/g;
print "$str\n\n";
'
Something you may want to consider is processing the whole file at once instead of line by line, and using the /s modifier on your regex for single-line mode. /s only changes what . matches, so it matters only if your patterns use . and must match across newlines.
What you have works as is, so it's not clear what you're asking for.
One catch: the code you posted may have problems with double substitutions, depending on the contents of %regex and/or $_. For example:
my %regex = (
foo => 'bar',
bar => 'foo',
);
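The sketch below makes the double substitution visible with that mapping. In the sequential loop, each `s///g` sees the output of the previous one, so a swap collapses into one word. Which word you get depends on hash order. The single-pass alternation replaces each match exactly once.

```perl
use strict;
use warnings;

my %regex = (foo => 'bar', bar => 'foo');

# Sequential: every s///g runs on the previous one's output.
my $seq = 'foo bar';
for my $key (keys %regex) {
    $seq =~ s/$key/$regex{$key}/g;
}

# Single pass: every match is replaced exactly once.
my $alt = join '|', map { quotemeta } keys %regex;
(my $one = 'foo bar') =~ s/($alt)/$regex{$1}/g;

print "sequential:  $seq\n";   # "foo foo" or "bar bar", depending on hash order
print "single pass: $one\n";   # always "bar foo"
```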
The solution is to move the foreach into the pattern, so to speak:
my $pat =
   join '|',
    map quotemeta,  # Convert text to regex patterns.
     keys %regex;

my $re = qr/$pat/;  # Precompile for efficiency.

my $qfn = 'myfile.txt';
open(my $fh, '<', $qfn) or die "open: $qfn: $!";
while (<$fh>) {
    s/($re)/$regex{$1}/g;
    print;    # ...or do something else with $_
}
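#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

my %tr = ( 'foo' => 'bar',
           #(...)
         );
my $r = join("|", map {quotemeta} keys %tr);
$r = qr|$r|;

# Edit the file named on the command line in place, one line per array element.
tie my @array, "Tie::File", $ARGV[0] or die "tie $ARGV[0]: $!";
for (@array) {
    s/($r)/$tr{$1}/g;
}
untie @array;
Alternatively, instead of the tie/for/untie block, slurp the whole file, substitute once, and write it back. Use one approach or the other: running both processes the file twice, so a chain such as foo => 'bar', bar => 'baz' would turn foo into baz.
open my $fh, '<', $ARGV[0] or die "open $ARGV[0]: $!";
my $t = do { local $/; <$fh> };
close $fh;
$t =~ s/($r)/$tr{$1}/g;
open $fh, '>', $ARGV[0] or die "open $ARGV[0]: $!";
print $fh $t;
close $fh;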
This is an old question, so I'm surprised no one has yet suggested the obvious: pre-compile each of the regexps (i.e. the hash keys).
$regex{qr/foo/} = 'bar';
$regex{qr/hello/} = 'world';
$regex{qr/python/} = 'perl';
open(F, "myfile.txt");
while (<F>) {
    foreach $key (keys %regex) {
        s/$key/$regex{$key}/g;
    }
}
close(F);
or, for (IMO) greater readability:
%regex = (
qr/foo/ => 'bar',
qr/hello/ => 'world',
qr/python/ => 'perl',
);
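One caveat applies to the hash version: Perl hash keys are always plain strings. A `qr//` object used as a key is stringified (to `(?^:foo)` on Perl 5.14 and later), and that string has to be compiled again as a pattern when it is used. Below is a hedged sketch of one way to keep the compiled objects: an array of `[regex, replacement]` pairs. The helper name `apply` is hypothetical.

```perl
use strict;
use warnings;

# A qr// used as a hash key comes back out as a plain string.
my %h = (qr/foo/ => 'bar');
my ($k) = keys %h;
print ref(\$k), "\n";   # prints "SCALAR": a string, not a Regexp object

# An array of [compiled regex, replacement] pairs keeps the qr// objects.
my @subst = (
    [ qr/foo/,    'bar'   ],
    [ qr/hello/,  'world' ],
    [ qr/python/, 'perl'  ],
);

# Hypothetical helper applying every pair in order.
sub apply {
    my ($line) = @_;
    for my $pair (@subst) {
        my ($re, $to) = @$pair;
        $line =~ s/$re/$to/g;
    }
    return $line;
}

print apply("hello from python"), "\n";   # prints "world from perl"
```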
If you know that there can be at most one match per input line, then skipping the remaining regexps with last after a successful match will also help when there are a lot of keys. E.g., inside the for loop:
s/$key/$regex{$key}/g && last;
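Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.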