Sed or Perl: One file with regex instructions, one instruction per line, executed on another file

I'm setting up a regex learning environment purely in bash/tmux, with one pane for the file containing a regex, one pane for the text file to be processed, and one pane for the bash shell. I'm at the start of the regex chapter of "The Bastards Book of Ruby".

The 'Bastards Book' shows an example of a negative-lookahead regex (perfect, let's learn), where perl is recommended over sed. Since I'm going for a CLI approach, the Bash command is: $ perl -p file_with_regex.pl test.txt (this prints the lines from test.txt with the intended substitutions).

Question: How would I add a second regex (on a new line) to the regex.pl file, and have perl execute the first instruction and then this second one when processing the text file?

    # regex.pl
    s/^(?!Mr)/Ms./g
    s/Ms./Mrs./g

(Adding the second regex results in "Execution of regex.pl aborted due to compilation errors.")

The overall aim here is to progress in Ruby while testing regular expressions as concisely as possible. Picking up a bare minimum of sed/perl along the way would be a plus, as a proper dive into perl would take time away from Ruby (and when it's time for the perl dive, I'll have had some time with the basics). The more I look at this, the more it seems necessary to just do it in Ruby, unless there is a perl switch that enables a command-line-with-files approach.

The basic answer is that you need a semicolon after each statement, which here means at the end of each line.

Paraphrased from perlrun: -p loops over every line of input, runs the commands you specified on it, and then prints the value of $_ (the implicit variable that your substitution commands operate on in this script).

So, removing the magic, -p transformed your code into:

LINE:
while (<>) {
    # regex.pl
    s/^(?!Mr)/Ms./g
    s/Ms./Mrs./g
} continue {
    print or die "-p destination: $!\n";
}

Perl requires a semicolon between statements (a terminal semicolon at the end of a block is optional), hence the error.

I personally would recommend writing the whole script above into the file instead of using -p, because it is far less magical, but you're welcome to do it either way.

If you were going to write the whole script, I would recommend something more like the following:

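use strict;
use warnings;

# <ARGV> reads each line of the files named on the command line
# (or standard input if no files are given).
while ( my $line = <ARGV> ) {

    # Prefix every line that does not already start with "Mr".
    $line =~ s/^(?!Mr)/Ms./g;
    print "After first subst: $line";

    # The dot is escaped so that it matches a literal "." only.
    $line =~ s/Ms\./Mrs./g;
    print "After second subst: $line";
}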

use strict and use warnings are the boilerplate you want at the top of any perl script (to catch typos and other common mistakes), and explicitly naming the variable $line gives you a better understanding of how the script works ($_ is very magical for beginners and the source of many errors, IMO, but great once you know what's what).

If you're wondering about <> vs. <ARGV>, they are the same thing and mean "read through all the lines of the files provided as command-line arguments to this script, or standard input if no files are provided".
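A quick way to convince yourself of this is to run the same loop three ways and compare the output. This is only a sketch: test.txt is a hypothetical sample file, and the loop applies just the first substitution to keep the one-liners short.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input.
printf 'Smith\nMr Jones\n' > test.txt

# <ARGV> with a file name on the command line.
via_argv=$(perl -e 'while ( my $line = <ARGV> ) { $line =~ s/^(?!Mr)/Ms./; print $line }' test.txt)

# <> with a file name on the command line.
via_null=$(perl -e 'while ( my $line = <> ) { $line =~ s/^(?!Mr)/Ms./; print $line }' test.txt)

# <> with no file name, so the lines come from standard input.
via_stdin=$(perl -e 'while ( my $line = <> ) { $line =~ s/^(?!Mr)/Ms./; print $line }' < test.txt)

printf '%s\n---\n%s\n---\n%s\n' "$via_argv" "$via_null" "$via_stdin"
```

All three runs should print the same two lines.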
