macOS sed - Complex substitution command

I have a text file with a lot of lines and need to do some complex substitutions using macOS sed. My problem is a bit hard to explain, so I'll show an example first:

The file:

#00101:A9AA%AAB
#03901:%E+2100009+X3800
#06008:01020304

Expected output:

#00101:0000%A00
#03901:%E+2000000+X0000
#06008:01020304

For all lines starting with "#xxx01:" (where x represents any digit), I need to replace all alphanumeric characters (A-Z, 0-9) with "0", except the digits before the ":" and any two-character sequences starting with "%" or "+".

I am aware of the basic substitution and exception commands, as well as using "^" to search for a pattern at the start of a line, but I am confused about how to combine all those commands. How should I go about doing this? Non-sed solutions are welcome if this is impossible in sed.

Create a file script.sed containing:

/^#[0-9]{3}01:/ {
    :r
    s/:((0|[+%]..)*)[A-Za-z1-9]/:\10/
    t r
}

Save your sample input in a file named data. Running the command shown produces:

$ sed -E -f script.sed data
#00101:0000%AA0
#03901:%E+0000000+X3000
#06008:01020304
$

The option -E tells sed to use extended regular expressions. The option -f tells it to read the program from the file script.sed.

The pattern /^#[0-9]{3}01:/ looks for lines starting with a #, followed by 3 digits, 01 and a colon. The lines between { and } are executed for each matching line.
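
To check which lines the address selects before running the full script, one option is to print only the matching lines. This is a small sketch that recreates the question's sample input in a temporary file (the variable names data and selected are made up for illustration):

```shell
# Recreate the sample input from the question in a scratch file.
data=$(mktemp)
printf '%s\n' '#00101:A9AA%AAB' '#03901:%E+2100009+X3800' '#06008:01020304' > "$data"

# -n suppresses automatic printing; p prints only the lines the address selects.
selected=$(sed -E -n '/^#[0-9]{3}01:/p' "$data")
printf '%s\n' "$selected"
rm -f "$data"
```

Only the #00101 and #03901 lines are printed; #06008 has 08 rather than 01 before the colon, so the address skips it.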

The line :r creates a label r that the b and t commands can branch to. The t r command branches to label r if an s/// command has made a successful substitution since the last input line was read or the last t command was executed.
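
As a smaller illustration of the same label-and-test loop, the sketch below replaces one leading a per pass and lets t r repeat the substitution until it fails. The input aaab and the replacement character x are invented for this example only:

```shell
# Each pass turns the first remaining leading 'a' into 'x';
# 't r' jumps back to ':r' for as long as the substitution succeeds.
looped=$(printf 'aaab\n' | sed -E -e ':r' -e 's/^(x*)a/\1x/' -e 't r')
printf '%s\n' "$looped"   # prints xxxb
```

The loop ends on its own because the replacement character x is not something the final a in the pattern can match, which is the same reason the script's character class leaves out 0.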

The s/:((0|[+%]..)*)[A-Za-z1-9]/:\10/ command searches for the colon, followed by any sequence of 0s or +.. or %.. groups (where each dot matches any character), followed by an alphanumeric character other than 0. It replaces that with the colon, the remembered sequence (\1), and a 0 in place of the alphanumeric character. The character class must exclude 0: if it matched 0 as well, the substitution would keep replacing 0 with 0, always succeed, and t r would loop forever.
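
Note that [+%].. protects the two characters after each % or +, which is why the output above keeps %AA and +X3 intact. The expected output in the question keeps only one character after the marker (%A, +2, +X). A minimal sketch of that variant, assuming "two-character sequence" means the marker plus one following character, uses a single dot instead. The temporary file names are placeholders:

```shell
# Scratch copies of the sample input and of the adjusted script.
data=$(mktemp)
script=$(mktemp)
printf '%s\n' '#00101:A9AA%AAB' '#03901:%E+2100009+X3800' '#06008:01020304' > "$data"

# Same loop as script.sed, but [+%]. protects the marker plus ONE character.
cat > "$script" <<'EOF'
/^#[0-9]{3}01:/ {
    :r
    s/:((0|[+%].)*)[A-Za-z1-9]/:\10/
    t r
}
EOF

result=$(sed -E -f "$script" "$data")
printf '%s\n' "$result"
rm -f "$data" "$script"
```

With the sample input this prints #00101:0000%A00, #03901:%E+2000000+X0000 and #06008:01020304, which matches the expected output listed in the question.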

You can also pass the script on the command line instead of using a script file, either with several -e options (one per line of the script file) or with a single script argument and enough semicolons.
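
The multiple -e form might look like the sketch below, with one -e per line of script.sed so that each label and the closing brace sit on their own script line. The semicolon form is not shown because how labels and semicolons interact can differ between sed implementations. The variable names are illustrative:

```shell
# Recreate the sample input from the question in a scratch file.
data=$(mktemp)
printf '%s\n' '#00101:A9AA%AAB' '#03901:%E+2100009+X3800' '#06008:01020304' > "$data"

# One -e option per line of script.sed; sed joins them with newlines.
inline=$(sed -E \
    -e '/^#[0-9]{3}01:/ {' \
    -e ':r' \
    -e 's/:((0|[+%]..)*)[A-Za-z1-9]/:\10/' \
    -e 't r' \
    -e '}' "$data")
printf '%s\n' "$inline"
rm -f "$data"
```

This should print the same three lines as the script-file run shown earlier.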
