macOS sed - Complex substitution command
I have a text file with a lot of lines and need to do some complex substitutions using macOS sed. It's a bit hard to explain my problem, so I'll show you an example first:
The file:
#00101:A9AA%AAB
#03901:%E+2100009+X3800
#06008:01020304
Expected output:
#00101:0000%A00
#03901:%E+2000000+X0000
#06008:01020304
For all lines starting with "#xxx01:" (where x represents any digit), I need to replace all alphanumeric characters (A-Z, 0-9) with "0", except the digits before the ":" and any two-character sequence starting with "%" or "+".
I am aware of the basic substitution and exception commands, as well as using "^" to search for a pattern at the start of a line, but I am confused as to how to combine all those commands. How should I go about doing this? Non-sed solutions are welcome if this is impossible in sed.
Create a file script.sed containing:
/^#[0-9]{3}01:/ {
:r
s/:((0|[+%].)*)[A-Za-z1-9]/:\10/
t r
}
Save your sample input in a file named data, then run the command shown to get the required output:
$ sed -E -f script.sed data
#00101:0000%A00
#03901:%E+2000000+X0000
#06008:01020304
$
The option -E tells sed to use extended regular expressions. The option -f tells it to read the program from the file script.sed.
The pattern /^#[0-9]{3}01:/ looks for lines starting with a #, followed by 3 digits, 01 and a colon. The lines between { and } are executed for each matching line.
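To see which lines the address selects before attaching any commands to it, one option is to print only the matching lines. This is a minimal sketch: the sample input is recreated inline with printf instead of being read from the data file, and the variable name selected is only illustrative.

```shell
# Recreate the three sample lines from the question and print only
# those that the address /^#[0-9]{3}01:/ selects.
selected=$(printf '%s\n' \
    '#00101:A9AA%AAB' \
    '#03901:%E+2100009+X3800' \
    '#06008:01020304' |
  sed -E -n '/^#[0-9]{3}01:/p')
printf '%s\n' "$selected"
```

Only the #00101 and #03901 lines are printed; #06008 is skipped because the two digits before the colon are 08, not 01.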
The line :r creates a label r that can be branched to with the b or t commands. The t r command branches to label r if there has been a successful s/// command since the current input line was read or since the last t command.
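The same label-and-branch pattern works for any "repeat until nothing changes" edit. As a small illustration unrelated to the data above (the input 000123 and the underscore replacement are made up for this sketch), the following turns leading zeros into underscores one at a time:

```shell
# Each pass replaces the first remaining leading zero with "_".
# "t r" jumps back to ":r" only while the substitution keeps succeeding.
padded=$(printf '%s\n' '000123' |
  sed -E -e ':r' -e 's/^(_*)0/\1_/' -e 't r')
printf '%s\n' "$padded"    # prints ___123
```

The loop stops on its own once no leading zero is left, because the failed s/// means t r no longer branches.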
The s command searches for the colon followed by any sequence of 0s or of the protected sequences that start with + or % (where a dot matches any single character), and then by an alphanumeric character other than 0. It replaces that with the colon, the remembered match (\1), and a 0 in place of the other alphanumeric character. The final bracket expression must omit 0: if it included 0, the substitution would keep succeeding by replacing a 0 with a 0, and you would end up with an infinite loop.
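To watch the loop work, the p flag on the s command can print the line after every successful substitution. The sketch below assumes that each protected sequence is exactly two characters (% or + plus the character after it), as the question states, so the group uses a single dot; the variable name trace is illustrative.

```shell
# -n suppresses the normal output; the p flag prints the line after
# every successful substitution, so each pass of the loop is visible.
trace=$(printf '%s\n' '#00101:A9AA%AAB' |
  sed -E -n -e ':r' -e 's/:((0|[+%].)*)[A-Za-z1-9]/:\10/p' -e 't r')
printf '%s\n' "$trace"
```

Under that assumption the trace has six lines, one per replaced character, running from #00101:09AA%AAB to #00101:0000%A00. The %A pair is stepped over by the group instead of being replaced.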
You can also use a command-line script instead of a script file, possibly with several -e options (one per line of the script file) or with a single script option and enough semicolons.
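A sketch of the several -e form, with one option per script line, might look like the following. It feeds the question's sample lines through a pipe instead of a data file, and it assumes two-character protected sequences (% or + plus one character), as described in the question; the variable name result is illustrative.

```shell
# One -e per line of the script; this avoids relying on how a given
# sed treats semicolons after a label.
result=$(printf '%s\n' \
    '#00101:A9AA%AAB' \
    '#03901:%E+2100009+X3800' \
    '#06008:01020304' |
  sed -E \
    -e '/^#[0-9]{3}01:/ {' \
    -e ':r' \
    -e 's/:((0|[+%].)*)[A-Za-z1-9]/:\10/' \
    -e 't r' \
    -e '}')
printf '%s\n' "$result"
```

With that input this prints #00101:0000%A00, #03901:%E+2000000+X0000 and the untouched #06008:01020304, which matches the expected output in the question.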