How to remove ^[ and all escape sequences from a file using a Linux shell script

Question

We want to remove ^[ and all escape sequences.

sed does not work and gives us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command

Answer 1

Are you looking for ansifilter?

There are two things you can do: enter the literal escape character (in bash):

Using keyboard entry:

sed 's/Ctrl-vEsc//g'

or

sed 's/Ctrl-vCtrl-[//g'

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
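
Typing Ctrl-v Esc works at an interactive prompt but is awkward inside a script, and \x1b is a GNU sed extension. The following is a minimal sketch, assuming a POSIX shell; the variable esc and the helper names strip_esc and strip_ctrl are made up for illustration. It builds the literal ESC byte with printf, and it uses tr with octal ranges so that TAB and newline survive.

```shell
#!/bin/sh
# Build a literal ESC byte (0x1b) once, so the pattern does not rely on \x1b support.
esc=$(printf '\033')

# Delete only the ESC byte itself; the rest of each sequence (e.g. "[31m") stays.
strip_esc() {
    sed "s/${esc}//g"
}

# Delete control characters but keep TAB (\011) and newline (\012).
strip_ctrl() {
    tr -d '\001-\010\013-\037\177'
}

printf 'a\033[31mb\n' | strip_esc
printf 'a\tb\033\rc\n' | strip_ctrl
```

Because strip_esc leaves the printable remainder of each sequence behind, it is only a first step toward removing whole escape sequences.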

Answer 2

commandlinefu gives the right answer, which strips ANSI colours as well as movement commands:

 sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
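
To apply this to a file in place, as the question tries to do, the pattern can be wrapped in a small helper. This is a sketch under assumptions: strip_ansi_file is a hypothetical name, mktemp is assumed to be available, and a literal ESC byte is used instead of \x1B so the pattern does not depend on GNU sed.

```shell
#!/bin/sh
esc=$(printf '\033')   # literal ESC byte

# Hypothetical helper: rewrite FILE with colour and movement sequences removed.
strip_ansi_file() {
    file=$1
    tmp=$(mktemp) || return 1
    # Same pattern as above: ESC [ parameters final-letter.
    if sed "s,${esc}\[[0-9;]*[a-zA-Z],,g" "$file" > "$tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

demo=$(mktemp)
printf '\033[1;31mred\033[0m and \033[2Kdone\n' > "$demo"
strip_ansi_file "$demo"
cat "$demo"
rm -f "$demo"
```

The original file is only overwritten after sed has finished successfully, so a failed run leaves it untouched.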

Answer 3

I managed with the following for my purposes, but it does not cover all possible ANSI escapes:

sed -r 's/\x1b\[[0-9;]*m?//g'

This removes the m commands, but for all escapes (as commented by @lethalman) use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'
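
The second pattern works because a CSI sequence ends with a single final byte in the range @ to ~ (0x40 to 0x7E), and everything between ESC [ and that byte is parameter or intermediate data. A small sketch, assuming a POSIX shell (strip_all_csi is a hypothetical name, and LC_ALL=C is set so the bracket range is interpreted by byte value), shows it removing private-mode sequences such as ESC[?25l:

```shell
#!/bin/sh
esc=$(printf '\033')   # literal ESC byte

# Remove any CSI sequence: ESC [ , then non-final bytes, then one final byte @..~
strip_all_csi() {
    LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"
}

printf '\033[?25lhidden cursor\033[?25h \033[38;5;196mred\033[0m\n' | strip_all_csi
# prints: hidden cursor red
```

Sequences that do not start with ESC [ (for example ESC ( B) are not covered by this pattern.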

See also "https://stackoverflow.com/questions/7857352/python-regex-to-match-vt100-escape-sequences".

There is also a table of common escape sequences.

Answer 4

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

Answer 5

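I stumbled upon this post while looking for a way to strip extra formatting from man pages. ansifilter did it, but the result was far from what I wanted (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command is col -bx, for example: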

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

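(source)

Why this works (in response to a comment by @AttRigh):

groff produces bold characters the way a typewriter would: print a letter, move one character back with backspace (you cannot erase text on a typewriter), then print the same letter again to make it more pronounced. So simply omitting the backspaces produces "SSYYNNOOPPSSIISS". col -b fixes this by interpreting the backspaces correctly; quoting the manual:

-b Do not output any backspaces, printing only the last character written to each column position.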
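
The difference between dropping backspaces and interpreting them can be reproduced without groff. This is only a sketch that mimics the simple overstrike case with tr and sed, not col itself; the variables bs and bold are made up for illustration.

```shell
#!/bin/sh
bs=$(printf '\b')   # literal backspace byte (0x08)

# "SYN" in typewriter-style bold: each letter, a backspace, the same letter again.
bold=$(printf 'S\bSY\bYN\bN')

# Dropping only the backspaces leaves every letter doubled.
printf '%s\n' "$bold" | tr -d '\b'
# prints: SSYYNN

# Dropping each "character + backspace" pair keeps the last character written.
printf '%s\n' "$bold" | sed "s/.${bs}//g"
# prints: SYN
```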

Answer 6

You can remove all non-printable characters with:

sed 's/[^[:print:]]//g'
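
Keep in mind that this removes only the non-printable bytes: the ESC character and TAB disappear, but the printable remainder of each escape sequence stays in the text. A short sketch, assuming the C locale (strip_nonprint is a hypothetical name):

```shell
#!/bin/sh
# Remove every byte that is not printable in the C locale.
strip_nonprint() {
    LC_ALL=C sed 's/[^[:print:]]//g'
}

printf 'a\033[31mb\tc\n' | strip_nonprint
# prints: a[31mbc
```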

Answer 7

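I don't have enough reputation to comment on the answer given by Luke H, but I did want to share the regular expression I have been using to get rid of all the ASCII escape sequences.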

sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'

Answer 8

I built vtclean for this. It strips escape sequences using these regular expressions, applied in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It also does basic line-edit emulation, so backspace and other movement characters (such as the left arrow key) are parsed.

Answer 9

Just a note: say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

In binary, this looks like:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

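You can see that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that while you can match 0x1b with the literal form \x1b in sed, you cannot do the same for 0x5b, which represents the left square bracket [: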

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \, which gives \\x5b; but while that "passes", it does not match anything as expected:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

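So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[; the rest of the values can then be entered with the escaped \x notation: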

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a
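
One way to sidestep the \x5b question is to put a literal ESC byte into the pattern and write the bracket as \[. The following is a sketch under assumptions: a POSIX shell with mktemp, sample lines that are shortened stand-ins for the git output above, and grep -c used to count the lines that still contain an ESC byte.

```shell
#!/bin/sh
esc=$(printf '\033')   # literal ESC byte (0x1b)

tmp=$(mktemp)
printf 'remote: first\033[K\nremote: \033[K\nremote: done.\033[K\n' > "$tmp"

# Count lines containing ESC before the cleanup.
before=$(grep -c "$esc" "$tmp")

# Remove ESC [ K, writing the bracket as an escaped literal.
cleaned=$(sed "s/${esc}\[K//g" "$tmp")

# Count again; grep exits non-zero when nothing matches, hence "|| true".
after=$(printf '%s\n' "$cleaned" | grep -c "$esc" || true)

echo "lines with ESC before: $before, after: $after"
rm -f "$tmp"
```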

Answer 10

Tom Hale's answer left unwanted codes behind, but it was a good base to work from. Adding extra filtering cleared out the leftover, unwanted codes:

sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
    -e "s/^[[[][0-9][0-9]*[@]//" \
    -e "s/^[[=0-9]<[^>]*>//" \
    -e "s/^[[)][0-9]//" \
    -e "s/.^H//g" \
    -e "s/^M//g" \
    -e "s/^^H//" \
        file.dirty > file.clean

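Since this was done on a non-GNU version of sed, where you see ^[, ^H and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and a greater-than character, not Ctrl-<.

TERM=xterm was in use at the time.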
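
The literal control characters cannot be copied and pasted from this page, so a script may prefer to generate them. A minimal sketch, assuming a POSIX shell: the variables esc, bs and cr and the helper strip_term_codes are made up for illustration, and only three of the expressions above are reproduced.

```shell
#!/bin/sh
# Generate the literal bytes that Ctrl-V would insert at the keyboard.
esc=$(printf '\033')   # ^[
bs=$(printf '\b')      # ^H
cr=$(printf '\r')      # ^M

strip_term_codes() {
    # 1: ESC followed by [ or (, optional parameters, and a final letter
    # 2: any character followed by a backspace (overstrike)
    # 3: carriage returns
    sed -e "s,${esc}[[(][0-9;?]*[a-zA-Z],,g" \
        -e "s/.${bs}//g" \
        -e "s/${cr}//g"
}

printf 'ab\bc\033[1mX\033(B\r\n' | strip_term_codes
# prints: acX
```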

Answer 11

A sed-based approach without the extended regular expressions enabled by -r:

sed 's/\x1B\[[0-9;]*[JKmsu]//g'

Answer 12

I have been using this bash snippet to strip (at least some) ANSI colors:

shopt -s extglob
while IFS='' read -r line; do
  echo "${line//$'\x1b'\[*([0-9;])[Km]/}"
done

Answer 13

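My answer to

"What are these weird ha:// URLs jenkins fills our logs with?"

effectively removes all ANSI escape sequences from Jenkins console log files (it also handles Jenkins-specific URLs, which are not relevant here).

I acknowledge and thank Marius Gedminas and pyjama for their contributions in formulating the final solution.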

Answer 14

This simple awk solution worked for me, try this:

str="happy $(tput setaf 1)new$(tput sgr0) year!" #colored text
echo $str | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1' #remove ansi colors

14 answers

Answer 1: 57, 2011-06-30 12:26:58
Answer 2: 41, 2017-04-26 07:37:33
Answer 3: 20, 2014-06-03 01:01:15
Answer 4: 14, 2015-05-01 16:16:37
Answer 5: 10, 2014-05-20 08:52:09
Answer 6: 10, 2018-11-06 10:53:12
Answer 7: 9, 2018-06-26 04:09:24
Answer 8: 4, 2017-05-04 06:53:49
Answer 9: 3, 2015-03-14 17:41:54
Answer 10: 2, 2019-02-16 00:38:18
Answer 11: 2, 2019-11-26 04:16:01
Answer 12: 1, 2019-04-26 17:34:59
Answer 13: 0, 2020-06-19 23:13:31
Answer 14: 0, 2021-11-28 11:20:19
