Print only binary matches with sed?

Question

First, let's create a binary test file:

echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' > test.bin

hexdump -C test.bin 
# 00000000  00 01 00 0a 00 0f 32 7a  b0 00 00 01 0a           |......2z.....|
# 0000000d
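If your shell's echo does not support -e (for example dash, which prints the -e literally and does not understand \x escapes), the following sketch uses printf with octal escapes instead. POSIX requires printf to understand these, so it should produce the same 13 bytes. The final \012 stands in for the newline that echo appends on its own:

```shell
#!/bin/sh
# Write the same bytes as the echo -e command above, using octal escapes,
# which POSIX printf supports (\x escapes are a bash/GNU extension).
# The final \012 (0x0a) replaces the newline that echo adds by itself.
printf '\000\001\000\012\000\017\062\172\260\000\000\001\012' > test.bin

# Dump the bytes with POSIX od, in case hexdump is not installed.
od -An -tx1 test.bin
```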

Now let's see if I can match the byte sequence 0x0f 0x32 0x7a with sed:

sed -n 's/\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000  00 0f 32 7a b0 00 00 01  0a                       |..2z.....|
# 00000009

That works as expected. sed processes its input line by line, so the printed line runs from the last linefeed 0x0a to the end of the file.

Now I want to print only the match. First, I try to strip the leading part with a .* regex at the start:

sed -n 's/.*\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000  0f 32 7a b0 00 00 01 0a                           |.2z.....|
# 00000008

That works. Now let's do the same for the trailing part:

sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' test.bin | hexdump -C
# 00000000  0f 32 7a b0 00 00 01 0a                           |.2z.....|
# 00000008

Well, that does not work: only the leading part is removed, and the trailing part stays, even though my sed pattern also ends with .* ?!
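For comparison, here is a minimal check that applies the same substitution to plain ASCII text (the input string abcXYZdef is made up for illustration). Both .* parts match as intended and only the captured group is printed, suggesting that the regex itself is fine:

```shell
#!/bin/sh
# Same pattern as above: drop everything before and after the group,
# and print the line only if the substitution happened (-n plus the p flag).
printf 'abcXYZdef\n' | sed -n 's/.*\(XYZ\).*/\1/p'
# prints: XYZ
```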

What is going on here, and how can I get sed to output only the bytes 0x0f 0x32 0x7a (keeping in mind that sed adds a final linefeed 0x0a when it prints a match)?

Answer 1

Interesting. Here's a simpler repro case:

echo -en '\xff\x80' | sed -n 's/\xff.*/!/p' | hexdump -C

The above prints 21 80, which is ! followed by \x80. The \x80 can also be any higher byte value, but not a lower one: with \x7F, sed does the expected thing and prints only the !.

Also check out what this does:

echo -en '\xff\x80' | sed -n 's/\xff./!/p' | hexdump -C

It prints nothing at all.

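So the question becomes: what's special about \x80 and higher? Well, UTF-8, of course! In a UTF-8 locale, a byte with its high bit set is never a character on its own. It is either the lead byte of a multi-byte sequence, announcing that more bytes are coming, or one of the continuation bytes that must follow such a lead byte. sed never finds a valid sequence here, so it does not interpret those bytes as a character at all, and . does not match them. In the question's data, the byte 0xb0 right after 0x7a is exactly such a byte, so the trailing .* stops there and the rest of the line survives the substitution.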
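The byte rule can be sketched as a small shell function that classifies a single byte by its high bits; the name utf8_role is made up for illustration. Bytes 0x80 to 0xBF may only continue a multi-byte sequence, so a byte like 0xb0 that directly follows plain ASCII cannot be decoded:

```shell
#!/bin/sh
# Hypothetical helper: classify one byte, given as two hex digits,
# by the role it can play in UTF-8.
utf8_role() {
  b=$((0x$1))                                     # hex digits -> integer
  if [ "$b" -lt 128 ]; then                       # 0xxxxxxx: complete ASCII character
    echo ascii
  elif [ "$b" -lt 192 ]; then                     # 10xxxxxx: continuation byte only
    echo continuation
  elif [ "$b" -lt 194 ] || [ "$b" -gt 244 ]; then # 0xc0, 0xc1, 0xf5-0xff never occur
    echo invalid
  else                                            # 0xc2-0xf4: starts a 2- to 4-byte sequence
    echo lead
  fi
}

for byte in 7a 7f 80 b0 ff; do
  printf '%s %s\n' "$byte" "$(utf8_role "$byte")"
done
```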

If you want to "fix" it, tell sed to use the "good old" C locale:

LC_ALL=C sed ...
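Applied to the two repro cases above, a quick check (assuming GNU sed, since \xHH in a sed pattern is a GNU extension) shows that in the C locale every byte is a character of its own, so . matches 0x80 and both commands print only the !:

```shell
#!/bin/sh
# In the C locale every byte, including 0x80 and above, is one character.
# Inputs use octal escapes: \377 = 0xff, \200 = 0x80.
printf '\377\200' | LC_ALL=C sed -n 's/\xff.*/!/p' | od -An -tx1
printf '\377\200' | LC_ALL=C sed -n 's/\xff./!/p' | od -An -tx1
```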

Then you get the expected output:

echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' |
  LC_ALL=C sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' |
  hexdump -C

00000000  0f 32 7a 0a                                       |.2z.|
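If hexdump is not installed, here is a sketch of the same check with POSIX od and printf (octal escapes again standing in for echo -e). It should show the three matched bytes followed only by the linefeed that sed appends:

```shell
#!/bin/sh
# Same 13 input bytes as the question (newline at the end), then the
# C-locale extraction, then a byte dump of the result.
printf '\000\001\000\012\000\017\062\172\260\000\000\001\012' |
  LC_ALL=C sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' |
  od -An -tx1
```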

Answer 1 (accepted, score 4), 2015-01-27 08:36:52