
Print binary matches only with sed?

Let's first create a binary test file:

echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' > test.bin

hexdump -C test.bin 
# 00000000  00 01 00 0a 00 0f 32 7a  b0 00 00 01 0a           |......2z.....|
# 0000000d

Now let's see if I can match the byte sequence 0x0f 0x32 0x7a with sed:

sed -n 's/\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000  00 0f 32 7a b0 00 00 01  0a                       |..2z.....|
# 00000009

That works as expected - the printed line runs from the previous linefeed 0x0a to the next one. Now, I want to print the match only - first I try to filter out the leading part with a .* regex at the start:

sed -n 's/.*\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000  0f 32 7a b0 00 00 01 0a                           |.2z.....|
# 00000008

That works - now let's do the same, but also for the trailing part:

sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' test.bin | hexdump -C
# 00000000  0f 32 7a b0 00 00 01 0a                           |.2z.....|
# 00000008

Well, that does not work - only the leading part is removed - but the trailing part is still there, even though I also terminated my sed match pattern with .* ??!

What is going on here - and how can I get sed to print only the bytes 0x0f 0x32 0x7a on output (taking into account that sed will add the final linefeed 0x0a when it prints a match)?

Interesting. Here's a simpler repro case:

echo -en '\xff\x80' | sed -n 's/\xff.*/!/p' | hexdump -C

The above prints 21 80, which is !\x80. The \x80 can be a larger byte value too, but it cannot be smaller: with \x7F, sed does the expected thing and prints only the !.

Also check out what this does:

echo -en '\xff\x80' | sed -n 's/\xff./!/p' | hexdump -C

It prints nothing at all.

So the question becomes, what's special about \x80 and higher? Well, UTF-8 of course! In a UTF-8 locale, a byte with its high bit set is only valid as part of a multi-byte sequence. Here sed never finds a valid sequence, so . never matches those bytes at all.
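To see which bytes in the question's test data fall into that range, something like the following sketch may help. The helper name `high_bytes` is made up for illustration, and the input is written with octal escapes so that it does not depend on the shell's `printf` understanding `\x`. It lists every input byte with the high bit set, in hex.

```shell
# Hypothetical helper: print, one per line in hex, every byte read from
# stdin whose value is 0x80 or higher (i.e. has the high bit set).
high_bytes() {
  od -An -v -tu1 |
    awk '{ for (i = 1; i <= NF; i++) if ($i >= 128) printf "%02x\n", $i }'
}

# Same bytes as test.bin from the question, in octal escapes.
printf '\000\001\000\012\000\017\062\172\260\000\000\001\n' | high_bytes
```

The only byte reported is b0, which sits directly after the 0f 32 7a match. That is consistent with the trailing .* in the question stopping right there and leaving the rest of the line in the output.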

If you want to "fix" it, tell sed to use the "good old" C locale:

LC_ALL=C sed ...

Then you get your expected output:

echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' |
  LC_ALL=C sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' |
  hexdump -C

00000000  0f 32 7a 0a                                       |.2z.|
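The same fix can be checked against the simpler repro case. This is only a sketch: it assumes GNU sed (the `\xff` escape inside the pattern is a GNU extension), the helper name `repro_c_locale` is made up, and `od` is used in place of `hexdump` to get a compact hex string.

```shell
# Hypothetical helper: feed the two bytes 0xff 0x80 through sed in the
# C locale and return whatever sed prints as a hex string.
repro_c_locale() {
  printf '\377\200' |
    LC_ALL=C sed -n 's/\xff.*/!/p' |
    od -An -tx1 | tr -d ' \n'
}

repro_c_locale; echo
```

In the C locale every byte is a single character, so .* should swallow the 0x80 as well and the 80 should no longer appear after the 21.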
