Print binary matches only with sed?
Let's first create a binary test file:
echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' > test.bin
hexdump -C test.bin
# 00000000 00 01 00 0a 00 0f 32 7a b0 00 00 01 0a |......2z.....|
# 0000000d
Now let's see if I can match the byte sequence 0x0f 0x32 0x7a with sed:
sed -n 's/\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000 00 0f 32 7a b0 00 00 01 0a |..2z.....|
# 00000009
That works as expected: the printed output runs from just after the previous linefeed 0x0a to the end of the line at the next one. Now I want to print only the match. First I try to strip the leading part with a .* regex at the start:
sed -n 's/.*\(\x0f\x32\x7a\)/\1/p' test.bin | hexdump -C
# 00000000 0f 32 7a b0 00 00 01 0a |.2z.....|
# 00000008
That works. Now let's do the same, but also for the trailing part:
sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' test.bin | hexdump -C
# 00000000 0f 32 7a b0 00 00 01 0a |.2z.....|
# 00000008
Well, that does not work: only the leading part is removed, but the trailing part is still there, even though I also ended my sed match pattern with .*?!
What is going on here, and how can I get sed to print only the bytes 0x0f 0x32 0x7a on output (taking into account that sed will add the final linefeed 0x0a when it prints a match)?
Interesting. Here's a simpler repro case:
echo -en '\xff\x80' | sed -n 's/\xff.*/!/p' | hexdump -C
The above prints 21 80, which is !\x80. The \x80 can be a larger byte value too, but it cannot be smaller: with \x7F, sed does the expected thing and prints only the !.
Also check out what this does:
echo -en '\xff\x80' | sed -n 's/\xff./!/p' | hexdump -C
It prints nothing at all.
So the question becomes: what's special about \x80 and higher? Well, UTF-8 of course! In UTF-8, a byte with its high bit set is part of a multi-byte sequence, so more bytes are expected alongside it. sed never finds a valid sequence here, so it never interprets those bytes as a character at all, and . does not match them.
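To make the byte ranges concrete, here is a small sketch that classifies a single byte by its UTF-8 bit pattern. The function name utf8_role is made up for illustration and is not part of sed or any standard tool. It only looks at the leading bits, so it ignores finer rules such as overlong encodings.

```shell
# utf8_role: hypothetical helper, classifies one byte by its leading bits.
# Accepts a decimal or 0x-prefixed hex value, e.g. utf8_role 0x80
utf8_role() {
    b=$(( $1 ))
    if   [ "$b" -le 127 ]; then echo "ascii"         # 0xxxxxxx: complete on its own
    elif [ "$b" -le 191 ]; then echo "continuation"  # 10xxxxxx: only valid after a lead byte
    elif [ "$b" -le 223 ]; then echo "lead-2"        # 110xxxxx: starts a 2-byte sequence
    elif [ "$b" -le 239 ]; then echo "lead-3"        # 1110xxxx: starts a 3-byte sequence
    elif [ "$b" -le 247 ]; then echo "lead-4"        # 11110xxx: starts a 4-byte sequence
    else                        echo "invalid"       # 11111xxx: never appears in UTF-8
    fi
}

utf8_role 0x7f   # ascii
utf8_role 0x80   # continuation
utf8_role 0xff   # invalid
```

Read this way, \x7F is an ordinary single-byte character, while \x80 is a continuation byte with no lead byte before it and \xff cannot occur in UTF-8 at all, which fits the behaviour seen in the repro.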
If you want to "fix" it, tell sed to use the "good old" C locale:
LC_ALL=C sed ...
Then you get your expected output:
echo -e '\x00\x01\x00\x0a\x00\x0f\x32\x7a\xb0\x00\x00\x01' |
LC_ALL=C sed -n 's/.*\(\x0f\x32\x7a\).*/\1/p' |
hexdump -C
00000000 0f 32 7a 0a |.2z.|
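As a quick check, the two-byte repro can be rerun under the C locale as well. This is only a sketch: it assumes GNU sed (for the \xHH escapes in the pattern), and it uses printf with octal escapes and od instead of echo -en and hexdump simply because those tend to be more widely available. The function name repro_c_locale is made up, and a linefeed is appended to the input so that the line is terminated.

```shell
# Rerun the \xff\x80 repro with every byte treated as one character.
# Assumes GNU sed, which understands \xHH inside the pattern.
repro_c_locale() {
    printf '\377\200\n' |
        LC_ALL=C sed -n 's/\xff.*/!/p' |
        od -An -tx1
}

repro_c_locale
```

Under the C locale the .* is able to match the 0x80 byte, so only the ! (0x21) and the line's own linefeed (0x0a) should remain in the dump.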