
Processing binary data files in bash, finding elements which are greater than some number

I process various kinds of binary data, mostly signed 16-bit streams. With hexdump, they look like this:

...
2150     -191    -262    15      -344    -883    -820    -1038   -780
-1234   -1406   -693    131     433     396     241     600     1280
...

I would like to see only those elements of the data stream that are greater than or less than some threshold (the data is binary, signed 16-bit). It could look like this:

cat data.pcm | $($here_some_filtering) 2100 -2100

where the output must contain only the elements that are greater than 2100 or less than -2100. Is there a simple command-line way to do this?

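A one-liner looks like this (it works on whitespace-separated decimal text, such as the hexdump listing above, not on the raw binary file):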

for c in $(cat data.pcm); do if [ "$c" -lt -2100 ] || [ "$c" -gt 2100 ]; then echo "$c"; fi; done
$ cat pcm
2150     -191    -262    15      -344    -883    -820    -1038   -780
-1234   -1406   -693    131     433     396     241     600     1280

$ for num in $(< pcm); do ((num > 2100 || num < -2100)) && echo $num; done
2150

Well, binary... Personal suggestion: don't use plain old shell; use a tool fit for the job. Perl, Python, even a C/C++ program: in those it will mostly be a one-liner.

The following is an unoptimized hack to give you an idea:

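#!/bin/bash
# Copy to stdout, as raw binary, every signed 16-bit sample of file "$1"
# that lies outside the range [lowerlimit, upperlimit].
lowerlimit=-333
upperlimit=333
filesize=$(wc -c < "$1")    # reading stdin makes wc print only the byte count

off=0
while [ "$off" -lt "$filesize" ]; do
    # Decode the 2 bytes at offset $off as one signed short (host byte order).
    # $shortval stays unquoted below so the shell drops od's padding spaces.
    shortval=$(od -An -s -N 2 -j "$off" "$1")
    if [ $shortval -lt "$lowerlimit" ] || [ $shortval -gt "$upperlimit" ]; then
        # Copy the original 2 bytes through unchanged.
        dd if="$1" bs=1 count=2 skip="$off" 2>/dev/null
    fi
    off=$((off + 2))
done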

I'm not sure this can easily be made pipe-able, because the shell splits its input into blocks at line separators.

Bash can be made to deal with binary data.

getbyte () {
    local IFS= LC_CTYPE=C res c
    read -r -d '' -n 1 c
    res=$?
    # the single quote in the argument of the printf 
    # yields the numeric value of $c (ASCII since LC_CTYPE=C)
    [[ -n $c ]] && c=$(printf '%d' "'$c") || c=0
    printf '%s' "$c"
    return $res
}

filter () {
    local b1 b2 val
    while b1=$(getbyte)
    do
        b2=$(getbyte)
        (( val = b2 * 256 + b1 ))
        (( val = val > 32767 ? val - 65536 : val ))
        if (( val > ${1:-0} || val < ${2:-0} ))
        then
            echo $val
        fi
    done
}

Examples (the data intentionally has an odd number of bytes, to show that the function handles this case):

$ data='\0\01\010\0377\0377\0100\0300\0200\0333'
$ echo -en "$data" | filter
256
-248
16639
-32576
219
$ echo -en "$data" | filter 222 -333
256
16639
-32576
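The arithmetic in `filter` combines each pair of bytes in little-endian order (`b2 * 256 + b1`) and then maps the unsigned range 32768..65535 onto -32768..-1 (two's complement). Applying the same two steps by hand to the byte pairs of `$data` reproduces the unfiltered output above; `decode_pair` is a hypothetical helper written only for this illustration:

```shell
#!/bin/bash
# decode_pair LOW_BYTE HIGH_BYTE   (hypothetical helper)
# Combine two bytes into a little-endian 16-bit value, then reinterpret
# it as a signed value, as filter() does.
decode_pair() {
    local val=$(( $2 * 256 + $1 ))
    echo $(( val > 32767 ? val - 65536 : val ))
}

decode_pair 0 1      # \0    \01   -> 256
decode_pair 8 255    # \010  \0377 -> 65288 - 65536 = -248
decode_pair 255 64   # \0377 \0100 -> 16639
decode_pair 192 128  # \0300 \0200 -> 32960 - 65536 = -32576
decode_pair 219 0    # \0333 plus the missing byte, read as 0 -> 219
```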

Your command would then be:

filter 2100 -2100 < data.pcm

Whenever I want to extract numerical values from a binary file, I use od (octal dump). It has many options for extracting characters, integers (8, 16, 32 and 64 bits) and floats (32 and 64 bits). You can also specify an offset to the exact value you are looking for.
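For example, with `-j` (bytes to skip), `-N` (bytes to read) and `-t d2` (signed decimal, 2 bytes per value), a single sample can be pulled out by its byte offset. A sketch with made-up data; note that `od` decodes in the host's byte order, so this assumes little-endian samples on a little-endian machine:

```shell
#!/bin/bash
# Hypothetical sample file: little-endian signed 16-bit samples
# 2150, -191, -2200 and 15.
printf '\146\010\101\377\150\367\017\000' > data.pcm

# Sample number 2 (counting from 0) starts at byte offset 2 * 2 = 4.
index=2
od -An -t d2 -j $((index * 2)) -N 2 data.pcm
```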

To learn more about it, type:

man od

Then filtering the od output in bash should not be complicated.
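A minimal sketch of such a filter, shaped like the command line the question asks for (`... | filter 2100 -2100`). The function name `od_filter` is hypothetical, and the sketch assumes little-endian data on a little-endian host, since `od -t d2` uses host byte order:

```shell
#!/bin/bash
# od_filter HIGH LOW   (hypothetical helper)
# Reads raw signed 16-bit samples on stdin and prints, as decimal text,
# those greater than HIGH or less than LOW.
od_filter() {
    local high=$1 low=$2 line num
    # -An: no offset column; -v: never collapse repeated lines into '*';
    # -t d2: signed decimal, 2 bytes per value.
    od -An -v -t d2 | while read -r line; do
        for num in $line; do    # unquoted on purpose: split the row into values
            if (( num > high || num < low )); then
                echo "$num"
            fi
        done
    done
}

# Hypothetical input: samples 2150, -191, -2200 and 15.
printf '\146\010\101\377\150\367\017\000' | od_filter 2100 -2100
```

The `-v` flag matters here: without it, od prints a single `*` in place of repeated identical lines, which would silently drop samples from long runs of equal values.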

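Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.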

 