What is a better way to write this hex-formatting function in Python?

Question

Using Python, I want to format a string of hex characters:

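A space between each byte (easy enough): 2f2f -> 2f 2f
Line wrapping at a specified maximum byte width (not hard): 2f 2f 2f 2f 2f 2f 2f 2f\n
An address range for each line (doable): 0x7f8-0x808: 2f 2f 2f 2f 2f 2f 2f 2f\n
Replacing large runs of consecutive 00 bytes with: ... trimmed 35 x 00 bytes [0x7 - 0x2a] ...

It was at this point that I knew I was writing some bad code. The function became bloated and hard to follow, with too many features piled on in unintuitive ways.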
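The first three requirements are independent of each other, so they do not have to share one loop: convert the hex string to bytes, slice it into rows, and format each row with its address range and spaced-out bytes. A minimal sketch, assuming Python 3.9+ (`bytes.hex(sep)` needs 3.8, the `list[str]` annotation needs 3.9); `format_rows` is a hypothetical name and zero trimming is deliberately left out:

```python
def format_rows(hex_str: str, byte_width: int = 16, addr: int = 0) -> list[str]:
    """Format a hex string as rows of `byte_width` bytes, each with its address range."""
    data = bytes.fromhex(hex_str)  # also rejects malformed input
    lines = []
    for offset in range(0, len(data), byte_width):
        row = data[offset:offset + byte_width]
        start = addr + offset
        end = start + len(row)  # exclusive end, as in the question's 0x0-0x10
        # bytes.hex(" ") puts a single space between every byte
        lines.append(f"{hex(start)}-{hex(end)}:\t{row.hex(' ')}")
    return lines


print("\n".join(format_rows("2f" * 20, byte_width=8, addr=0x7f8)))
# 0x7f8-0x800:	2f 2f 2f 2f 2f 2f 2f 2f
# 0x800-0x808:	2f 2f 2f 2f 2f 2f 2f 2f
# 0x808-0x80c:	2f 2f 2f 2f
```

Working on byte offsets rather than on indices into the hex string means a later trimming step can compute addresses directly, without halving character positions.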

Example output:

0x0-0x10:   5a b6 f7 6e 7c 65 45 a0 bc 6a e5 f5 77 2b 92 48 
0x10-0x20:  47 d7 33 ea 40 15 44 ac 6b a4 50 78 6e f2 10 d4 
0x20-0x30:  9c 7c c1 f7 5a bf ec 9f b0 2b b7 29 97 ee 56 31 
0x30-0x40:  ff 23 d9 1a 0b 4e fd 65 50 92 42 eb b2 77 7a 55 
0x40-0x50:

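I'm pretty sure there are cases where the address range is no longer correct (especially when 00 replacement happens). The function looks gross; I'm embarrassed to even show it: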

def pretty_print_hex(hex_str, byte_width=16, line_start=False, addr=0):
    out = ''
    condense_min = 12
    total_bytes = int(len(hex_str) / 2)
    line_width = False
    if byte_width is not False:
        line_width = byte_width * 2
    if line_start is not False:
        out += line_start
    end = addr + byte_width
    if (end > addr + total_bytes):
        end = addr + total_bytes
    out += f"{hex(addr)}-{hex(end)}:\t"
    addr += byte_width
    i = 0
    if len(hex_str) == 1:
        print('Cannot pretty print < 1 byte', hex_str)
        return
    condensing = False
    cond_start_addr = 0
    cond_end_addr = 0
    condense_cache = []
    while i < len(hex_str):
        byte = hex_str[i] + hex_str[i + 1]
        i += 2
        if byte == '00':
            condensing = True
            cond_start_addr = (addr - byte_width) + ((i + 1) % byte_width)
            condense_cache.append(byte)
        else:
            if condensing is True:
                condensed_count = len(condense_cache)
                if condensed_count >= condense_min:
                    cond_end_addr = cond_start_addr + condensed_count
                    out += f"... trimmed {condensed_count} x 00 bytes [{hex(cond_start_addr)} - {hex(cond_end_addr)}] ..."
                else:
                    for byte in condense_cache:
                        out += f"{byte} "
            condense_cache = []
            condensing = False
        if condensing is False:
            out += byte + ' '
            if (line_width is not False) and (i) % line_width == 0:
                out += '\n'
                if line_start is not False:
                    out += line_start
                    end = addr + byte_width
                    if end > addr + total_bytes:
                        end = addr + total_bytes
                if (addr - end) != 0:
                    out += f"{hex(addr)}-{hex(end)}:\t"
                    addr += byte_width
    if condensing is True:
        condensed_count = len(condense_cache)
        if condensed_count >= condense_min:
            cond_end_addr = cond_start_addr + condensed_count
            out += f"... trimmed {condensed_count} x 00 bytes [{hex(cond_start_addr)} - {hex(cond_end_addr)}] ..."
        else:
            for byte in condense_cache:
                out += f"{byte} "
    return out.rstrip()

Example input/output:

hex_str = 'c8d8fb631cc7d072b62aaf9cd47bc270d4341e35f23b7a94acf24f33397a6cb4145b6eacfd56653d79bea10d2842023155e5b14bec3b5851a0a58cb3a523c476b126486e1392bdd2e3bcb6cbc333b23de387ae8624123009'
byte_width=16
line_start='\t'
addr=0

print(pretty_print_hex(hex_str, byte_width=16, line_start='\t', addr=0))

    0x0-0x10:   c8 d8 fb 63 1c c7 d0 72 b6 2a af 9c d4 7b c2 70 
    0x10-0x20:  d4 34 1e 35 f2 3b 7a 94 ac f2 4f 33 39 7a 6c b4 
    0x20-0x30:  14 5b 6e ac fd 56 65 3d 79 be a1 0d 28 42 02 31 
    0x30-0x40:  55 e5 b1 4b ec 3b 58 51 a0 a5 8c b3 a5 23 c4 76 
    0x40-0x50:  b1 26 48 6e 13 92 bd d2 e3 bc b6 cb c3 33 b2 3d 
    0x50-0x60:  e3 87 ae 86 24 12 30 09

It gets worse once some 00 replacement is involved. Here is an example:

hex_str = 'c8000000000000000000000000000aaf9cd47bc270d4341e35f23b7a94acf24f33397a6cb4145b6eacfd56653d79bea10d2842023155e5b14bec3b5851a0a58cb3a523c476b126486e1392bdd2e3bcb6cbc333b23de387ae8624123009'
byte_width=16
line_start='\t'
addr=0
print(pretty_print_hex(hex_str, byte_width=16, line_start='\t', addr=0))

    0x0-0x10:   c8 ... trimmed 13 x 00 bytes [0xd - 0x1a] ...0a af 
    0x10-0x20:  9c d4 7b c2 70 d4 34 1e 35 f2 3b 7a 94 ac f2 4f 
    0x20-0x30:  33 39 7a 6c b4 14 5b 6e ac fd 56 65 3d 79 be a1 
    0x30-0x40:  0d 28 42 02 31 55 e5 b1 4b ec 3b 58 51 a0 a5 8c 
    0x40-0x50:  b3 a5 23 c4 76 b1 26 48 6e 13 92 bd d2 e3 bc b6 
    0x50-0x60:  cb c3 33 b2 3d e3 87 ae 86 24 12 30 09

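It would also make more sense for the address range (`0x0-0x10`) to reflect the real range, including the bytes trimmed on that line, but I can't even begin to think about how to add that.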
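A sketch of one way to get exact ranges: locate the zero runs in the byte data first, so each run carries its real start and end offset, before deciding how to lay out the lines. It uses `re.finditer` on `bytes`; the threshold of 12 mirrors `condense_min` in the question's code, the end address is exclusive (as in `[0x7 - 0x2a]` for 35 bytes), and `zero_runs` is a hypothetical name:

```python
import re


def zero_runs(hex_str: str, min_len: int = 12, addr: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) byte addresses, end exclusive, of every run of
    at least `min_len` consecutive 0x00 bytes."""
    data = bytes.fromhex(hex_str)
    # bytes pattern: min_len or more NUL bytes in a row
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(addr + m.start(), addr + m.end()) for m in pattern.finditer(data)]


# c8, then thirteen 00 bytes, then 0a af (the start of the question's second example)
for start, end in zero_runs("c8" + "00" * 13 + "0aaf"):
    print(f"... trimmed {end - start} x 00 bytes [{hex(start)} - {hex(end)}] ...")
# ... trimmed 13 x 00 bytes [0x1 - 0xe] ...
```

With the runs known up front, a row's displayed range can be widened to cover whatever run it swallows, instead of being recomputed from a running counter.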

Rather than patching this awful-looking function, I thought I might look for a better approach altogether, if one exists.

Answer 1

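I liked the challenge in this function, and this is what I came up with tonight. It is somewhat shorter than your original, but not as short as trincot's answer.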

def hexpprint(
    hexstring: str,
    width: int = 16,  # row width in hex characters (2 per byte), should be even
    hexsep: str = " ",
    addr: bool = False,
    addrstart: int = 0,
    linestart: str = "",
    compress: bool = False,
):
    # if address, get hex address length size (addresses count bytes, not hex chars)
    if addr:
        addrlen = len(f"{addrstart + len(hexstring) // 2:x}")
    # compression buffer just counts hex 0 chars
    cbuf = 0
    for i in range(0, len(hexstring), width):
        j = i + width
        row = hexstring[i:j]
        # if using compression and compressible
        if compress and row.count("0") == len(row):
            cbuf += len(row)
            continue
        # if not compressible and has cbuf, flush it
        if cbuf:
            line = linestart
            if addr:
                beg = f"0x{addrstart + (i - cbuf) // 2:0{addrlen}x}"
                end = f"0x{addrstart + i // 2:0{addrlen}x}"
                line += f"{beg}-{end} "
            line += f"compressed {cbuf // 2} NULL bytes"
            print(line)
            cbuf = 0
        # print formatted hex row
        line = linestart
        if addr:
            beg = f"0x{addrstart + i // 2:0{addrlen}x}"
            end = f"0x{addrstart + (i + len(row)) // 2:0{addrlen}x}"
            line += f"{beg}-{end} "
        line += hexsep.join(row[k : k + 2] for k in range(0, len(row), 2))
        print(line)
    # flush cbuf if necessary (the zero run reaches the end of the input)
    if cbuf:
        line = linestart
        if addr:
            beg = f"0x{addrstart + (len(hexstring) - cbuf) // 2:0{addrlen}x}"
            end = f"0x{addrstart + len(hexstring) // 2:0{addrlen}x}"
            line += f"{beg}-{end} "
        line += f"compressed {cbuf // 2} NULL bytes"
        print(line)

PS: I don't really like the code duplication for printing things, so I may come back and edit this later.
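As one possible way to remove that duplication, here is a sketch in which a single helper formats every line and a second helper flushes a pending zero run, so the end-of-input case reuses the same code. Unlike the code above, this hypothetical variant parses the input with `bytes.fromhex`, measures `width` and addresses in bytes, and returns the text instead of printing it; `emit` and `flush` are illustrative names:

```python
def hexpprint(
    hexstring: str,
    width: int = 16,  # bytes per row (not hex characters)
    hexsep: str = " ",
    addr: bool = False,
    addrstart: int = 0,
    linestart: str = "",
    compress: bool = False,
) -> str:
    data = bytes.fromhex(hexstring)
    addrlen = len(f"{addrstart + len(data):x}")
    lines = []
    zeros_from = None  # offset where the current run of all-zero rows began

    def emit(beg: int, end: int, text: str) -> None:
        # the single place that builds a line: prefix, optional address range, text
        line = linestart
        if addr:
            line += f"0x{addrstart + beg:0{addrlen}x}-0x{addrstart + end:0{addrlen}x} "
        lines.append(line + text)

    def flush(end: int) -> None:
        # emit one "compressed" line for a pending run of all-zero rows, if any
        nonlocal zeros_from
        if zeros_from is not None:
            emit(zeros_from, end, f"compressed {end - zeros_from} NULL bytes")
            zeros_from = None

    for i in range(0, len(data), width):
        row = data[i:i + width]
        if compress and not any(row):  # every byte in the row is 0x00
            if zeros_from is None:
                zeros_from = i
            continue
        flush(i)
        emit(i, i + len(row), hexsep.join(f"{b:02x}" for b in row))
    flush(len(data))  # a zero run may extend to the end of the input
    return "\n".join(lines)


print(hexpprint("ab" * 3 + "00" * 8 + "cd", width=4, addr=True, compress=True))
# 0x0-0x4 ab ab ab 00
# 0x4-0x8 compressed 4 NULL bytes
# 0x8-0xc 00 00 00 cd
```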

Answer 2

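I would suggest not starting a "trimmed 00 bytes" run in the middle of an output line, and only applying this compression to complete output lines that contain nothing but zeros.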

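This means you will still see uncompressed zeros in lines that also contain non-zero bytes, but in my opinion it gives a cleaner output format. For example, if a line merely ends with two 00 bytes, replacing that last part of the line with the longer "trimmed 2 x 00 bytes" message does not really help. By replacing only complete 00 lines with this message, and compressing several such consecutive lines into one message, the output looks cleaner.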
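The idea of collapsing consecutive all-zero lines into one message maps naturally onto `itertools.groupby`. A minimal sketch, not the regex approach this answer actually uses: cut the bytes into rows, group neighbouring rows by whether they are all zero, and emit one trimmed line per zero group. `pretty_rows` is a hypothetical name and the message wording follows the question:

```python
from itertools import groupby


def pretty_rows(hex_str: str, byte_width: int = 16, addr: int = 0) -> str:
    data = bytes.fromhex(hex_str)
    # (offset, row) pairs: one per output line before compression
    rows = [(i, data[i:i + byte_width]) for i in range(0, len(data), byte_width)]
    out = []
    # group consecutive rows by whether they consist only of 0x00 bytes
    for all_zero, grouped in groupby(rows, key=lambda r: not any(r[1])):
        grouped = list(grouped)
        if all_zero:
            start = grouped[0][0]
            end = grouped[-1][0] + len(grouped[-1][1])  # exclusive
            out.append(f"{hex(addr + start)}-{hex(addr + end)}:\t"
                       f"... trimmed {end - start} x 00 bytes ...")
        else:
            for offset, row in grouped:
                out.append(f"{hex(addr + offset)}-{hex(addr + offset + len(row))}:\t"
                           f"{row.hex(' ')}")
    return "\n".join(out)


print(pretty_rows("11" * 4 + "00" * 8 + "22", byte_width=4))
# 0x0-0x4:	11 11 11 11
# 0x4-0xc:	... trimmed 8 x 00 bytes ...
# 0xc-0xd:	22
```

Because each zero group knows its first and last row, the address range on the trimmed line covers exactly the bytes it hides.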

To produce the output format, I would use the power of regular expressions:

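Identify a chunk of bytes to output on one line: either a line with at least one non-zero byte, or a range of zero bytes that either runs to the end of the input or has a length that is a multiple of the "byte width" argument.
Insert spaces into a line of bytes.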
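To see how the pattern in the code below splits the input into lines, this sketch runs the same regular expression and only reports each match's byte range and whether it was a zero run. It is an illustration only; `chunks` is a hypothetical helper, and a `byte_width` of 4 keeps the example short:

```python
import re


def chunks(hex_str: str, byte_width: int):
    """Yield (first_byte, last_byte, is_zero_run) for each output line."""
    # same pattern as the answer: a zero run (to the end, or whole rows) or 1..byte_width bytes
    pattern = f"(0+$|(?:(?:00){{{byte_width}}})+)|(?:..){{1,{byte_width}}}"
    for match in re.finditer(pattern, hex_str):
        # offsets are in hex characters, so halve them to get byte offsets;
        # group 1 only participates when the match is a zero run
        yield match.start() // 2, match.end() // 2 - 1, match[1] is not None


for first, last, is_zero in chunks("ab000000" + "00" * 8 + "cd0000", byte_width=4):
    print(hex(first), hex(last), "zero run" if is_zero else "data")
# 0x0 0x3 data
# 0x4 0xb zero run
# 0xc 0xe data
```

Note that the zero bytes at the end of the first and last rows stay visible: only whole rows of zeros, or a zero tail reaching the end of the input, are matched by the first group.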

All of this can be done by iterating within a single expression:

import re

def pretty_print_hex(hex_str, byte_width=16, line_start='\t', addr=0):
    return "\n".join(f"{hex(start)}-{hex(last)}:{line_start}{line}"
        for start, last, line in (
            # byte addresses: 2 hex chars per byte, shifted by the starting address
            (addr + match.start() // 2, addr + match.end() // 2 - 1,
                f"...trimmed {(match.end() - match.start()) // 2} x 00 bytes..." if match[1]
                else re.sub("(..)(?!$)", r"\1 ", match[0])
            )
            for match in re.finditer(
                f"(0+$|(?:(?:00){{{byte_width}}})+)|(?:..){{1,{byte_width}}}",
                hex_str
            )
        )
    )

Answer 3

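If you want to use one rather than write it (not sure this fits; let me know if I should delete it), you can use the excellent hexdump package (I'm not affiliated with it):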

https://pypi.org/project/hexdump

python -m hexdump binary.dat

Pretty cool; you could also look at its source for ideas.

However, it doesn't look like it is maintained...

3 solutions

Solution 1  1  2021-12-23 18:01:00
Solution 2  0  2021-12-23 11:16:53
Solution 3  0  2021-12-23 11:19:53
