
Pack into C types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short:

import struct

raw_data = 40

# Pack into little endian
data_packed = struct.pack('<H', raw_data)

Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.

import binascii

def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res

print(get_bin_str(data_packed))

Unfortunately, it throws the following error:

result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'

How do I properly decode the little-endian bytes to binary data?

Use unpack to reverse what you packed. The data isn't UTF-encoded, so there is no reason to use UTF encodings.

>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex()   # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)

unpack returns a tuple, so index it to get the single value:

>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40

To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b prefix needs to be removed if not desired.

>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
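
The unpack-then-format steps above can be combined into one helper. This is only a sketch: the name packed_to_bits and its fmt parameter are illustrative and not part of the original answer, and it assumes the format string describes exactly one unsigned integer. It uses struct.calcsize to derive the zero-padded width from the format.

```python
import struct

def packed_to_bits(packed, fmt="<H"):
    """Unpack a single value and return it as a zero-padded binary string.

    Hypothetical helper: assumes `fmt` describes exactly one unsigned integer.
    """
    (value,) = struct.unpack(fmt, packed)
    width = struct.calcsize(fmt) * 8  # number of bits in the packed value
    return format(value, f"0{width}b")

print(packed_to_bits(struct.pack("<H", 40)))  # 0000000000101000
```

Deriving the width from the format keeps the padding correct if the format is later changed, for example to '<I' for a 32-bit value.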

You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check whether that program reads your output correctly.)

Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:

>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'

What's that? Well, it's the character (, which is decimal 40 in the ASCII table, followed by a null byte. If you had used a number that does not map to a printable ASCII character, you'd see something less surprising:

>>> struct.pack("<H", 11)
b'\x0b\x00'

Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: from low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
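
To make the byte order concrete, here is a small comparison of the same value packed both ways. It is a sketch added for illustration; the variable names are arbitrary, and int.from_bytes is used only as one way to confirm that both layouts hold the same number.

```python
import struct

value = 11

little = struct.pack("<H", value)  # least significant byte first
big = struct.pack(">H", value)     # most significant byte first

print(little)  # b'\x0b\x00'
print(big)     # b'\x00\x0b'

# int.from_bytes reverses the packing when given the matching byte order
print(int.from_bytes(little, "little"))  # 11
print(int.from_bytes(big, "big"))        # 11
```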

Anyway, you can also look at the bytes directly:

>>> print(data_packed[0])
40

Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:

>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'

The two set bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
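
Since bin() drops leading zeros, it can help to format each byte as a full 8 bits and then rebuild the 16-bit value by hand. The following is an illustrative sketch, not code from the original answer:

```python
import struct

data_packed = struct.pack("<H", 40)

# Iterating over a bytes object yields ints, so each byte can be formatted directly
bits_per_byte = [format(byte, "08b") for byte in data_packed]
print(bits_per_byte)  # ['00101000', '00000000']

# Rebuild the 16-bit value: the byte at index 1 is the more significant one
value = data_packed[0] | (data_packed[1] << 8)
print(value)  # 40
```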

What's wrong with your unpacking code?

Just for fun, let's see what your sequence of transformations in get_bin_str was doing.

>>> binascii.hexlify(data_packed)
b'2800'

Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex; the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two Unicode characters. Let's take a look:

>>> b'2800'.decode("utf-16-le")
'㠲〰'

In the next step Python finally notices that something is wrong: int() raises the ValueError because neither of these characters is a hexadecimal digit. By then it does not make much difference, because you are pretty far away from the number 40 you started with.
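
For comparison, here is one way get_bin_str could be rewritten so that it returns the bits of the packed number. This is a sketch rather than the asker's intended design: it swaps the hexlify/decode steps for int.from_bytes and assumes the input is a single little-endian unsigned integer.

```python
import struct

def get_bin_str(data):
    """Return the bits of a little-endian unsigned integer held in `data`."""
    value = int.from_bytes(data, byteorder="little")
    return format(value, "b")  # no '0b' prefix, no zero padding

data_packed = struct.pack("<H", 40)
print(get_bin_str(data_packed))  # 101000
```

Note that decoding the hex text as ASCII would not have been enough either: int('2800', 16) is 10240, because the hex digits follow the little-endian byte order of the packed data.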

To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed:

>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40
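
One caveat worth checking if you go down this road: not every 16-bit value is a valid UTF-16 character by itself. The sketch below (the helper name decode_as_utf16 is made up for illustration) shows that a value in the surrogate range, such as 0xD800, cannot be decoded this way, which is another reason to prefer struct.unpack for numeric data.

```python
import struct

def decode_as_utf16(value):
    """Pack `value` as '<H' and try to read it back as one UTF-16 character."""
    packed = struct.pack("<H", value)
    try:
        return ord(packed.decode("utf-16-le"))
    except UnicodeDecodeError:
        return None  # lone surrogate: not a valid UTF-16 sequence by itself

print(decode_as_utf16(40))      # 40
print(decode_as_utf16(0xD800))  # None
```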
