
Pack into c types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short:

import struct

raw_data = 40

# Pack into little endian
data_packed = struct.pack('<H', raw_data)

Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.

import binascii

def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res

print(get_bin_str(data_packed))

Unfortunately, it throws the following error:

    result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'

How do I properly decode the little-endian bytes into binary data?

Use struct.unpack to reverse what you packed. The data isn't UTF-encoded, so there is no reason to use UTF encodings.

>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex()   # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)

unpack returns a tuple, so index it to get the single value:

>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40

To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and its 0b prefix has to be stripped if you don't want it.

>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'

You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)

Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:

>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'

What's that? Well, it's the character "(", which is decimal 40 in the ASCII table, followed by a null byte. If you had used a number that does not map to a printable ASCII character, you'd see something less surprising:

>>> struct.pack("<H", 11)
b'\x0b\x00'

Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: from low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
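
To see the byte order concretely, one option is to pack the same number both ways and compare. This is only an illustrative sketch; the variable names little and big are made up for the example.

```python
import struct

value = 11
little = struct.pack('<H', value)  # least significant byte first
big = struct.pack('>H', value)     # most significant byte first

print(little)  # b'\x0b\x00'
print(big)     # b'\x00\x0b'

# int.from_bytes recovers the number when given the matching byte order.
print(int.from_bytes(little, 'little'))  # 11
print(int.from_bytes(big, 'big'))        # 11
```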

Anyway, you can also look at the bytes directly:

>>> print(data_packed[0])
40

Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:

>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'

The two set bits you see are worth 32 and 8, which add up to 40. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
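
bin() drops leading zeros, so if you want all eight bits of every byte, string formatting is one way to get them. The sketch below lists the bytes in storage order and then joins them high byte first to form the 16-bit value; the names stored and bits are only for illustration.

```python
import struct

data_packed = struct.pack('<H', 40)

# Show each byte as 8 binary digits, in the order the bytes are stored.
stored = [format(byte, '08b') for byte in data_packed]
print(stored)  # ['00101000', '00000000']

# To read the 16-bit value, the high byte goes first, so reverse the order.
bits = ''.join(reversed(stored))
print(bits)  # 0000000000101000
```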

What's wrong with your unpacking code?

Just for fun let's see what your sequence of transformations in get_bin_str was doing.

>>> import binascii
>>> binascii.hexlify(data_packed)
b'2800'

Um, all right. Not sure why you converted to hex digits, but now you have four bytes, not two. (28 is the number 40 written in hex, and the 00 is for the null byte.) In the next step, you call decode and tell it that these four bytes are actually UTF-16; that's just enough for two Unicode characters. Let's take a look:

>>> b'2800'.decode("utf-16-le")
'㠲〰'

In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
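
For the curious, the sketch below traces where those two characters come from. It also shows that decoding the hex digits as ASCII instead would not have produced 40 either, because the hex digits are still in little-endian byte order.

```python
import binascii
import struct

data_packed = struct.pack('<H', 40)

# hexlify yields four ASCII characters: '2', '8', '0', '0'.
hex_digits = binascii.hexlify(data_packed)
print(list(hex_digits))  # [50, 56, 48, 48], i.e. 0x32 0x38 0x30 0x30

# utf-16-le pairs those bytes into the code units 0x3832 and 0x3030.
decoded = hex_digits.decode('utf-16-le')
print([hex(ord(ch)) for ch in decoded])  # ['0x3832', '0x3030']

# Decoding as ASCII gives the text '2800', which int() reads as 0x2800.
print(int(hex_digits.decode('ascii'), 16))  # 10240
```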

To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.

>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40
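
Keep in mind that this only works when the packed value happens to be a valid UTF-16 code unit on its own. As a rough illustration, the sketch below uses int.from_bytes for the general case and shows decode failing on a lone surrogate value; read_ushort is a hypothetical helper name, not something from the question.

```python
import struct

def read_ushort(data):
    # Interpret the bytes as a little-endian unsigned integer.
    return int.from_bytes(data, 'little')

print(read_ushort(struct.pack('<H', 40)))      # 40
print(read_ushort(struct.pack('<H', 0xD800)))  # 55296

# 0xD800 is a lone surrogate, so it is not decodable as UTF-16 text.
try:
    struct.pack('<H', 0xD800).decode('utf-16-le')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)
```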
