
Pack and unpack in the right format in Python

I'm looking to unpack a string and its length from a buffer.

For example, to obtain (4, 'Gégé') from this buffer:
b'\x00\x04G\xE9g\xe9'

Does anyone know how to do this?

The length data looks like a big-endian unsigned 16-bit integer, and the string data looks like it's using the Latin-1 encoding. If that's correct, you can extract it like this:

from struct import unpack

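def extract(buff):
    # '>H' = big-endian unsigned 16-bit integer: the first two bytes hold the length.
    # Everything after the 2-byte prefix is decoded as Latin-1 text.
    return unpack(b'>H', buff[:2])[0], buff[2:].decode('latin1')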

buff = b'\x00\x04G\xE9g\xe9'
print(extract(buff))

Output:

(4, 'Gégé')
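
For the packing direction, a minimal sketch under the same assumptions (big-endian unsigned 16-bit length, Latin-1 text) could look like this. The name build is hypothetical and not part of the original answer, and the snippet is written for Python 3.

```python
from struct import pack

def build(text, encoding='latin1'):
    # Hypothetical helper: encode first, so the prefix counts bytes,
    # not characters (the two are equal for Latin-1, but not for UTF-8).
    data = text.encode(encoding)
    # '>H' only holds values 0..65535; longer payloads raise struct.error.
    return pack('>H', len(data)) + data

buff = build('Gégé')
print(buff)
```

Encoding before measuring the length keeps the prefix consistent with what the unpacking side reads, whichever single-byte or multi-byte encoding is chosen.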

Another possibility for the encoding is the old Windows code page 1252, which can be decoded using .decode('cp1252').
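
The two encodings agree on the bytes in this example, so the sample buffer alone cannot tell them apart. As a rough illustration (Python 3; the variable names are only for this sketch), they differ in the 0x80 to 0x9F range:

```python
sample = b'G\xe9g\xe9'
# Both codecs map 0xE9 to 'é', so the example decodes identically.
same = sample.decode('latin1') == sample.decode('cp1252')

euro = b'\x80'
latin1_char = euro.decode('latin1')   # U+0080, an invisible control character
cp1252_char = euro.decode('cp1252')   # U+20AC, the euro sign

print(same, repr(latin1_char), repr(cp1252_char))
```

If the data comes from a Windows application and contains characters such as the euro sign or curly quotes, cp1252 is the more likely candidate.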


The above code works in both Python 2 & Python 3. But in Python 3 there's an easier way: we don't need struct.unpack, we can use the int.from_bytes method.

def extract(buff):
    return int.from_bytes(buff[:2], 'big'), buff[2:].decode('latin1')
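
Both versions of extract decode everything after the prefix and never compare it with the stated length. If the buffer may be truncated or may carry further data after the string, a sketch along these lines (Python 3; extract_checked is a hypothetical name, and Latin-1 is still an assumption) uses the prefix to slice the payload:

```python
def extract_checked(buff, encoding='latin1'):
    # Hypothetical variant: trust the 2-byte prefix to delimit the string.
    if len(buff) < 2:
        raise ValueError('buffer too short for a 2-byte length prefix')
    length = int.from_bytes(buff[:2], 'big')
    payload = buff[2:2 + length]
    if len(payload) < length:
        raise ValueError('buffer shorter than the stated length')
    # Return the length, the decoded string, and any bytes left over.
    return length, payload.decode(encoding), buff[2 + length:]

print(extract_checked(b'\x00\x04G\xe9g\xe9rest'))
```

Returning the leftover bytes makes it straightforward to read several length-prefixed strings from one buffer in a loop.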
