
Read \xHH escapes from a file as raw binary in Python

I have the following problem:

I want to read a file into a raw binary string.

The file looks like this (with escape characters, not binary data):

\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64\x8b\x50\x30\x8b\x52

Code used:

data = open("filename", "rb").read()

Result obtained:

b"\\xfc\\xe8\\x82\\x00\\x00\\x00\\x60\\x89\\xe5\\x31\\xc0\\x64\\x8b\\x50\\x30\\x8b\\x52"

with doubled backslashes (\\).

How can I read it as a binary string of raw bytes such as \xaa, without the escape characters?

This output is OK.

Python displays the data with doubled backslashes because repr() escapes each literal backslash; it is not a sign of corruption. The data is stored correctly, as bytes.
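A quick way to check what the bytes object really holds is to look at its length and its first byte rather than its printed form. The sketch below assumes a sample value standing in for what open("filename", "rb").read() would return for a file containing the text \xfc\xe8\x82; the variable names are illustrative only.

```python
# Stand-in for the bytes read from a file that contains the
# literal text \xfc\xe8\x82 (assumed sample, not the real file).
data = b"\\xfc\\xe8\\x82"

print(repr(data))   # repr() doubles each literal backslash for display
print(len(data))    # each escape is four ASCII characters: \, x, f, c
print(data[0:4])    # the first escape, still four separate bytes

# A single backslash byte (0x5C) sits at index 0, not two of them.
first_is_backslash = data[0] == 0x5C
```

So the doubling is only how the value is displayed. The data still consists of four ASCII characters per escape, which is why a decoding step is needed to get one byte per escape.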

OK. The problem is that you're asking the wrong question. Your data file isn't a raw binary string; it's an encoded one, written with escape sequences. You're reading it as raw binary when you need to decode the escapes instead. Try

data = open("filename", "r", encoding='unicode_escape').read().encode('raw_unicode_escape')

instead.

Edit: OK, this now works. You need to encode with raw_unicode_escape, not utf-8 (the default).
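The same decode-then-encode idea can also be applied to bytes that were read in binary mode. This is a minimal sketch, not the answer's exact code: it swaps in codecs.decode and a latin-1 encode (which maps code points 0 to 255 to single bytes), and the helper name decode_escapes and the sample value are assumptions made for illustration.

```python
import codecs


def decode_escapes(raw: bytes) -> bytes:
    """Turn literal \\xHH text (held as bytes) into the bytes it describes."""
    # unicode_escape maps each \xHH escape to the code point U+00HH.
    text = codecs.decode(raw, "unicode_escape")
    # latin-1 maps code points 0..255 back to single bytes.
    return text.encode("latin-1")


# Assumed sample standing in for open("filename", "rb").read().
sample = b"\\xfc\\xe8\\x82\\x00"
result = decode_escapes(sample)
print(result)
```

Note that latin-1 raises UnicodeEncodeError if the file contains escapes above \xff (for example \u1234), which may be preferable to silently producing unexpected output.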

To convert the 4 ASCII characters ( \ x f c ) from the file into a single byte ( 252 == 0xfc ), you could read the ASCII characters as bytes ( data = open("filename", "rb").read() ), remove the \x prefixes, and convert the resulting hexadecimal bytestring into bytes containing the corresponding raw binary data:

>>> import binascii
>>> data = b'\\xfc\\xe8\\x82'
>>> binascii.unhexlify(data.replace(b'\\x', b''))
b'\xfc\xe8\x82'
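
Applied to an actual file, the same steps might look like the sketch below. The helper name read_hex_escapes is hypothetical, a temporary file stands in for "filename", and the whitespace stripping is an added assumption to cope with a trailing newline, which unhexlify would otherwise reject.

```python
import binascii
import os
import tempfile


def read_hex_escapes(path: str) -> bytes:
    """Read a text file of \\xHH escapes and return the raw bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    # Drop whitespace (such as a trailing newline), then the \x prefixes.
    hex_digits = b"".join(raw.split()).replace(b"\\x", b"")
    return binascii.unhexlify(hex_digits)


# Demo: a temporary file stands in for the question's "filename".
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\\xfc\\xe8\\x82\\x00\n")
    tmp_path = tmp.name

payload = read_hex_escapes(tmp_path)
os.remove(tmp_path)
print(payload)
```

binascii.unhexlify raises binascii.Error on odd-length or non-hexadecimal input, so malformed files fail loudly instead of yielding partial data.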

It is best to avoid storing data as b'\\xfc' (4 bytes) instead of b'\xfc' (1 byte) in the first place.
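
If you control how the file is produced, writing the bytes themselves in binary mode removes the need for any decoding on the way back in. A minimal sketch, assuming a hypothetical file name payload.bin in a temporary directory:

```python
import os
import tempfile

raw = bytes([0xFC, 0xE8, 0x82])   # 3 bytes of actual data
escaped = b"\\xfc\\xe8\\x82"      # 12 bytes of ASCII text describing it

# Hypothetical file name, placed in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "payload.bin")

with open(path, "wb") as f:
    f.write(raw)                  # store the bytes themselves

with open(path, "rb") as f:
    restored = f.read()           # no decoding step needed

sizes = (len(raw), len(escaped))
print(restored == raw, sizes)
```

The escaped form is four times larger and needs a parsing step, while the raw form round-trips unchanged.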
