
Python - issue with reading in hex byte

I am reading in a file that I believe contains hex bytes. Here is what I wrote:

def ByteToHexToDec(byteStr):
    # Show a byte as two hex digits when its repr needs a \x escape, else keep the character.
    hex_list = ["%02X" % ord(x) if "\\x" in r"%r" % x else x for x in byteStr]
    return hex_list

with open("file.z","rb") as lines:
    for line in lines:
        print ByteToHexToDec(line)

and here is what it returned:

['04', '80', 'e', '06', 'C0', 'l', '06', 'F0', ',', '02', '00', 'w', '06', 'F0', 'r', '06', 'C0', 'd', '02', '10', '\n']

I am pretty sure this says hello world (or something very similar), and I know the hex of 'hello world' is this:

480065006C006C006F00200077006F0072006C00640021

If you look closely, the '48' matches the first two elements of hex_list, except that zeros are in the way, and the letter e has the hex value 65...

So is there some error with the bytes in the file, or am I reading the bytes incorrectly?

Thanks

The file can be downloaded here: https://drive.google.com/file/d/0B84_Z1V4nj9SS0x4MlR0a2poMkE/view?usp=sharing

The content of the file in hex:

$ od -t x1z -w16 file.z 
0000000 04 80 65 06 c0 6c 06 f0 2c 02 00 77 06 f0 72 06  >..e..l..,..w..r.<
0000020 c0 64 02 10 0a                                   >.d...<
0000025

What are you attempting?

$ echo 'hello world' | od -t x1z -w12 
0000000 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a  >hello world.<
0000014

Note the hexlify/unhexlify functions in the binascii module: https://docs.python.org/2/library/binascii.html
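As a rough illustration of those two functions, here is a minimal sketch written for Python 3 (an assumption; the question itself uses Python 2, where the same calls exist but return str). The `data` literal is simply the 21 bytes from the od dump in the question, and the variable names are only for illustration.

```python
import binascii

# The 21 bytes of file.z, copied from the od dump in the question.
data = b'\x04\x80e\x06\xc0l\x06\xf0,\x02\x00w\x06\xf0r\x06\xc0d\x02\x10\n'

hex_text = binascii.hexlify(data)          # every byte becomes two hex digits
round_trip = binascii.unhexlify(hex_text)  # and back to the raw bytes

print(hex_text)
print(round_trip == data)
```

This sidesteps the repr-based test in ByteToHexToDec, so printable bytes such as e (65) show up as hex as well.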

Every other character in your file is padded out with a leading and a trailing 0 nibble, so it takes up two bytes instead of one:

  • H is 04 80 instead of 48
  • e is 65 (correct)
  • l is 06 c0 instead of 6c
  • l is 6c (correct)
  • o is 06 f0 instead of 6f
  • , is 2c (correct)
  • (space) is 02 00 instead of 20

etc.
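The arithmetic behind that list can be checked with a small sketch (Python 3 assumed). `merge_padded_pair` is a hypothetical helper name, not something from the question; it takes the low four bits of the first byte as the high nibble and the high four bits of the second byte as the low nibble.

```python
def merge_padded_pair(first, second):
    """Rebuild one character code from a padded pair such as 0x04, 0x80.

    Hypothetical helper: `first` carries the character's high nibble in its
    low four bits, `second` carries the low nibble in its high four bits.
    """
    return ((first & 0x0F) << 4) | ((second & 0xF0) >> 4)

# The padded pairs from the list above, with the character each should give.
cases = [(0x04, 0x80, 'H'), (0x06, 0xC0, 'l'), (0x06, 0xF0, 'o'), (0x02, 0x00, ' ')]
for first, second, expected in cases:
    code = merge_padded_pair(first, second)
    print('%02x %02x -> %02x %r' % (first, second, code, chr(code)))
    assert chr(code) == expected
```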

If you want to turn that back into Hello, world!, you'll have to repair that breakage:

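def repairbroken(bytestr):
    # Python 2: iterating a str yields one-character strings
    bytestr = iter(bytestr)
    # consume the input three bytes at a time: a padded pair, then a plain byte
    for byte1, byte2, byte3 in zip(*([bytestr] * 3)):
        # character 1 has its high nibble in the low 4 bits of the first byte
        # and its low nibble in the high 4 bits of the second byte
        char1 = chr((ord(byte1) & 0x0f) << 4 | (ord(byte2) & 0xf0) >> 4)
        yield char1
        yield byte3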

Demo:

>>> binary = '\x04\x80e\x06\xc0l\x06\xf0,\x02\x00w\x06\xf0r\x06\xc0d\x02\x10\n'
>>> print ''.join(repairbroken(binary))
Hello, world!
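The generator above is Python 2 code: it calls ord() on one-character strings. Under Python 3, iterating a bytes object yields integers instead, so a port could look roughly like the sketch below. `repair_padded` is a hypothetical name, and the three-byte grouping is assumed from this one sample file rather than from any known format.

```python
def repair_padded(data):
    """Yield repaired byte values from a bytes object.

    Assumes the layout seen in file.z: groups of three bytes, where the
    first two hold one character split across nibbles and the third is an
    ordinary byte. Trailing bytes that do not fill a group are ignored.
    """
    it = iter(data)  # iterating bytes yields ints in Python 3
    for first, second, plain in zip(it, it, it):
        yield ((first & 0x0F) << 4) | ((second & 0xF0) >> 4)
        yield plain

binary = b'\x04\x80e\x06\xc0l\x06\xf0,\x02\x00w\x06\xf0r\x06\xc0d\x02\x10\n'
text = bytes(repair_padded(binary)).decode('ascii')
print(text, end='')
```

Passing the same iterator to zip three times does the same grouping as zip(*([bytestr] * 3)) in the original, just spelled out.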
