
Convert binary files into ascii in Python

I have a bunch of binary files that contain data in the following format:

i\xffhh\xffhh\xffhh\xffih\xffhh\xffhh\xffhh\xffhh\xffhi\xffii\xffjj\xffjj\xffjj\xffjk\xffkk\xffkk\xffkl\xffll\xffmm\xffmn\xffnn\xffon\xffno\xffop\xffop\xffpp\xffqq\xffrq\xffrs\xffst\xfftt\xfftt\xffuv\xffvu\xffuv\xffvv\xffvw\xffwx\xffwx\xffxy\xffyy\xffyz\xffz{\xffz{\xff||\xff}|\xff~}\xff}}\xff~~\xff~~\xff~\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x80\x80\xff\x80\x81\xff\x81\x80\xff\x81\x81\xff\x81\x82\xff\x82\x82\xff\x82\x82\xff\x82\x83\xff\x83\x83\xff\x83\x83\xff\x83\x84\xff\x83\x84\xff\x84\x85\xff\x85\x85\xff\x86\x85\xff\x86\x87\xff\x87\x87\xff\x87\x87\xff\x88\x87\xff\x88\x89\xff\x88\x89\xff\x89\x8a\xff\x89\x8a\xff\x8a\x8b\xff\x8b\x8b\xff\x8b\x8c\xff\x8d\x8d\xff\x8d\x8d\xff\x8e\x8e\xff\x8e\x8f\xff\x8f\x8f

These are supposed to be pressure sensor readings from when a person is walking, so I'm assuming that they are numbers, but I want to convert them into ascii so I have some idea what they are. How do I convert them? What format are they currently in?

EDIT: Link to file provided here ( Link )

You cannot guess the format just by opening a binary file. You will have to find out how the data is stored for that particular pressure sensor's readings.

Of course, once you know the format, it is easy to read the file in binary mode and extract the meaningful data from it:

# filename and num_bytes are placeholders for your own values
with open(filename, "rb") as f:
    data = f.read(num_bytes)
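As a rough Python 3 sketch of what "knowing the format" buys you, suppose each record were three bytes: a 0xff marker followed by two unsigned 8-bit readings. That layout is purely an assumption for illustration, not something taken from the sensor's documentation, and the read_records helper is hypothetical. An in-memory stream stands in for the opened file so the example runs on its own.

```python
import io
import struct

# Hypothetical layout (an assumption, not from any sensor documentation):
# each record is a 0xff marker byte followed by two unsigned 8-bit readings.
RECORD = struct.Struct("3B")

def read_records(stream):
    """Yield (reading_a, reading_b) tuples until the stream runs out."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of data, or a trailing partial record
        marker, a, b = RECORD.unpack(chunk)
        if marker != 0xFF:
            raise ValueError("unexpected marker byte: %#x" % marker)
        yield a, b

# io.BytesIO stands in for open(filename, "rb") so no real file is needed.
sample = io.BytesIO(b"\xffhh\xffhi\xffii")
records = list(read_records(sample))
print(records)
```

With a real file you would pass the object returned by open(filename, "rb") instead of the BytesIO sample, and change the struct format string to match whatever the documentation says.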

I'm absolutely shocked and stunned and not a little bit amazed at all the waffle like "you have letters like hh which shouldn't be part of a hex number" and "they seem to start making sense right at the first \x7f". Hasn't anybody seen any repr() output?

The following shows how it might have ended up like that, ignoring the \xff which seems to be just noise:

>>> pressure = [120,121,122,123,124,125,126,127,128,129,130,131]
>>> import struct
>>> some_bytes = struct.pack("12B", *pressure)
>>> print repr(some_bytes)
'xyz{|}~\x7f\x80\x81\x82\x83'
>>>

So let's try working back from the file:

>>> guff = open('your_file.bin', 'rb').read()
>>> cleaned = guff.replace("\xff", "")
>>> cleaned
'ihhhhhhihhhhhhhhhhiiijjjjjjjkkkkkklllmmmnnnonnoopopppqqrqrsstttttuvvuuvvvvwwxwx
xyyyyzz{z{||}|~}}}~~~~~\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x80\x80\x80\x81\x81\x80\x81\
x81\x81\x82\x82\x82\x82\x82\x82\x83\x83\x83\x83\x83\x83\x84\x83\x84\x84\x85\x85\
x85\x86\x85\x86\x87\x87\x87\x87\x87\x88\x87\x88\x89\x88\x89\x89\x8a\x89\x8a\x8a\
x8b\x8b\x8b\x8b\x8c\x8d\x8d\x8d\x8d\x8e\x8e\x8e\x8f\x8f\x8f'
# Note that lines wrap at column 80 in a Windows "Command Prompt" window ...
>>> pressure = [ord(c) for c in cleaned]
>>> pressure
[105, 104, 104, 104, 104, 104, 104, 105, 104, 104, 104, 104, 104, 104, 104, 104,
 104, 104, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107,
 107, 107, 108, 108, 108, 109, 109, 109, 110, 110, 110, 111, 110, 110, 111, 111,
 112, 111, 112, 112, 112, 113, 113, 114, 113, 114, 115, 115, 116, 116, 116, 116,
 116, 117, 118, 118, 117, 117, 118, 118, 118, 118, 119, 119, 120, 119, 120, 120,
 121, 121, 121, 121, 122, 122, 123, 122, 123, 124, 124, 125, 124, 126, 125, 125,
 125, 126, 126, 126, 126, 126, 127, 127, 127, 127, 127, 127, 127, 128, 128, 128,
 129, 129, 128, 129, 129, 129, 130, 130, 130, 130, 130, 130, 131, 131, 131, 131,
 131, 131, 132, 131, 132, 132, 133, 133, 133, 134, 133, 134, 135, 135, 135, 135,
 135, 136, 135, 136, 137, 136, 137, 137, 138, 137, 138, 138, 139, 139, 139, 139,
 140, 141, 141, 141, 141, 142, 142, 142, 143, 143, 143]
>>>
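The session above is Python 2. In Python 3 the same idea needs bytes literals, and iterating over bytes already yields integers, so ord() is not needed. The following is a minimal sketch of that, using a shortened sample in the style of the question's data in place of the real file contents. The one-value-per-line text output is just one possible "ASCII" form, chosen here for illustration.

```python
# A short sample in the style of the question's data; with a real file this
# would be: guff = open('your_file.bin', 'rb').read()
guff = b"i\xffhh\xffhh\xffhh\xffih\xff~\x7f\xff\x7f\x7f\xff\x80\x80"

cleaned = guff.replace(b"\xff", b"")  # the pattern must be bytes in Python 3
pressure = list(cleaned)              # iterating over bytes yields ints directly

# One decimal reading per line is plain ASCII that any text tool can open.
ascii_text = "\n".join(str(value) for value in pressure)
print(ascii_text)
```

Writing ascii_text to a file opened in text mode would give you something you can load into a spreadsheet or plotting tool.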

You'll still need to read the docs for the equipment to find out what scale factor to multiply those 0-254 values by.

You'll notice that the derived numbers change by +1, 0, or -1 at almost every step (the sample above has a single jump of +2, from 124 to 126). This fits comfortably with a hypothesis that it's only 1 byte per reading, rather than two or more bytes per reading.
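One way to check that observation on a whole file, rather than by eye, is to tally the differences between consecutive readings. This is only a sketch: the list here is just the first ten values of the decoded list shown earlier, and the variable names are made up for the example.

```python
from collections import Counter

# First ten readings from the decoded list shown earlier.
pressure = [105, 104, 104, 104, 104, 104, 104, 105, 104, 104]

# Difference between each reading and the one before it.
steps = [b - a for a, b in zip(pressure, pressure[1:])]

# How often each step size occurs; small steps dominating is what you would
# expect from a slowly varying signal sampled one byte at a time.
step_counts = Counter(steps)
print(step_counts)
```

If the data were really multi-byte values read one byte at a time, you would expect to see large, erratic steps from the low-order bytes instead.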

Another thought: perhaps the \xff is a start or end sentinel, and there are two values (start, stop) or (sensor-A, sensor-B) being reported each cycle.
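If that sentinel hypothesis is right (and it is only a hypothesis), splitting on the 0xff byte instead of deleting it would keep the pairing. A minimal Python 3 sketch, using a short sample in the style of the question's data; the names channel_a and channel_b are invented for the example and say nothing about what the two values really mean.

```python
# Short sample in the style of the question's data.
guff = b"i\xffhh\xffhh\xffih\xff\x7f\x80"

# Split on the suspected sentinel byte rather than throwing it away.
groups = guff.split(b"\xff")

# Keep complete two-byte groups as pairs; set anything else aside for inspection
# (here the lone leading b"i" is probably the tail of a cut-off record).
pairs = [tuple(group) for group in groups if len(group) == 2]
leftover = [group for group in groups if len(group) != 2]

# Separate the two values of each pair into their own sequences.
channel_a, channel_b = zip(*pairs)
print(pairs)
print(leftover)
```

If every group in the real file has exactly two bytes, that supports the two-values-per-cycle reading; a mix of lengths would suggest 0xff is something else.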

The first part looks very strange. Typically something like \x8e is just the hex escape code for a single byte, except in the first part you have letters like hh which shouldn't be part of a hex number.

But for the second part you can do something like:

hex_list = r"\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x80\x80\xff\x80\x81\xff\x81\x80\xff\x81\x81\xff\x81\x82\xff\x82\x82\xff\x82\x82\xff\x82\x83\xff\x83\x83\xff\x83\x83\xff\x83\x84\xff\x83\x84\xff\x84\x85\xff\x85\x85\xff\x86\x85\xff\x86\x87\xff\x87\x87\xff\x87\x87\xff\x88\x87\xff\x88\x89\xff\x88\x89\xff\x89\x8a\xff\x89\x8a\xff\x8a\x8b\xff\x8b\x8b\xff\x8b\x8c\xff\x8d\x8d\xff\x8d\x8d\xff\x8e\x8e\xff\x8e\x8f\xff\x8f\x8f"
int_list = [int(h, 16) for h in hex_list.replace('\\', ';0').split(';') if h != '']

Note that you always get a number between 127 and 143, except for the 255 (the \xff).
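If all you have is the escaped text as pasted (rather than the file itself), another option in Python 3 is to let the unicode_escape codec undo the escaping, which handles the printable first part and the \x.. part in one go. This is a sketch that assumes the text contains only printable ASCII and \xNN escapes; the sample string is a shortened piece in the style of the question's data.

```python
# Escaped text as it appears when pasted, held in a raw string.
escaped = r"i\xffhh\xff~\x7f\xff\x80\x81"

# unicode_escape turns each \xNN into the code point NN and leaves printable
# characters alone; latin-1 then maps code points 0-255 back to single bytes.
raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")

values = list(raw)
print(values)
```

Printable characters such as i and h come out as their ASCII codes (105 and 104), which is why the first part of the data is not strange after all.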
