Python read() automatically converts hex to char?

I'm trying to convert a 4x4, 5.6.5.0.0 .bmp file into a list of RGB values to plug into another program that needs a specific format. I'm stuck because I think Python's read() method is converting some of the data before I can use it, even when I open the file in "rb" mode.

For example, when I use:

f = open("imgFile.bmp", "rb")
imgData=f.read()
f.close()

print imgData

I get:

BMh\x00\x00\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00\x04\x00\x00\x00\xfc\xff\xff\xff\x01\x00\x18\x00\x00\x00\x00\x002\x00\x00\x00\x12\x0b\x00\x00\x12\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcc\xbb\xaa\xff\xee\xdd\x00\x00\x00\xff\xff\xff\xdd\xcc\xbb\x00\x00\x00\xff\xff\xff\x00\x00\x00\x00\x00\x00\xff\xff\xff\x00\x00\x00\xff\xff\xff\xff\xff\xff\x00\x00\x00\xff\xff\xff3"\x11\x00\x00

This is fine for the most part: I can grab the hex values I need after the bmp header (those values start at "\xcc\xbb\xaa..."). But it looks like some hex values are being interpreted as other characters and symbols, which at best makes the output harder to translate and at worst creates ambiguity that makes it impossible to recover the original data with certainty.

For instance, you'll find this sequence near the end of the string:

\xff3"\x11

which should appear as:

\xff\x33\x22\x11

(An ASCII table shows that hex '33' can be interpreted as '3' and hex '22' as '"', and I'm certain that is what is happening; see how the data appears in the text editor below.)

Now, it would be easy to translate all the symbols back into hex format if there were no ambiguities, but more complex files allow many possibilities. For instance, if I have the sequence '6666' it would just be changed into 'ff', which I would be unable to tell apart from instances of 'ff' that I might already have in my data.

My question is: how do I keep the data untranslated and unambiguous for further parsing and formatting in Python?

To confirm that what I've described is happening, I opened the file in SublimeText, where it appears like this:

424d 6800 0000 0000 0000 3600 0000 2800 0000 0400 0000 fcff ffff 0100 1800 0000 0000 3200 0000 120b 0000 120b 0000 0000 0000 0000 0000 ccbb aaff eedd 0000 00ff ffff ddcc bb00 0000 ffff ff00 0000 0000 00ff ffff 0000 00ff ffff ffff ff00 0000 ffff ff33 2211 0000

This is correct and usable (though opening a text editor every time is not efficient for my purposes), so I would like to automate the process with Python.

Incidentally, I think this may be what was happening for this person, too.

Python shows you a literal string value, and uses escape codes to prevent your terminal from going haywire. Anything that is not a printable ASCII character is shown as an escape code instead.

The value itself is still fully binary.

>>> '\x00'
'\x00'
>>> len('\x00')
1
>>> '\x65'
'e'

In the above example, the null byte is displayed as a \x00 escape code, but it is still only one byte (length 1). A byte with hex value 65 is displayed as an e because it is a printable ASCII character.
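To get at the numeric values rather than the displayed form, you can convert each byte to an integer or a two-digit hex string yourself. The following is only a minimal sketch, not a full BMP parser. The names IMG_DATA, hex_pairs and pixel_data are made up for illustration, and the bytes are copied from the question so the sketch runs without imgFile.bmp on disk. It assumes the standard BMP layout, where the little-endian 32-bit value at byte offset 10 gives the position of the pixel data (54 here).

```python
import struct

# The 104 bytes from the question, written exactly as Python displayed them.
# In a real script this would be the result of open("imgFile.bmp", "rb").read().
IMG_DATA = (
    b'BMh\x00\x00\x00\x00\x00\x00\x006\x00\x00\x00'
    b'(\x00\x00\x00\x04\x00\x00\x00\xfc\xff\xff\xff\x01\x00\x18\x00'
    b'\x00\x00\x00\x002\x00\x00\x00\x12\x0b\x00\x00\x12\x0b\x00\x00'
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
    b'\xcc\xbb\xaa\xff\xee\xdd\x00\x00\x00\xff\xff\xff'
    b'\xdd\xcc\xbb\x00\x00\x00\xff\xff\xff\x00\x00\x00'
    b'\x00\x00\x00\xff\xff\xff\x00\x00\x00\xff\xff\xff'
    b'\xff\xff\xff\x00\x00\x00\xff\xff\xff3"\x11'
    b'\x00\x00'
)


def hex_pairs(data):
    """Return each byte as a two-digit hex string (hypothetical helper)."""
    # bytearray yields integers 0-255 on both Python 2 and Python 3.
    return ['%02x' % b for b in bytearray(data)]


def pixel_data(data):
    """Return the bytes after the header (hypothetical helper).

    Assumes the standard BMP file header, which stores the pixel data
    offset as a little-endian unsigned 32-bit integer at byte 10.
    """
    (offset,) = struct.unpack_from('<I', data, 10)
    return data[offset:]


# '3' and '"' are only the displayed form of bytes 0x33 and 0x22.
assert b'\xff3"\x11' == b'\xff\x33\x22\x11'

pixels = pixel_data(IMG_DATA)
print(' '.join(hex_pairs(pixels)[:6]))     # cc bb aa ff ee dd
print(' '.join(hex_pairs(pixels)[-5:-2]))  # 33 22 11
```

Because every byte is formatted on its own, a byte 0x66 comes out as '66' and a byte 0xff as 'ff', so the ambiguity described in the question does not arise.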
