Python converting bytes to string

I have the following code:

with open("heart.png", "rb") as f:
    byte = f.read(1)
    while byte:
        byte = f.read(1)
        strb = byte.decode("utf-8", "ignore")
        print(strb)

When reading the bytes from "heart.png" I have to read hex bytes such as:

b'\x1a', b'\xff', b'\xa4', etc.

and also bytes in this form:

b'A', b'D', b'O', b'B', b'E', etc.    <- spells ADOBE

Now for some reason, when I use the above code to convert from bytes to string, it does not seem to work with the bytes in hex form, but it works for everything else.

So when b'\x1a' comes along it converts it to "" (empty string)

and when b'H' comes along it converts it to "H".

Does anyone know why this is the case?

There are a few things going on here.

The PNG file format can contain text chunks encoded in either Latin-1 or UTF-8. tEXt chunks are encoded in Latin-1, so you would need to decode them using the 'latin-1' codec. iTXt chunks are encoded in UTF-8 and would need to be decoded with the 'utf-8' codec.
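
As a rough sketch of that codec choice, the hypothetical helper `decode_text_chunk` below takes a chunk type and its raw payload (already extracted from the file) and decodes it accordingly. The function name and return shape are assumptions made for illustration; the payload layouts (keyword, NUL separator and, for iTXt, a compression flag, compression method, language tag and translated keyword) are my reading of the PNG specification, so check them against the spec before relying on this.

```python
import zlib


def decode_text_chunk(chunk_type, payload):
    """Return (keyword, text) for a tEXt or iTXt payload (illustrative helper)."""
    # In both chunk types the keyword is Latin-1 and ends at the first NUL byte.
    keyword, _, rest = payload.partition(b"\x00")

    if chunk_type == b"tEXt":
        # tEXt: everything after the separator is Latin-1 text.
        return keyword.decode("latin-1"), rest.decode("latin-1")

    if chunk_type == b"iTXt":
        # iTXt: compression flag, compression method, then two more
        # NUL-terminated fields (language tag, translated keyword).
        compressed = rest[0]
        _language, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return keyword.decode("latin-1"), text.decode("utf-8")

    raise ValueError("not a tEXt or iTXt chunk: %r" % (chunk_type,))
```

Latin-1 maps every byte value to a character, so decoding a tEXt payload this way never raises and never drops bytes, which is not true of UTF-8.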

However, you appear to be trying to decode individual bytes, whereas characters in UTF-8 may span multiple bytes. So, assuming you want to read UTF-8 strings, you should read in the entire length of the string you wish to decode before attempting to decode it.
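
To illustrate why byte-at-a-time decoding loses characters, here is a small sketch that uses a made-up in-memory string instead of the PNG file. It also shows the standard library's incremental decoder as one possible option if the data really has to arrive one byte at a time; whether that suits your case is an assumption on my part.

```python
import codecs

data = "héllo".encode("utf-8")  # b'h\xc3\xa9llo': 'é' occupies two bytes

# Decoding each byte separately: neither half of 'é' is valid UTF-8 on its
# own, so errors="ignore" silently drops both halves.
per_byte = [data[i:i + 1].decode("utf-8", "ignore") for i in range(len(data))]
print(per_byte)   # ['h', '', '', 'l', 'l', 'o']

# Decoding the whole string at once keeps the character.
whole = data.decode("utf-8")
print(whole)      # héllo

# An incremental decoder buffers incomplete sequences between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
streamed = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
print(streamed)   # ['h', '', 'é', 'l', 'l', 'o']
```

The same rule explains the bytes in the question: b'\xff' is never valid in UTF-8, so with "ignore" it decodes to an empty string, while b'\x1a' is a valid ASCII control character and decodes to '\x1a', which simply prints as nothing visible.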

If instead you are trying to interpret binary data from the file, take a look at the struct module, which is intended for that purpose.
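
For example, a minimal sketch of reading the PNG header with struct might look like the following. The helper name `read_ihdr` and the returned dictionary are assumptions for illustration, and the sketch does not verify the chunk CRC.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_ihdr(data):
    """Parse the IHDR chunk that directly follows the PNG signature."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")

    # Chunk header: 4-byte big-endian length, then a 4-byte chunk type.
    length, chunk_type = struct.unpack_from(">I4s", data, 8)
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("expected a 13-byte IHDR chunk first")

    # IHDR body: width and height as uint32, then five single-byte fields.
    fields = struct.unpack_from(">IIBBBBB", data, 16)
    names = ("width", "height", "bit_depth", "color_type",
             "compression", "filter", "interlace")
    return dict(zip(names, fields))


# Possible usage with the file from the question (33 bytes cover the
# signature plus the whole IHDR chunk, including its CRC):
#     with open("heart.png", "rb") as f:
#         print(read_ihdr(f.read(33)))
```

The ">" prefix selects big-endian byte order with no padding, which matches how PNG stores its integers.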
