Python Encoding: Open/Read Image File, Decode Image, Re-Encode Image

Note: I don't know much about encoding/decoding, but after running into this problem, those words are now complete jargon to me.

Question: I'm a little confused here. I was playing around with encoding/decoding images in order to store an image as a TextField in a Django model. Looking around Stack Overflow, I found I could decode an image from ascii (I think, or binary? Whatever encoding open('file', 'wb') uses; I'm assuming the default, ascii) to latin1 and store it in a database with no problems.

The problem comes when creating the image from the latin1-decoded data. When attempting to write to a file handle I get a UnicodeEncodeError saying ascii encoding failed.

I think the problem is that when a file is opened as binary data (rb), the contents are not proper ascii, because they contain binary data. Then I decode the binary data to latin1, but when converting back to ascii (which happens automatically when trying to write to the file), it fails for some unknown reason.

My guess is either that when decoding to latin1 the raw binary data gets converted to something else, so that when trying to encode back to ascii it can't identify what was once raw binary data (although the original and decoded data have the same length), or that the problem lies not with the decoding to latin1 but with my attempting to ascii-encode binary data. In which case, how would I encode the latin1 data back to an image?

I know this is very confusing, but I'm confused by it all, so I can't explain it well. If anyone can answer this question, they're probably a riddle master.

Some code to visualize:

>>> image_handle = open('test_image.jpg', 'rb')
>>> 
>>> raw_image_data = image_handle.read()
>>> latin_image_data = raw_image_data.decode('latin1')
>>> 
>>> 
>>> # The raw data can't be processed by django 
... # but in `latin1` it works
>>> 
>>> # Analysis of the data
>>> 
>>> type(raw_image_data), len(raw_image_data)
(<type 'str'>, 2383864)
>>> 
>>> type(latin_image_data), len(latin_image_data)
(<type 'unicode'>, 2383864)
>>> 
>>> len(raw_image_data) == len(latin_image_data)
True
>>> 
>>> 
>>> # How to write back to as a file?
>>> 
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>> 
>>> copy_image_handle.write(raw_image_data)
>>> copy_image_handle.close()
>>> 
>>> 
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>> 
>>> copy_image_handle.write(latin_image_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>> 
>>> 
>>> latin_image_data.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>> 
>>> 
>>> latin_image_data.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Unlike normal/plain text files, an image file does not have any encoding; the data shown is a visual representation of the binary equivalent of the image. As @cameron-f says above in the question comments, this is basically gibberish, and any encoding done will break the image file, so don't try it.

But that doesn't mean all hope is lost. Here's a way I usually turn an image into a string and back into an image.

import zlib
from base64 import b64decode, b64encode

image_handle = open('test_image.jpg', 'rb')

raw_image_data = image_handle.read()

# Base64-encode the raw bytes, then compress the encoded text
encoded_data = b64encode(raw_image_data)
compressed_data = zlib.compress(encoded_data, 9)

# Reverse the two steps in the opposite order
uncompressed_data = zlib.decompress(compressed_data)
decoded_data = b64decode(uncompressed_data)

new_image_handle = open('new_test_image.jpg', 'wb')

new_image_handle.write(decoded_data)
new_image_handle.close()
image_handle.close()


# Data Types && Data Size Analysis (output shown is from Python 2)
type(raw_image_data), len(raw_image_data)
# (<type 'str'>, 2383864)

type(encoded_data), len(encoded_data)
# (<type 'str'>, 3178488)

type(compressed_data), len(compressed_data)
# (<type 'str'>, 2189311)

type(uncompressed_data), len(uncompressed_data)
# (<type 'str'>, 3178488)

type(decoded_data), len(decoded_data)
# (<type 'str'>, 2383864)



# Showing that the conversions were successful
decoded_data == raw_image_data
# True

encoded_data == uncompressed_data
# True
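
One thing to watch with the order above: zlib.compress returns arbitrary binary data, so compressed_data itself is not safe to put in a text column. A minimal sketch of the reverse order (compress first, Base64 last) is below, assuming Python 3. The function names and the sample bytes are made up for illustration; in practice the bytes would come from reading the image file in 'rb' mode.

```python
import base64
import zlib


def image_bytes_to_text(raw: bytes) -> str:
    """Compress, then Base64-encode, so the result is plain ASCII text."""
    compressed = zlib.compress(raw, 9)
    return base64.b64encode(compressed).decode("ascii")


def text_to_image_bytes(text: str) -> bytes:
    """Reverse image_bytes_to_text: Base64-decode, then decompress."""
    compressed = base64.b64decode(text.encode("ascii"))
    return zlib.decompress(compressed)


# Stand-in for the contents of an image file (hypothetical data).
sample = b"\xff\xd8\xff\xe0" + bytes(range(256)) * 4

stored = image_bytes_to_text(sample)      # a str, safe for a text field
restored = text_to_image_bytes(stored)    # bytes, ready for open(..., 'wb')

print(type(stored).__name__, type(restored).__name__)
print(restored == sample)
```

Because Base64 is the last step, the stored value contains only ASCII letters, digits, '+', '/' and '='.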

The UnicodeEncodeError is popping up because a JPEG is a binary file, and ASCII encoding is for plain text in plain text files.
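
To make that concrete, here is a small sketch, assuming Python 3. The four bytes used are what a JFIF-style JPEG commonly starts with, which would be consistent with the "position 0-3" in the traceback, though that link is my inference rather than something stated in the question. It shows that latin1 maps every byte value to a character and back, while ascii refuses anything above 127.

```python
# Typical opening bytes of a JPEG/JFIF file (assumed sample); all are above 127.
jpeg_header = b"\xff\xd8\xff\xe0"

# latin1 maps each byte 0-255 to the code point with the same number,
# so decoding never fails and the length stays the same.
as_text = jpeg_header.decode("latin1")

ascii_error = None
try:
    as_text.encode("ascii")          # ascii only covers code points 0-127
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("ascii encode failed starting at position", exc.start)

# Encoding with the same codec that was used for decoding restores the bytes.
round_tripped = as_text.encode("latin1")
print(round_tripped == jpeg_header)
```

So to get an image back from latin1-decoded text, encode it with 'latin1' explicitly before writing to a file opened in 'wb' mode, instead of letting an implicit ascii conversion happen.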

Plain text files can be created with generic text editors like Notepad for Windows or nano for Linux. Most will use either ASCII or a Unicode encoding. When a text editor is reading an ASCII file it will grab a byte, say 01100001 (97 in decimal), and find the corresponding glyph, 'a'.
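
The byte-to-glyph lookup described here can be checked directly; a tiny sketch, assuming Python 3 (the variable names are just for illustration):

```python
byte_value = 0b01100001                 # the bit pattern from the example

# The same value as a decimal number and as an ASCII character.
as_decimal = byte_value
as_char = bytes([byte_value]).decode("ascii")

# Going the other way: the character encodes back to the same single byte.
back_to_byte = as_char.encode("ascii")[0]

print(as_decimal, as_char, back_to_byte == byte_value)
```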

So when a text editor tries to read a JPEG it will grab the same byte 01100001 and get 'a', but since the file holds information for displaying a photo, the text will just be gibberish. Try opening the JPEG in Notepad or nano.

As for encoding, here is an explanation: What is the difference between encode/decode?
