
Encoding a Numpy Array Image to an Image type (.png etc.) to use it with the GCloud Vision API - without OpenCV

After deciding not to use OpenCV, because I only use one of its functions, I was looking to replace cv2.imencode() with something else. The goal is to convert a 2D Numpy array into an image format (like .png) to send it to the GCloud Vision API.

This is what I have been using until now:

content = cv2.imencode('.png', image)[1].tostring()
image = vision.types.Image(content=content)

Now I am looking to achieve the same without using OpenCV.

Things I've found so far:

  • Vision API needs base64 encoded data
  • imencode returns the encoded bytes for the specific image type

I think it is worth noting that my numpy array is a binary image with only 2 dimensions, and the whole function will be used in an API, so saving a png to disk and reloading it is to be avoided.

PNG writer in pure Python

If you insist on using more or less pure Python, the following function from ideasman's answer to this question is useful.

def write_png(buf, width, height):
    """ buf: must be bytes or a bytearray in Python3.x,
        a regular string in Python2.x.
    """
    import zlib, struct

    # reverse the vertical line order and add null bytes at the start
    width_byte_4 = width * 4
    raw_data = b''.join(
        b'\x00' + buf[span:span + width_byte_4]
        for span in range((height - 1) * width_byte_4, -1, - width_byte_4)
    )

    def png_pack(png_tag, data):
        chunk_head = png_tag + data
        return (struct.pack("!I", len(data)) +
                chunk_head +
                struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))

    return b''.join([
        b'\x89PNG\r\n\x1a\n',
        png_pack(b'IHDR', struct.pack("!2I5B", width, height, 8, 6, 0, 0, 0)),
        png_pack(b'IDAT', zlib.compress(raw_data, 9)),
        png_pack(b'IEND', b'')])

Write Numpy array to PNG-formatted bytes, encode as base64

To represent the grayscale image as an RGBA image, we stack the matrix into 4 channels and set the alpha channel (supposing your 2D numpy array is called "img"). We also flip the numpy array vertically, because write_png writes the rows in reverse vertical order and the flip cancels that out.

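import base64
import numpy as np

# write_png expects 8-bit RGBA samples, so img is assumed to be a uint8 array (0-255)
img_rgba = np.flipud(np.stack((img,) * 4, axis=-1))  # flip y-axis
img_rgba[:, :, -1] = 255  # set alpha channel to fully opaque
# tobytes() serializes the flipped view in row-major order
data = write_png(img_rgba.tobytes(), img_rgba.shape[1], img_rgba.shape[0])
data_enc = base64.b64encode(data)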
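
Since the array in the question is a 2D binary image, the RGBA stacking step could arguably be skipped by writing an 8-bit grayscale PNG (color type 0) directly. The following is only a minimal sketch under stated assumptions, not part of the original answer: the names encode_binary_png and png_chunk are hypothetical, and every nonzero pixel is assumed to be foreground (white). Rows are written top to bottom here, so no vertical flip is needed.

```python
import struct
import zlib

import numpy as np

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    body = tag + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack("!I", len(data)) + body + struct.pack("!I", crc)


def encode_binary_png(img):
    """Encode a 2D binary array as an 8-bit grayscale PNG and return the bytes."""
    arr = np.asarray(img)
    if arr.ndim != 2:
        raise ValueError("expected a 2D array")
    # Assumption: any nonzero pixel is foreground, mapped to 255 (white)
    arr8 = (arr > 0).astype(np.uint8) * 255
    height, width = arr8.shape
    # Each scanline starts with a filter-type byte; 0 means "no filter"
    rows = np.zeros((height, width + 1), dtype=np.uint8)
    rows[:, 1:] = arr8
    # IHDR fields: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0
    ihdr = struct.pack("!2I5B", width, height, 8, 0, 0, 0, 0)
    return b"".join([
        PNG_SIGNATURE,
        png_chunk(b"IHDR", ihdr),
        png_chunk(b"IDAT", zlib.compress(rows.tobytes(), 9)),
        png_chunk(b"IEND", b""),
    ])
```

With this sketch, base64.b64encode(encode_binary_png(img)) would play the same role as data_enc above, at roughly a quarter of the uncompressed pixel data.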

Test that encoding works properly

Finally, to ensure the encoding works, we decode the base64 string and write the output to disk as "test_out.png". Check that this is the same image you started with.

with open("test_out.png", "wb") as fb:
    # base64.decodestring was removed in Python 3.9; b64decode is the replacement
    fb.write(base64.b64decode(data_enc))
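
Because the question wants to avoid touching the disk, a rough sanity check can also be done in memory. The sketch below is an illustration rather than part of the original answer; read_png_header is a hypothetical helper. It decodes the base64 data, verifies the PNG signature and the CRC of the IHDR chunk, and returns the dimensions stored in the header. It does not validate the pixel data itself.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(b64_data):
    """Decode base64 PNG data and return (width, height, bit_depth, color_type)."""
    png = base64.b64decode(b64_data)
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature")
    # The first chunk must be IHDR: 4-byte length, 4-byte tag,
    # 13 data bytes, then a 4-byte CRC computed over tag + data
    length, tag = struct.unpack("!I4s", png[8:16])
    if tag != b"IHDR" or length != 13:
        raise ValueError("not a PNG: missing IHDR chunk")
    (crc,) = struct.unpack("!I", png[29:33])
    if crc != zlib.crc32(png[12:29]) & 0xFFFFFFFF:
        raise ValueError("corrupt IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack("!2I2B", png[16:26])
    return width, height, bit_depth, color_type
```

For the RGBA output of write_png, read_png_header(data_enc) should report the original width and height, a bit depth of 8 and color type 6.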

Alternative: Just use PIL

However, I'm assuming that you are using some library to actually read your images in the first place (unless you are generating them)? Most libraries for reading images support this sort of thing. Supposing you are using PIL, you could also try the following snippet (from this answer). It saves the file in memory, rather than on disk, and uses this to generate a base64 string.

import base64
import io

# img is a PIL Image here; a numpy array can be wrapped with PIL.Image.fromarray first
in_mem_file = io.BytesIO()
img.save(in_mem_file, format="PNG")
# reset file pointer to start
in_mem_file.seek(0)
img_bytes = in_mem_file.read()

base64_encoded_result_bytes = base64.b64encode(img_bytes)
base64_encoded_result_str = base64_encoded_result_bytes.decode('ascii')
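
For the 2D binary numpy array from the question, the PIL route might look like the sketch below. It assumes Pillow and numpy are installed; array_to_png_bytes is a hypothetical helper name, and treating every nonzero pixel as white is an assumption about how the binary image is stored.

```python
import io

import numpy as np
from PIL import Image


def array_to_png_bytes(arr):
    """Encode a 2D binary array as PNG bytes entirely in memory."""
    # Assumption: nonzero pixels are foreground; scale to 0/255 so the
    # uint8 array maps onto PIL's 8-bit grayscale mode ("L")
    arr8 = (np.asarray(arr) > 0).astype(np.uint8) * 255
    buf = io.BytesIO()
    Image.fromarray(arr8).save(buf, format="PNG")
    # getvalue() returns the whole buffer, so no seek(0) is needed
    return buf.getvalue()
```

The returned bytes are intended to stand in for cv2.imencode('.png', image)[1].tostring() from the question; they can be base64-encoded with base64.b64encode in the same way as img_bytes above if a base64 string is required.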
