How do I convert Postgres bytea data or a Python memoryview object into a NumPy array?

I have a PostgreSQL database (v 9.6) in which images are stored as bytea data. I do not know the image encoding. (I did not set up this database, and it's not clear whether I can change the setup, although I would like to, as storing large images in a PostgreSQL database is not (IIUC) best practice.)

I'd like to extract this data into an image, or better yet, directly into a NumPy array.

Using SQLAlchemy, I can connect and extract the data:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:password@server:port/database')
connection = engine.connect()
result = connection.execute(text('SELECT image FROM database.table LIMIT 1;'))

The image in question is returned as a memoryview object; cast to a NumPy array, it looks like this (per "Cython: Convert memory view to NumPy array"):

[b'\xaa' b'\x04' b'u' b'\x04' b'\x85' b'\x04' b'E' b'\x04' b'\x7f' b'\x04'
 b'\xa5' b'\x04' b'K' b'\x04' b'j' b'\x04' b'\x97' b'\x04' b';' b'\x04'
 b'w' b'\x04' b'k' b'\x04' b'E' b'\x04' b'b' b'\x04' b's' b'\x04']
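
The output above alternates between a varying byte and `\x04`, which hints that each pixel may occupy two bytes. The following is a minimal sketch, assuming the pixels are little-endian 16-bit integers (an assumption, not something the database confirms); the variable names are illustrative only.

```python
import numpy as np

# First few bytes of the memoryview shown above, joined into one buffer.
raw = memoryview(b"\xaa\x04u\x04\x85\x04E\x04\x7f\x04")

# Viewing the buffer one byte at a time reproduces the b'\xaa' b'\x04' ... output.
as_bytes = np.frombuffer(raw, dtype="S1")

# Assumption: each pixel is a little-endian unsigned 16-bit integer.
as_pixels = np.frombuffer(raw, dtype="<u2")

print(as_bytes)
print(as_pixels)  # [1194 1141 1157 1093 1151]
```

For values this small, signed and unsigned 16-bit interpretations give the same numbers.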

I tried saving to jpg or tiff files (per "Converting BLOB, stored on a database, to an image on an HTML website"), but was not able to open the resulting files with an image viewer.

I also tried this ("Open PIL image from byte file"), but got this result:

OSError: cannot identify image file <_io.BytesIO object at 0x000002299F4DD830>

Or, following "How to convert hex string to color image in python?", I get this error:

ValueError: non-hexadecimal number found in fromhex() arg at position 0
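
One way to narrow down why these attempts fail is to check whether the stored bytes begin with a known image file signature. This is a minimal sketch; `sniff_image_format` is a hypothetical helper, and it only knows the handful of signatures listed in it.

```python
def sniff_image_format(data):
    """Guess a container format from the leading bytes (hypothetical helper)."""
    head = bytes(data[:8])  # works for bytes and memoryview alike
    signatures = {
        b"\xff\xd8\xff": "jpeg",
        b"\x89PNG\r\n\x1a\n": "png",
        b"II*\x00": "tiff (little-endian)",
        b"MM\x00*": "tiff (big-endian)",
    }
    for magic, name in signatures.items():
        if head.startswith(magic):
            return name
    return None  # no known header: possibly raw pixel data


print(sniff_image_format(memoryview(b"\xaa\x04u\x04\x85\x04E\x04")))  # None
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
```

A result of `None` for the bytes shown earlier would be consistent with the column holding raw pixel values rather than an encoded file, which is what an image decoder would refuse to identify.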

So: how do I convert this bytea data or this memoryview object into a NumPy array?

I may be missing something simple, or this may just be one reason why images should not be stored in SQL databases.

For posterity, here's the simplest solution I arrived at.

Best practice would be to NOT store the images in a database, but rather to store multiple versions (different resolutions, from a thumbnail (64x64 ish) to full res (2504x2504 in this case)) in the file system, and keep only the file paths to those images in the database. Images can be named or sorted by a hash (some overhead) or by something like a timestamp; the latter works for us because all the data comes from one camera and will therefore have distinct timestamps.
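
The file-system layout described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `save_versions` helper, the `.npy` file format, the timestamp-based file names, and the naive stride-based downsampling are not what we actually deployed.

```python
from datetime import datetime
from pathlib import Path
import tempfile

import numpy as np


def save_versions(img, timestamp, root, sizes=(64, 2504)):
    """Write one .npy file per resolution and return the paths (hypothetical layout)."""
    paths = {}
    for size in sizes:
        step = max(1, img.shape[0] // size)  # naive stride-based downsampling
        version = img[::step, ::step]
        path = Path(root) / f"{timestamp:%Y%m%dT%H%M%S}_{size}.npy"
        np.save(path, version)
        paths[size] = str(path)  # this string is what would go in the database
    return paths


# Small demo image so the example runs quickly.
with tempfile.TemporaryDirectory() as root:
    img = np.zeros((256, 256), dtype=np.uint16)
    paths = save_versions(img, datetime(2020, 1, 2, 3, 4, 5), root, sizes=(64, 256))
    thumb_shape = np.load(paths[64]).shape

print(thumb_shape)  # (64, 64)
```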

The data in question is a 16-bit grayscale TIFF file. The Python Imaging Library (PIL) wasn't able to read those images; OpenCV can. However, since I want a NumPy array anyway, that doesn't really matter. Matplotlib can display the arrays directly, and NumPy can slice or downsample as needed.
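
As a rough illustration of slicing and downsampling with NumPy alone, here is a minimal sketch on a tiny stand-in array. `block_mean` is a hypothetical helper, and block averaging is just one of several reasonable downsampling choices.

```python
import numpy as np


def block_mean(img, factor):
    """Downsample a 2-D array by averaging factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks divide evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


img = np.arange(16, dtype=np.int16).reshape(4, 4)
crop = img[1:3, 1:3]        # slicing: a 2x2 region of interest
small = block_mean(img, 2)  # downsampling: 4x4 -> 2x2

print(crop)
print(small)
```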

from sqlalchemy import create_engine, text

# user, pass, server, port, and database are placeholders
engine = create_engine('postgresql+psycopg2://user:pass@server:port/database')
connection = engine.connect()

query = text('SELECT * FROM database.schema.table WHERE "ID" = 1234')
result = connection.execute(query)

for row in result:
    data = row[-1]  # our image is the last column in the table

connection.close()

From here, NumPy and Matplotlib can do the heavy lifting. I know the resolution of the image, but it is also stored elsewhere in the database table.

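import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

# Interpret the raw bytes as signed 16-bit integers, then reshape to the known resolution
img_array = np.reshape(np.frombuffer(data, dtype=np.int16), (2504, 2504))

norm = colors.Normalize(vmax=abs(img_array).max(), vmin=-abs(img_array).max())
plt.matshow(img_array, norm=norm, cmap="gray")
plt.show()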

plt.imshow() also works.
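
Since the resolution is also stored in the table, the reshape step can be wrapped so that a mismatch between the byte count and the expected dimensions is caught early. This is a minimal sketch; `bytea_to_array` and its parameters are hypothetical names, and the default dtype is an assumption carried over from the snippet above.

```python
import numpy as np


def bytea_to_array(data, height, width, dtype=np.int16):
    """Turn raw pixel bytes into a (height, width) array (hypothetical helper)."""
    expected = height * width * np.dtype(dtype).itemsize
    nbytes = memoryview(data).nbytes  # works for bytes and memoryview
    if nbytes != expected:
        raise ValueError(f"expected {expected} bytes, got {nbytes}")
    return np.frombuffer(data, dtype=dtype).reshape(height, width)


# Stand-in for a database value: 2 rows x 3 columns x 2 bytes per pixel.
demo = bytea_to_array(memoryview(bytes(2 * 3 * 2)), height=2, width=3)
print(demo.shape, demo.dtype)  # (2, 3) int16
```

With the real data, the call would look like `bytea_to_array(data, 2504, 2504)`, with the two dimensions read from the table instead of hard-coded.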

With OpenCV, the code we used was this:

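import cv2

# Resizable window that keeps the image's aspect ratio
cv2.namedWindow("Image", cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)
cv2.imshow("Image", img_array)
cv2.waitKey(0)  # wait for a key press before closing
cv2.destroyAllWindows()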
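
If the 16-bit data looks too dark or washed out in a viewer, one option is to rescale it to 8 bits before display. This is a minimal NumPy-only sketch; `to_uint8` is a hypothetical helper, and a simple linear min/max stretch is an assumption, not necessarily the best contrast mapping for these images.

```python
import numpy as np


def to_uint8(img):
    """Linearly rescale an array to the 0..255 range for display."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:  # flat image: avoid dividing by zero
        return np.zeros(img.shape, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255).round().astype(np.uint8)


demo = to_uint8(np.array([[1000, 1500], [2000, 3000]], dtype=np.int16))
print(demo)
```

The result can be passed to the same display call in place of the original array.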
